Download English-language ISI article no. 28781
English Title
Interestingness filtering engine: Mining Bayesian networks for interesting patterns
Article Code / Publication Year / Pages (English article)
28781 / 2014 / 9 pages (PDF)
Source

Publisher: Elsevier - Science Direct

Journal: Expert Systems with Applications, Volume 36, Issue 3, Part 1, April 2009, Pages 5137–5145

English Keywords
Association rules, Interestingness, Bayesian networks, Data mining
English Abstract

In this paper, we present a new measure of interestingness to discover interesting patterns based on the user's background knowledge, represented by a Bayesian network. The new measure (the sensitivity measure) captures the sensitivity of the Bayesian network to the patterns discovered by assessing the uncertainty-increasing potential of a pattern on the beliefs of the Bayesian network. Patterns that attain the highest sensitivity scores are deemed interesting. In our approach, mutual information (from information theory) serves as the measure of uncertainty. The sensitivity of a pattern is computed by summing the mutual-information increases incurred by the pattern when it is entered as evidence/findings into the Bayesian network. We demonstrate the strength of our approach experimentally using the KSL dataset of Danish 70-year-olds as a case study. The results were verified by consulting two doctors (internists).

English Introduction

A major problem faced by all association rule mining algorithms is that they produce a large number of rules, which gives rise to a secondary mining problem: mining interesting association rules. The problem is compounded by the fact that discovered rules expressing 'common knowledge' are not interesting, yet they are usually strong rules with high support and confidence, the classical measures of Agrawal, Imielinski, and Swami (1993). The main objective of this paper is to develop an Interestingness Filtering Engine (IFE) that leverages background knowledge, represented by a Bayesian network, to discover interesting patterns in datasets. A pattern is considered interesting if it is unexpected or surprising to the user (Silberschatz and Tuzhilin, 1995; Silberschatz and Tuzhilin, 1996). A new interestingness measure is defined to capture the sensitivity of the Bayesian network's beliefs to the patterns discovered. Patterns that attain the highest sensitivity scores are deemed interesting. For this reason, the new interestingness measure is called sensitivity.
The IFE utilizes Bayesian networks from two perspectives. The first views the network as a causality/dependence representation of the joint probability distribution of all attributes involved in the user's preliminary set of beliefs (background knowledge). The second views the Bayesian network as a probabilistic inference engine that, with the aid of the sensitivity measure, can infer the global effect of a frequent itemset on the belief network. The sensitivity measure should quantify the uncertainty-increasing potential of a pattern, that is, the extent to which a pattern shifts the beliefs of the Bayesian network toward more uncertain (unexpected) probabilities when that pattern is entered as new evidence/findings into the network. Mutual information, from information theory, serves as the measure of uncertainty or unexpectedness.
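The paper's exact notation is not reproduced in this preview; a plausible formalization of the mutual-information-based sensitivity measure described above, with symbols invented here for illustration, is:

```latex
% Mutual information between two discrete variables X and Y
I(X;Y) \;=\; \sum_{x}\sum_{y} p(x,y)\,\log \frac{p(x,y)}{p(x)\,p(y)}

% Sensitivity of a pattern e entered as evidence: the sum, over node
% pairs (X,Y) in the network, of the positive increases in mutual
% information induced by conditioning on e
\mathrm{Sens}(e) \;=\; \sum_{(X,Y)} \max\!\bigl(0,\; I(X;Y \mid e) - I(X;Y)\bigr)
```

The max(0, ·) reflects the text's statement that only increases in mutual information contribute to a pattern's score.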
Leveraging the symmetry property inherent in mutual information (i.e. I(X;Y) = I(Y;X), where I(X;Y) is the mutual information between two random variables X and Y), the sensitivity (interestingness) of a pattern is computed by summing the mutual-information increases incurred by the pattern when it is entered as findings into the Bayesian network. Only increases in mutual information are considered, because we are in pursuit of patterns that increase the degree of variation (i.e. unexpectedness/uncertainty) in the posterior probabilities (beliefs) of the nodes of the Bayesian network. A case study – the KSL dataset of Danish 70-year-olds – was used to analyze and verify the experimental results obtained when applying the IFE and its sensitivity measure, which exhibited a strong capability for discovering interesting (unexpected) patterns that are not 'common knowledge' patterns.
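As a concrete sketch of this computation: the paper uses Netica's inference engine over a full Bayesian network, but the same idea can be illustrated on a small explicit joint distribution. Everything below (the variables A, B, C, the probabilities, and the function names) is invented for illustration and does not come from the paper.

```python
import numpy as np

def mutual_information(joint):
    """I(X;Y) in bits for a 2-D joint probability table (rows X, cols Y)."""
    px = joint.sum(axis=1, keepdims=True)   # marginal of X
    py = joint.sum(axis=0, keepdims=True)   # marginal of Y
    nz = joint > 0                          # skip zero cells to avoid log(0)
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

def sensitivity(joint3, axis, value):
    """Positive increase in I(X;Y) after entering one finding as evidence.

    The paper sums such increases over all node pairs in the network;
    with three variables there is a single remaining pair, so one term.
    """
    before = joint3.sum(axis=axis)              # prior joint over the pair
    sliced = np.take(joint3, value, axis=axis)  # enter the finding
    after = sliced / sliced.sum()               # renormalize the posterior
    return max(0.0, mutual_information(after) - mutual_information(before))

# Toy joint P(A, B, C): A and B fair and independent, C a noisy XOR of
# A and B, so observing C makes A and B dependent (explaining away),
# i.e. the finding C=0 increases I(A;B) and scores as interesting.
joint = np.zeros((2, 2, 2))
for a in range(2):
    for b in range(2):
        for c in range(2):
            joint[a, b, c] = 0.25 * (0.9 if c == (a ^ b) else 0.1)

print(round(sensitivity(joint, axis=2, value=0), 3))  # prints 0.531
```

Before the finding, I(A;B) is zero; after conditioning on C=0 it rises to about 0.531 bits, and only that positive increase is counted, mirroring the text above.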

English Conclusion

A new robust measure of interestingness was presented to discover interesting (unexpected) patterns based on background knowledge represented by a Bayesian network. The new measure was devised to capture the sensitivity of the beliefs in a Bayesian network to the patterns discovered. Patterns that attain the highest sensitivity scores were deemed interesting. The discovered patterns were then used to update the Bayesian network. Netica, a powerful Bayesian-network application, was integrated with our algorithms through its Java API (NeticaJ).
The strength of the sensitivity measure in discovering genuinely unexpected patterns relative to the user's background knowledge was verified by applying it to the KSL dataset as a case study. Two doctors (internists) were consulted regarding the attribute set {FEV,Kol}, which the IFE profiled as the most interesting. Both doctors affirmed the possibility of a relation between the forced ejection volume of a person's lungs and the level of cholesterol in the blood: a low forced ejection volume could reduce oxygen intake, which might in turn lower the efficiency of the body's metabolism, causing higher cholesterol levels. The direction of association/dependence between the attributes of an interesting pattern was an important concern in guiding the update process of the Bayesian network. Remarkably, the sensitivity measure experimentally exhibited its potential for deciding the direction of dependence/association.
As for future work, we intend to enhance the scalability and performance of the IFEMiner algorithm by finding a sensitivity set for each itemset based on the d-separation property inherent in Bayesian networks, so as to minimize the number of mutual-information computations.
The experimental results have also raised a series of questions that suggest interesting future research directions: Could the sensitivity measure be used to automate the process of modifying Bayesian networks based on the discovered interesting patterns? Could it be used to evaluate the quality of a Bayesian network's structure? Which is more intelligent, the sensitivity measure or the entropy-based 'gain' measure used in the ID3 algorithm for the induction of decision trees (Quinlan, 1990)?