دانلود مقاله ISI انگلیسی شماره 22033
ترجمه فارسی عنوان مقاله

پالایش دانش بر اساس کشف الگوهای بینی نشده در داده کاوی

عنوان انگلیسی
Knowledge refinement based on the discovery of unexpected patterns in data mining
کد مقاله سال انتشار تعداد صفحات مقاله انگلیسی
22033 2002 13 صفحه PDF
منبع

Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)

Journal : Decision Support Systems, Volume 33, Issue 3, July 2002, Pages 309–321

ترجمه کلمات کلیدی
پالایش دانش - الگوهای غیرمنتظره - داده کاوی - قوانین انجمن - قواعد کشف - استراتژی های پالایش - پالایش تکرار شونده
کلمات کلیدی انگلیسی
Knowledge refinement, Unexpected patterns, Data mining, Association rules, Rule discovery, Refinement strategies, Iterative refinement
پیش نمایش مقاله
پیش نمایش مقاله  پالایش دانش بر اساس کشف الگوهای بینی نشده در داده کاوی

چکیده انگلیسی

In prior work, we provided methods that generate unexpected patterns with respect to managerial intuition by eliciting managers' beliefs about the domain and using these beliefs to seed the search for unexpected patterns in data. Unexpected patterns discovered in this manner represent contradictions or “holes” in domain knowledge which need to be resolved. Given a belief and a set of unexpected patterns, the motivation behind knowledge refinement is that the belief can be made stronger by refining the belief based on the discovered patterns. In this paper we address the problem of incorporating the discovered contradictions into the belief system based on a formal logic approach. Specifically, we present a framework for refinement based on a generic knowledge refinement strategy, describe abstract properties of refinement algorithms that can be used to compare specific instantiations and then describe and compare two specific refinement algorithms based on this framework

مقدمه انگلیسی

Over the past few years, research in data mining [7] and [11] has produced several new techniques for pattern discovery in large datasets and has demonstrated several successful applications of data mining. However, a drawback of several data mining techniques is that they do not systematically leverage prior domain knowledge of users. In prior research [12], [13] and [14] we proposed new methods to discover unexpected patterns in data based on prior domain knowledge. As demonstrated in [12], [13] and [14], methods that generate unexpected patterns can discover far fewer and more effective patterns than conventional pattern discovery approaches in data mining such as association rule discovery [3]. In general, for knowledge-driven data mining methods, the concept of knowledge refinement is critical since these methods have to deal with reconciling discovered patterns from data with the initial domain knowledge. In particular, unexpected patterns discovered from data using methods proposed in prior work [12], [13] and [14] represent contradictions to domain knowledge that need to be resolved. Resolving contradictions in this manner can be used as a process of iteratively refining domain knowledge through repeated data mining. To address this problem, in this paper we present knowledge refinement methods that reconcile unexpected patterns generated by methods in [12], [13] and [14] with prior domain knowledge. Specifically, in this paper, we present a generic knowledge refinement strategy, describe abstract properties of refinement algorithms that can be used to compare specific instantiations based on the generic strategy and then present and compare two algorithms that refine prior domain knowledge based on the discovery of unexpected association rules. In a broad sense, the idea of refining knowledge has been addressed before in several different communities. In the Knowledge Discovery and Data Mining (KDD) community, Ref. [18] suggests a feedback process in which the discovered patterns would be used to update the belief system but stops short of describing how this can be done. Further, the context in which Ref. [18] describes the feedback process is different—Ref. [18] presents a data monitoring and discovery triggering framework that is applicable in problems where the data changes over time (new data enters the system and this could “trigger” discovery algorithms). In this paper, the data remains constant while the knowledge discovered from this data gets refined by the feedback process. In our work, we will describe how to refine the knowledge base iteratively. In the area of association rule discovery in data mining, there has been little work done in knowledge refinement based on discovered rules. A main reason for this is that most methods for rule discovery in data mining do not systematically integrate prior domain knowledge into the search for patterns in data. Recently unexpected pattern discovery [12], [13] and [14] provides methods that discover unexpected association rules based on systematic incorporation of domain knowledge. However, these approaches do not describe how to refine knowledge based on the discovered patterns. Prior research in the areas of expert systems [5] and [8], scientific discovery [17], belief revision [1] and [4], classification [6] and [15] and concept learning [9] and [10] address issues related to knowledge refinement. Below we briefly describe the issues in these areas and discuss their relation to our work. The idea of new knowledge discovery, its integration with existing knowledge and iterative repetition of this process can be traced back to the early work on expert systems [5] and [8] and on scientific discovery [17]. However, in this paper we deal with discovering unexpected patterns using association rules and with the specific issues of integrating these types of patterns into the existing system of beliefs. There has also been work done on the problem of belief revision in AI (e.g., [1] and [4]). However, in our approach we start with beliefs that are statistically valid. Hence, given the data and the fact that the beliefs “hold”, no update is necessary. However, we demonstrate that refining beliefs can make domain knowledge better. Further, beliefs and the new evidence are represented as “rules”. The methods proposed in [1] and [4] do not describe how to update rules specifically. This lies outside of the scope of the research issues addressed in [1] and [4] and other related papers in belief revision. In classification, several approaches [6] and [15] address the issue of refinement in the context of predicting a dependent class. Specifically, these approaches are based on iteratively applying steps to classify the misclassified cases better. In general in each iteration, the classification method is biased towards incorrectly classified data. These approaches differ from the methods for refinement proposed in this paper since we do not address the issue of classification but focus on building methods to refine domain knowledge in the form of rules using unexpected (association) rule discovery. In the problem of concept learning [9] and [11], several approaches address rule refinement. In concept learning the task is to learn a set of rules that completely characterize a predicted class (concept). Some approaches deal with starting with domain theories about the concept and identify exceptions in the data. Since the task of discovering all rules that can explain the exceptions is exponential, these approaches use specific heuristics in rule refinement. Our work is the first approach to use association rules, in which it is possible to search through all possible rules since we can exploit constraints not available to other methods [3] and [12]. Further we deal with a broader set of beliefs (and not just characterizing a single concept) and do not assume that rules in the final set need to completely characterize the data as done in concept learning. The rest of this paper is structured as follows. In Section 2, we provide, for completeness, some preliminaries including an overview of association rules and unexpected pattern discovery. Section 3 provides a technical motivation for knowledge refinement. In Section 4, we present the generic refinement strategy. Section 5 discusses properties of refinement algorithms and Section 6 presents and compares two specific refinement algorithms. Conclusions are presented in Section 7.

نتیجه گیری انگلیسی

The generic refinement strategy presented in Fig. 1 has 3 df: the pattern generation procedure, the selection procedure and the refinement procedure. Given that a fixed pattern generation procedure (MinZoominUR) and a fixed refinement procedure (NM_Refine) were chosen for their strengths described in Section 4, we presented two refinement algorithms that represented extremes in the selection procedure—IterateRUB selected only the best pattern to incorporate each time while IterateRUA selected all. Clearly the generic refinement strategy (Fig. 1) can be used in many other refinement algorithms that select some subset of generated patterns each time. However, since IterateRUB was shown to have all the good properties presented in this paper we believe it is an “optimal” algorithm with respect to satisfying all the objective functions or properties that were chosen. A globally “best” refinement algorithm needs clear specification of what “best” should be and in general there may be several other properties that are useful (such as convergence in a fixed number of iterations, the size of the fixpoint, predictive accuracy). The approach adopted here selected five good properties of refinement algorithms to make inferences on their relative strengths. IterateRUB satisfied all these properties. IterateRUA however does not score well in consistency, minimality and monotonicity properties. However, notice that the reason IterateRUB generated consistent and minimal beliefs and had the monotonicity property was because at any iteration it generated mutually exclusive patterns (as shown in Proofs of Theorem 6.2, Theorem 6.4 and Theorem 6.5). IterateRUA can be modified such that at the end of each iteration the patterns can be made mutually exclusive by applying conflict resolution strategies. However, this is an expensive operation which involves comparison of each belief with the other beliefs at each iteration. Therefore, with respect to the properties presented here IterateRUB is preferable to IterateRUA. We also showed experimentally in a real world dataset that IterateRUB also converges to a much smaller fixpoint than does IterateRUA. As mentioned in the paper, we selected IterateRUA and IterateRUB for a detailed comparison in this paper since they represented extremes in the selection procedure. An interesting finding in this paper is that the method that uses all discovered patterns in refinement is inferior to the one that just uses the best with respect to the criteria considered. In a sense, this is not an intuitive result since it does not advocate using the entire set of discovered patterns in refinement. The work presented in this paper represents one approach to knowledge refinement in the specific context of unexpected association rules discovered from data. In general, as knowledge-driven data mining develops, additional work is needed to investigate new refinement strategies for other methods and to evaluate different comparison metrics. In this paper, we addressed the problem of incorporating the discovered contradictions into the belief system based on a formal logic approach. Specifically, we presented a framework for refinement based on a generic knowledge refinement strategy, described abstract properties of refinement algorithms that can be used to compare specific instantiations and then presented and compared two specific refinement algorithms based on this framework.