Dynamic rule refinement in knowledge-based data mining systems
|Article code|Publication year|English article|Word count|
|---|---|---|---|
|22025|2001|18-page PDF|9168 words|
Publisher: Elsevier (ScienceDirect)
Journal: Decision Support Systems, Volume 31, Issue 2, June 2001, Pages 205–222
The availability of relatively inexpensive computing power, together with the ability to obtain, store, and retrieve huge amounts of data, has spurred interest in data mining. In most data mining applications, the bulk of the effort is spent cleaning the data and extracting useful patterns from it. However, the critical step of refining the extracted knowledge, especially in dynamic environments, is often overlooked. This paper focuses on knowledge refinement, a process needed to obtain and maintain current knowledge in the domain of interest. Knowledge refinement is necessary not only to keep knowledge bases accurate and effective but also to adapt them dynamically to change. This paper presents and evaluates KREFS, a knowledge refinement system. KREFS refines knowledge by intelligently self-guiding the generation of new training examples. KREFS avoids the typical problems associated with dependency on domain knowledge: it identifies and learns distinct concepts from scratch. In addition to improving on features of existing knowledge refinement systems, KREFS provides a general framework for knowledge refinement. Compared with other knowledge refinement systems, KREFS is shown to have more expressive power, which makes it applicable to more realistic applications involving the management of knowledge.
The decreasing cost of computing, the ease of collecting and storing data, advances in DBMS technologies, and the extensive set of available analytical tools have all been instrumental in generating interest in data mining applications. In addition to traditional top-down data analyses, such as well-founded application queries and report generation, bottom-up, discovery-driven data analyses have been gaining popularity. Both individuals and organizations are beginning to explore ways to extract useful patterns that may be present in data in order to support faster, more accurate, and better business decisions. Given the strategic advantage of data mining as a decision support tool, the rapid projected growth of data mining applications is not surprising. The Meta Group has estimated that the market for data mining applications will reach US$8.1 billion by 2001 (Financial Post, October 1999). A commonly cited estimate is that, in most data mining applications, between 70% and 80% of the resources are spent on pre-processing the data. This includes integrating existing data sources, supplementing the existing data with other necessary data, selecting the relevant data, and preparing the data (including data conversion and forming new attributes), as well as handling noisy, incomplete, duplicate, or missing data. Only the small remainder is used for actually discovering patterns in the data. Even then, only a small fraction of the supposedly useful information discovered is useful and actionable in reality. The problem is further exacerbated by the dynamic nature of most real-world environments, which renders extracted knowledge obsolete. Extracted knowledge must therefore be examined carefully and refined over time.
Thus, knowledge refinement is necessary to maintain an accurate, effective, and useful knowledge base that is updated dynamically in response to changes in the environment; it is especially critical for keeping knowledge accurate and robust in a dynamic environment. Notable knowledge refinement systems include SEEK, SEEK2, FOIL, GOLEM, and KBANN. Most of these systems are customized for specific applications and are, in general, difficult to apply to other domains. Their specific strengths and limitations are discussed in Section 3. We develop a general knowledge refinement system, KREFS, to overcome the limitations identified in existing systems. In the next section, we provide a brief overview of data mining. In Section 3, we discuss the strengths and weaknesses of existing knowledge refinement systems. Section 4 gives an overview of KREFS, the proposed knowledge refinement system. Section 5 presents a comparative analysis of the performance of KREFS against an existing knowledge refinement system. Section 6 uses a real-world bankruptcy prediction dataset to illustrate the performance of KREFS. Section 7 concludes the paper with a brief discussion.
Conclusion
Data mining systems are becoming a necessity for extracting useful information from huge amounts of archived data in order to achieve competitive advantage. This study focused on knowledge refinement, a rather neglected component of data mining systems. Without knowledge refinement, 'useful' information extracted by data mining systems at an earlier point in time is bound to become stale. The importance of a knowledge refinement component in data mining systems cannot be overstated. In this paper, we have conducted a comparative study of knowledge refinement systems, including the proposed KREFS; the notable existing systems are SEEK, SEEK2, FOIL, GOLEM, and KBANN. KREFS has more expressive power than SEEK, SEEK2, and KBANN, which are propositional value representation systems. This advantage makes KREFS widely applicable to more realistic applications. Unlike KBANN, which cannot refine its knowledge further once all training examples are exhausted, KREFS keeps refining its knowledge by intelligently self-guiding the generation of new training examples. KREFS also does not suffer from the typical problems of dependency on domain knowledge that are inherent in systems such as FOIL or GOLEM; rather, it can identify distinct concepts from scratch. This study illustrates the flexibility, wide applicability, and more accurate knowledge refinement process of KREFS. One limitation of KREFS is that it rests on the premise that training examples with the characteristics the algorithm requires will be available. Although this may seem like a hard constraint, the required characteristics are broad enough that the necessary training examples are easy to obtain. Nevertheless, it remains a limitation. A possible extension of KREFS would be to take into account objective(s) specified by the user while refining the rules; for example, the user could ask KREFS to generate parsimonious rules.
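The text describes KREFS only at a high level, so the sketch below does not reproduce its algorithm. It is a minimal illustration, under stated assumptions, of the general idea contrasted with KBANN above: a rule keeps being refined by self-generating new training examples instead of stopping when the supplied examples run out. Everything here is hypothetical and not taken from the paper. The single-attribute rule `x < t -> positive` could stand in for, say, a financial ratio in bankruptcy prediction. The `oracle` function can label any generated example, and bisection, a standard query-based search, is used as the example-selection strategy.

```python
from typing import Callable, List, Tuple

# A labelled training example: (attribute value, class label).
Example = Tuple[float, bool]


def refine_threshold(
    oracle: Callable[[float], bool],
    lo: float,
    hi: float,
    tol: float = 1e-3,
) -> Tuple[float, List[Example]]:
    """Refine the cut-point t of a hypothetical rule 'x < t -> positive'.

    Assumptions (illustrative, not from the paper): `oracle` can label any
    generated value of x, oracle(lo) is True, oracle(hi) is False, and the
    label switches exactly once in [lo, hi].
    """
    if not oracle(lo) or oracle(hi):
        raise ValueError("need a positive example at lo and a negative one at hi")
    examples: List[Example] = [(lo, True), (hi, False)]
    while hi - lo > tol:
        # Self-generate a new example where the current rule is least certain.
        mid = (lo + hi) / 2.0
        label = oracle(mid)
        examples.append((mid, label))
        if label:
            lo = mid  # mid is positive, so the boundary lies above mid
        else:
            hi = mid  # mid is negative, so the boundary lies at or below mid
    return (lo + hi) / 2.0, examples


# Usage: the (hypothetical) environment shifts, so the rule is re-refined.
old_t, old_examples = refine_threshold(lambda x: x < 0.30, 0.0, 1.0)
new_t, new_examples = refine_threshold(lambda x: x < 0.25, 0.0, 1.0)
print(f"old threshold ~ {old_t:.3f}, new threshold ~ {new_t:.3f}")
# prints: old threshold ~ 0.300, new threshold ~ 0.250
```

Re-running the refinement after the labelling function changes mimics, in miniature, the point made above about stale knowledge. The rule is updated from freshly generated examples rather than from a fixed, exhausted training set. Real systems must handle multiple attributes, noisy labels and relational concepts, none of which this toy covers.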