A context-aware data mining process model based framework for supporting evaluation of data mining results
|Article code||Publication year||English article||Persian translation||Word count|
|22245||2012||9-page PDF||Available to order||5598 words|
Publisher: Elsevier - Science Direct
Journal : Expert Systems with Applications, Volume 39, Issue 1, January 2012, Pages 1156–1164
The knowledge discovery via data mining (KDDM) process is a multiple phase process that aims to, at a minimum, semi-automatically extract new knowledge from existing datasets. For many data mining tasks, the evaluation phase is challenging for various reasons. Given this challenge, several studies have presented techniques that could be used for the semi-automated evaluation of data mining results. When taken together, these studies suggest the possibility of a common multi-criteria evaluation framework. The use of such a framework, however, requires that the relevant objectives, measures and preference function be identified. This implies that the context of the data mining (DM) problem is particularly important for the evaluation phase of the KDDM process. Our framework integrates a pair of established, tightly coupled techniques (i.e. the Value Focused Thinking (VFT) and Goal–Question–Metric (GQM) methods) with established techniques from multi-criteria decision analysis in order to explicate and utilize context information and thereby facilitate semi-automated evaluation.
The knowledge discovery via data mining (KDDM) process is a multiple phase process (see Fig. 1) that aims to, at a minimum, semi-automatically extract new knowledge from existing data sets. This process has been described in various ways (e.g. Cios et al., 2000; Shearer, 2000) but essentially consists of the following steps: Business (or Application Domain) Understanding (which includes the definition of business and data mining goals), Data Understanding, Data Preparation, Data Mining (or Modeling), Evaluation (e.g. evaluation of results based on the data mining goals), and Deployment (Kurgan & Musilek, 2006). CRISP-DM (Cross-Industry Standard Process for Data Mining), the most popular of the KDDM process models, was developed by a multi-industry collective of practitioners after the practitioner community became aware of the need for formal data mining process models that prescribe the journey from data to discovered knowledge. The original model was further extended by researchers (e.g. Cios et al., 2000; Sharma and Osei-Bryson, 2010).
For many data mining tasks, the evaluation phase is challenging for various reasons. For example, with regard to decision tree (DT) induction, although the performance measures may be clear (e.g. accuracy, simplicity, lift), the challenges include the need to evaluate a large number of DTs. Gersten, Wirth, & Arundt (2000) noted that, with regard to setting parameter values, there is “no practicable approach to select … the most promising combinations early in the process” and as such “it is necessary to experiment with different combinations” but “it is very hard to compare that many models and pick the optimal one reliably”. Given this challenge, Osei-Bryson (2004) proposed an approach for comparing and selecting the ‘optimal’ DT model given preference and value functions specified by the domain expert(s). Choi et al. (2005) and Chen (2007) presented approaches for prioritizing association rules.
Osei-Bryson (2005, 2010) also presented approaches for selecting the most appropriate segmentation. Overall, these papers describe techniques that could be used for the semi-automated evaluation of data mining results. When taken together, these papers suggest the possibility of a common context-aware multi-criteria framework for evaluating the results of data mining, one that accommodates multiple performance measures and supports both adequate data mining experimentation and the non-burdensome, semi-automated evaluation of results from the application of data mining techniques. The use of such a multi-criteria evaluation framework, however, requires that the relevant objectives, measures and preference function be identified. This implies that the context of the DM problem is particularly important for the evaluation phase of the KDDM process. The stakeholders, the business objectives, the data mining objectives and their associated performance measures, and the preference function are the major elements of the context of a particular DM problem, with the stakeholders’ perspectives being a major factor in determining the other elements. Once the objectives, associated measures and preference function have been identified and defined, a multi-criteria approach could be used to automatically determine the ranking of the data mining results during the evaluation phase. Several studies, including Osei-Bryson (2004, 2005, 2007, 2008, 2010), Choi et al. (2005) and Chen (2007), have offered this type of context-aware multi-criteria approach for post-processing. However, apart from Osei-Bryson (2010), the solution methods of those studies were not explicitly situated within the context of KDDM process models, and none (including Osei-Bryson, 2010) described how the implications of a given problem context could be explicated in a manner that would facilitate the evaluation of DM output.
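The ranking step described above can be illustrated with a minimal sketch. It assumes a simple weighted additive preference function over min-max normalized measures; the source does not specify the form of the preference function, so this is one common choice from multi-criteria decision analysis rather than the authors' method. The measure names, weights, and candidate decision-tree scores are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measure:
    name: str
    weight: float           # relative importance, assumed to come from the domain expert
    higher_is_better: bool  # False for cost-type measures such as tree size


def normalize(values, higher_is_better):
    """Min-max scale values to [0, 1]; flip the scale for cost-type measures."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All candidates tie on this measure, so it cannot discriminate.
        return [1.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def rank_models(candidates, measures):
    """Return (model_name, score) pairs sorted from best to worst."""
    names = list(candidates)
    total_weight = sum(m.weight for m in measures)
    scores = {n: 0.0 for n in names}
    for m in measures:
        column = [candidates[n][m.name] for n in names]
        for n, s in zip(names, normalize(column, m.higher_is_better)):
            scores[n] += (m.weight / total_weight) * s
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical measures and candidate decision trees, for illustration only.
MEASURES = [
    Measure("accuracy", 0.5, True),
    Measure("lift", 0.3, True),
    Measure("leaves", 0.2, False),  # fewer leaves is treated as a simpler tree
]
CANDIDATES = {
    "DT_1": {"accuracy": 0.80, "lift": 2.0, "leaves": 10},
    "DT_2": {"accuracy": 0.90, "lift": 3.0, "leaves": 30},
    "DT_3": {"accuracy": 0.85, "lift": 2.5, "leaves": 20},
}

if __name__ == "__main__":
    for name, score in rank_models(CANDIDATES, MEASURES):
        print(f"{name}: {score:.3f}")
```

With these made-up inputs, the largest and most accurate tree ranks first because accuracy and lift together carry most of the weight; shifting weight toward `leaves` would favor the simpler tree. That sensitivity is why the stakeholders' preferences have to be elicited before the ranking can be automated.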
As noted by Kurgan and Musilek (2006) with regard to data mining, “Before any attempt can be made to perform the extraction of this useful knowledge, an overall approach that describes how to extract knowledge needs to be established”. In this paper we present a common, context-aware, multi-criteria framework for evaluating data mining results that is based on KDDM process models and includes the explication of business and data mining objectives and performance measures. Our research problem can be considered to involve context-aware support for the selection of a limited set of the ‘best’ models (Zopounidis & Doumpos, 2002) in order to reduce the cognitive burden on the domain experts in the evaluation phase of the KDDM process.
English conclusion
The knowledge discovery via data mining (KDDM) process is a multiple phase process that includes Business Understanding, Data Mining, Evaluation (e.g. evaluation of results based on DM goals), and Deployment (Kurgan & Musilek, 2006). While modern commercial data mining software simplifies the execution of the Data Mining phase, for many data mining problem types and instances the evaluation phase can be a challenging one for various reasons. KDDM process models can, however, be extended in a manner that reduces this challenge. In this paper we have presented a formal approach for the context-based evaluation of DM results by domain expert(s) that is based on data mining process models such as CRISP-DM. Our framework integrates a pair of established, tightly coupled techniques (i.e. the Value Focused Thinking (VFT) and Goal–Question–Metric (GQM) methods) with established techniques from multi-criteria decision analysis in order to explicate and utilize context information. The framework offers a semi-automated, multi-step, multi-criteria decision support process that helps the domain expert select the ‘best’ model in a manner that is not cognitively burdensome while remaining consistent with the specified data mining goals.
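As a rough illustration of how context might be made explicit before evaluation, the sketch below models a Goal–Question–Metric style hierarchy and flattens it into the list of performance measures that an evaluation step would need. The class names, the `collect_metrics` helper, and the example goal, question and metrics are all hypothetical; the excerpt does not describe the framework's actual data structures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Metric:
    name: str
    direction: str  # "max" or "min", i.e. whether larger or smaller is preferred


@dataclass
class Question:
    text: str
    metrics: List[Metric] = field(default_factory=list)


@dataclass
class Goal:
    statement: str
    questions: List[Question] = field(default_factory=list)


def collect_metrics(goals):
    """Flatten goals -> questions -> metrics, dropping duplicate names, keeping order."""
    seen = {}
    for goal in goals:
        for question in goal.questions:
            for metric in question.metrics:
                seen.setdefault(metric.name, metric)
    return list(seen.values())


# Hypothetical data mining goal elicited from a stakeholder.
goal = Goal(
    statement="Identify customers likely to respond to a campaign",
    questions=[
        Question("How well does the model predict responders?",
                 [Metric("accuracy", "max"), Metric("lift", "max")]),
        Question("Can the domain expert interpret the model?",
                 [Metric("leaves", "min"), Metric("accuracy", "max")]),
    ],
)

if __name__ == "__main__":
    for metric in collect_metrics([goal]):
        print(metric.name, metric.direction)
```

Keeping the link from each metric back to a question and a goal is what lets a ranking be traced to the stakeholder objectives that motivated it.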