Persian translation of the paper title

یک روش جدید برای رتبه بندی قوانین کشف شده از داده کاوی توسط DEA

English title

A new method for ranking discovered rules from data mining by DEA

Paper code	Year of publication	Length of the English paper
22157	2009	6 pages (PDF)

Source

Publisher: Elsevier - ScienceDirect

Journal: Expert Systems with Applications, Volume 36, Issue 4, May 2009, Pages 8503–8508

Keywords (Persian translation)

داده کاوی - تحلیل پوششی داده ها - قاعده اتحادیه

English keywords

Data mining, Data envelopment analysis, Association rule

Paper preview

English abstract

Data mining techniques, which extract patterns from large databases, have become widespread in business. These techniques can yield many rules, but only a small number can be selected for implementation due, at least in part, to limitations of budget and resources. Evaluating and ranking the interestingness or usefulness of association rules is therefore important in data mining. This paper proposes a new integrated data envelopment analysis (DEA) model that finds the most efficient association rule by solving only one mixed integer linear programming (MILP) model. Using this model, a new method for prioritizing association rules under multiple criteria is then proposed. An advantage of the proposed method is that it is computationally more efficient than previous approaches. A market basket analysis example illustrates the applicability of the DEA-based method for measuring the efficiency of association rules with multiple criteria.

English introduction

With the rapid growth of databases in many modern enterprises, data mining has become an increasingly important approach to data analysis. In recent years, the field has seen an explosion of interest from both academia and industry (Olafson, Li, & Wu, 2008). The growing volume of data, the growing awareness that the human brain cannot process such data unaided, and the increasing affordability of machine learning are reasons for the growing popularity of data mining (Marakas, 2004). One of the main objectives of data mining is to produce rules that are interesting from some user’s point of view. This user is not assumed to be a data mining expert, but rather an expert in the field being mined (Lenca, Meyer, Vaillant, & Lallich, 2008). The problem of discovering association rules has received considerable research attention, and several fast algorithms for mining association rules have been developed (Srikant, Vu, & Agrawal, 1997). These techniques can yield many rules, but only a small number can be selected for implementation due, at least in part, to limitations of budget and resources (Chen, 2007). According to Liu, Hsu, Chen, and Ma (2000), the interestingness issue has long been identified as an important problem in data mining. It refers to finding rules that are interesting or useful to the user, not just any possible rule. Indeed, in some situations rules must be prioritized so that effort is concentrated on the more valuable ones, because of the large number of qualified rules (Tan & Kumar, 2000) and limited business resources (Choi, Ahn, & Kim, 2005). According to Chen (2007), selecting the more valuable rules for implementation increases the likelihood of success in data mining. For example, in market basket analysis, marketing analysts are interested both in which products customers usually buy together and in how beneficial cross-selling promotions are to sellers.
The former lets sellers offer appropriate products by considering customers’ preferences, and the latter allows sellers to increase profits by taking their own profits into account. Customers’ preferences can be measured by the support and confidence of association rules. Seller profits, on the other hand, can be assessed using domain-related measures such as the sale profit and cross-selling profit associated with the rules (Chen, 2007). In previous studies on discovering subjectively interesting association rules, most approaches require manual input or interaction, asking users to explicitly distinguish between interesting and uninteresting rules (Chen, 2007). Srikant et al. (1997) presented three integrated algorithms for mining association rules with item constraints. Lakshmanan et al. (1998) extended the approach of Srikant et al. to handle much more complicated constraints, including domain, class, and SQL-style aggregate constraints. Liu et al. (2000) presented an Interestingness Analysis System (IAS) to help the user identify interesting association rules. Their method considers two main subjective interestingness measures, unexpectedness and actionability. Choi et al. (2005) used the analytic hierarchy process (AHP) to prioritize association rules according to business values, which comprise objective metrics or managers’ subjective judgments. They argued that the proposed method creates synergy with decision analysis techniques for solving problems in the data mining domain. Nevertheless, this method requires a large amount of human interaction to obtain the criteria weights by aggregating the opinions of various managers. Chen (2007) extended their work and proposed a data envelopment analysis (DEA) based methodology for ranking association rules while considering multiple criteria.
In his ranking procedure, he uses a DEA model proposed by Cook and Kress (1990) to identify efficient association rules. He then applies another DEA model, developed by Obata and Ishii (2003), to discriminate among the efficient association rules. Note that this method requires the first model to be solved for all DMUs and the second model to be solved for the efficient DMUs. As a drawback, the approach requires a considerable number of linear programming (LP) models to be solved. Moreover, it includes some redundant computations and considerations. There is therefore a need for a method that ranks association rules more efficiently. This paper tries to fill that gap by developing a new integrated DEA model that identifies the most efficient association rule by solving only one mixed integer linear programming (MILP) model, and by proposing a new method for ranking association rules with multiple criteria. The proposed method is computationally efficient and helps the user obtain results quickly. DEA is a non-parametric, linear programming based technique for measuring the relative efficiency of a set of similar units, usually referred to as decision making units (DMUs). Because of its successful applications and case studies, DEA has gained considerable attention and widespread use among business and academic researchers. Examples of its use in various areas include the evaluation of data warehouse operations (Mannino, Hong, & Choi, 2008), selection of flexible manufacturing systems (Liu, 2008), assessment of bank branch performance (Camanho & Dyson, 2005), examination of bank efficiency (Chen, Skully, & Brown, 2005), analysis of firms’ financial statements (Edirisinghe & Zhang, 2007), measurement of the efficiency of higher education institutions (Johnes, 2006), solution of the facility layout design (FLD) problem (Ertay, Ruan, & Tuzkaya, 2006), and measurement of the efficiency of organizational investments in information technology (Shafer & Byrd, 2000).
Similar to Chen (2007), this paper uses DEA as a post-processing approach. After the rules have been discovered by association rule mining algorithms, DEA is used to rank them based on the specified criteria. The main contribution of this paper is to develop a new integrated DEA model for finding the most efficient association rule (by solving only one MILP model) and to propose a new method for ranking the association rules discovered by data mining. The rest of this paper is organized as follows. Section 2 briefly describes association rules. Section 3 presents DEA models, and Section 4 discusses a previous method for ranking association rules. Our proposed method is introduced in Section 5, and its applicability is illustrated in Section 6. The paper closes with some concluding remarks in Section 7.
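
As a concrete illustration of the support and confidence measures referred to in the introduction, the following minimal Python sketch computes both for a single rule using their standard definitions. The toy transactions, the item names, and the helper names `support` and `confidence` are assumptions made for illustration only; they do not come from the paper.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Support of (antecedent and consequent) divided by support of antecedent."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    sup_a = support(transactions, a)
    if sup_a == 0:
        return 0.0  # rule never applies; report zero confidence
    return support(transactions, both) / sup_a


# Hypothetical market baskets (illustrative data only).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

# Rule: bread -> milk
rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

In a multi-criteria setting such as the one described here, values like these would sit alongside domain-related measures (for example, a profit figure per rule) as the criteria on which rules are ranked.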

English conclusion

The popularity of data mining is growing rapidly. Data mining techniques can yield many rules, but only a small number can be selected for implementation due, at least in part, to limitations of budget and resources. In this paper, we developed a new integrated DEA model that identifies the most CCR-efficient DMU using only output data, without any inputs. This model is applicable to finding the most efficient association rule. Using the proposed model, we then introduced a new method for ranking association rules with multiple criteria. Compared with previous work, our method is computationally efficient and also ranks all association rules.
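
For readers unfamiliar with the output-only CCR setting mentioned above, the textbook multiplier model for a single DMU can be written as follows. This is a generic sketch obtained by giving every DMU the same constant dummy input; it is not the integrated MILP model proposed in the paper, whose formulation is not reproduced on this page. The symbols $y_{rj}$ (value of output, or criterion, $r$ for DMU $j$), $u_r$ (output weight), and $\varepsilon$ (a small positive lower bound on the weights) are assumed notation.

```latex
% Generic output-only CCR multiplier model for the DMU under evaluation, indexed o.
% Assumed notation: n DMUs (association rules), s outputs (criteria).
\begin{align*}
\max_{u}\quad & \theta_o = \sum_{r=1}^{s} u_r\, y_{ro} \\
\text{s.t.}\quad & \sum_{r=1}^{s} u_r\, y_{rj} \le 1, \qquad j = 1, \dots, n, \\
& u_r \ge \varepsilon, \qquad r = 1, \dots, s.
\end{align*}
```

Under this form, DMU $o$ is efficient when the optimal value $\theta_o^{*}$ equals 1. Evaluating every rule this way means solving one LP per DMU, which is the kind of repeated computation that a single integrated model is meant to avoid.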