An efficient data mining approach for discovering interesting knowledge from customer transactions
Article code | Publication year | Length of English article |
---|---|---|
9114 | 2006 | 8 pages (PDF) |
Publisher: Elsevier - ScienceDirect
Journal : Expert Systems with Applications, Volume 30, Issue 4, May 2006, Pages 650–657
Abstract
Mining association rules and mining sequential patterns both aim to discover customer purchasing behavior from a transaction database so that the quality of business decisions can be improved. However, the transaction database can be very large. Finding all the association rules and sequential patterns in a large database is very time consuming, and users may be interested in only part of this information. Moreover, the criteria that discovered association rules and sequential patterns must meet may differ from one user requirement to another. Traditional mining methods can therefore generate a great deal of information that is uninteresting to the user. Hence, a data mining language is needed so that users can query only the knowledge that interests them from a large database of customer transactions. In this paper, such a data mining language is presented. With it, users can specify the items of interest and the criteria for the association rules or sequential patterns to be discovered. Efficient data mining techniques are also proposed to extract the association rules and sequential patterns according to the user requirements.
Introduction
An association rule (Han and Pei, 2000) describes an association among items: when some items are purchased in a transaction, others are purchased too. An association rule has the form X⇒Y, in which X and Y are two sets of items. In this paper, we refer to X as the antecedent and Y as the consequent of the rule. The length of an itemset i is the number of items in i, and an itemset of length k is called a k-itemset. A transaction t supports an itemset i if i is contained in t. The support for an itemset i is defined as the ratio of the number of transactions that support i to the total number of transactions. If the support for an itemset i satisfies the user-specified minimum support threshold, then i is called a frequent itemset, and a frequent itemset of length k is called a frequent k-itemset. The confidence of a rule X⇒Y is defined as the ratio of the support for the itemset X∪Y to the support for the itemset X. If the itemset Z=X∪Y is frequent and the confidence of X⇒Y is no less than the user-specified minimum confidence, then the rule X⇒Y is an association rule.
Mining sequential patterns (Pei et al., 2001) finds the sequential purchasing behavior of most customers in a large transaction database. A sequence is an ordered list of itemsets 〈s1, s2, …, sn〉, where each si is a set of items. A customer sequence is the list of all the transactions of a customer, ordered by increasing transaction time. A customer sequence c supports a sequence s if s is contained in c. The support for a sequence s is defined as the ratio of the number of customer sequences that support s to the total number of customer sequences. If the support for a sequence s satisfies the user-specified minimum support threshold, then s is called a frequent sequence. The length of a sequence s is the number of itemsets in the sequence. A sequence of length k is called a k-sequence, and a frequent sequence of length k is called a frequent k-sequence.
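The definitions above can be made concrete with a small Python sketch. The function names and the toy data are illustrative and do not come from the paper. The text does not spell out what "contained in" means for sequences, so the sketch assumes the usual order-preserving reading: each itemset of s must be a subset of a distinct transaction of the customer sequence, in the same order.

```python
from fractions import Fraction


def support(itemset, transactions):
    """Ratio of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(x, y, transactions):
    """support(X ∪ Y) / support(X) for the rule X ⇒ Y.

    Raises ZeroDivisionError when no transaction supports X.
    """
    return support(set(x) | set(y), transactions) / support(x, transactions)


def contains(customer_sequence, s):
    """Assumed containment test: match the itemsets of `s`, in order,
    against distinct transactions of `customer_sequence`."""
    pos = 0
    for element in s:
        # Advance to the next transaction that includes this itemset.
        while pos < len(customer_sequence) and not element <= customer_sequence[pos]:
            pos += 1
        if pos == len(customer_sequence):
            return False
        pos += 1  # the next itemset must match a later transaction
    return True


def sequence_support(s, customer_sequences):
    """Ratio of customer sequences that contain the sequence `s`."""
    hits = sum(1 for c in customer_sequences if contains(c, s))
    return Fraction(hits, len(customer_sequences))


# Toy data (hypothetical): four transactions and three customer sequences.
transactions = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bd")]
customer_sequences = [
    [frozenset("a"), frozenset("bc"), frozenset("d")],
    [frozenset("ab"), frozenset("d")],
    [frozenset("b"), frozenset("a")],
]

print(support({"a", "b"}, transactions))        # 1/2
print(confidence({"a"}, {"b"}, transactions))   # 2/3
print(sequence_support([frozenset("a"), frozenset("d")], customer_sequences))  # 2/3
```

With a minimum support of 1/2 and a minimum confidence of 3/5, the rule {a}⇒{b} would qualify as an association rule on this toy data, since {a, b} has support 1/2 and the rule has confidence 2/3.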
A sequential pattern is a frequent sequence that is not contained in any other frequent sequence. In this paper, we present a data mining language with which users need only specify the criteria and the items of interest for discovering association rules and sequential patterns. We also propose efficient data mining algorithms for processing the language; these algorithms focus on discovering the associations among the items of interest and all the other items. In our data mining system, a user submits a query through the query language, and the system immediately answers it according to the user-specified items and criteria. If the answers do not satisfy the user's needs, the user can resubmit the query after adjusting the criteria and item constraints. Many constraint-based mining methods have been proposed. Hipp and Guntzer (2002) argued that the data mining process should begin with an initial unconstrained, costly mining run, and that mining queries should then be answered from this initial result so that response time is minimized. However, the discovered association rules may become invalid or inappropriate because transactions keep accumulating, and it is very costly to re-run the unconstrained mining algorithm to bring the initial mining result up to date. Ng, Lakshmanan, Han, and Mah (1999) considered aggregate constraints and item constraints for mining association rules. Under their item constraints, the items in a discovered frequent itemset must all be contained in the specified items. Pei and Han (Ng et al., 1999 and Jian Pei, 2000) developed pattern-growth methods for constrained frequent pattern mining and sequential pattern mining. An item constraint specifies the particular individual items or groups of items that should or should not be present in a pattern; that is, the items in the discovered patterns have to be contained in the specified itemset.
Pei et al. (2002) discussed mining sequential patterns with regular expressions, in which the items in the discovered patterns must appear in the sequence defined by the regular expression. None of the above approaches can discover the associations among certain items and all the other items, so their item constraints differ from those in our work. Meo, Psaila and Ceri (1996) proposed an SQL-like operator for extracting association rules. However, the SQL-like operator cannot completely express the associations among certain items and all the other items. Furthermore, it performs set-oriented operations (i.e. join operations), which are very inefficient. Yen and Chen (1997) proposed a user-friendly data mining language for mining interesting association rules, in which users can specify the items of interest and the criteria of the rules to be discovered. This approach constructs an association graph and generates all the frequent itemsets by traversing the graph. However, it needs a lot of memory to record the related information. In this paper, we integrate the two kinds of patterns, association rules and sequential patterns, in a data mining language whose style is similar to that proposed in (Yen and Chen, 1997). We also propose efficient data mining algorithms to find all the associations among certain items and all the other items.
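The difference between the two kinds of item constraint discussed above can be illustrated with a short Python sketch. It rests on an assumption: the excerpt does not define precisely what "associations among certain items and all the other items" means, so `involves_interested` below is only one plausible reading (an itemset qualifies if it includes at least one item of interest, whatever else it contains). Both function names and the sample itemsets are hypothetical.

```python
def within_specified(itemset, specified):
    """Containment-style constraint: every item must come from `specified`."""
    return frozenset(itemset) <= frozenset(specified)


def involves_interested(itemset, interested):
    """Assumed reading of the paper's goal: the itemset includes at least
    one item of interest and may pair it with any other items."""
    return bool(frozenset(itemset) & frozenset(interested))


# Hypothetical frequent itemsets and user-specified items.
frequent_itemsets = [frozenset("ab"), frozenset("ac"), frozenset("bd"), frozenset("cd")]
interested = {"a", "b"}

contained = [s for s in frequent_itemsets if within_specified(s, interested)]
associated = [s for s in frequent_itemsets if involves_interested(s, interested)]

print([sorted(s) for s in contained])   # [['a', 'b']]
print([sorted(s) for s in associated])  # [['a', 'b'], ['a', 'c'], ['b', 'd']]
```

Under the containment-style constraint only {a, b} survives, whereas the second filter also keeps {a, c} and {b, d}, which link an item of interest to items the user did not name.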
Conclusion
In this paper, we introduce a data mining language. With it, users can specify the items or sequences of interest, together with the minimum support and minimum confidence thresholds, to discover association rules and sequential patterns. We propose the efficient data mining algorithms MIAR and MISP to process the user requirements. Our algorithms reduce the number of combinations of itemsets or sequences that must be examined in each customer sequence when counting the supports of the candidates, and they reduce the number of candidates according to the user's request. To improve efficiency, we generate a bit-string database and itemset (sequence) databases, and we propose a sequential bit-string operation that counts the supports of the candidates with simple logical bit operations. Although the bit-string database and the itemset (sequence) databases cost extra memory, reducing the response time matters more for a data mining query system.
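The general idea of counting support with logical bit operations can be sketched in Python as follows. This is not MIAR or MISP, whose details are not given in this excerpt. It is a minimal illustration for itemsets only, assuming one bit string per item in which bit i is set when transaction i contains that item. The function names and the data are hypothetical.

```python
def build_bit_strings(transactions):
    """Map each item to an int whose bit i is 1 when transaction i contains it."""
    bits = {}
    for i, t in enumerate(transactions):
        for item in t:
            bits[item] = bits.get(item, 0) | (1 << i)
    return bits


def support_count(itemset, bits, n_transactions):
    """AND the bit strings of all items, then count the remaining 1-bits."""
    acc = (1 << n_transactions) - 1  # start with every transaction selected
    for item in itemset:
        acc &= bits.get(item, 0)     # an unseen item clears every bit
    return bin(acc).count("1")


# Toy data (hypothetical): four transactions over the items a, b, c, d.
transactions = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bd")]
bits = build_bit_strings(transactions)

print(format(bits["a"], "04b"))                             # 0111
print(support_count({"a", "b"}, bits, len(transactions)))   # 2
print(support_count({"a", "d"}, bits, len(transactions)))   # 0
```

Each candidate is counted with one AND per item instead of a scan over the transactions, at the price of keeping the bit strings in memory. That is the same trade-off between memory and response time that the conclusion describes.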