یک الگوریتم کارا برای استخراج الگوهای درون معامله ای مکرر
کد مقاله | سال انتشار | تعداد صفحات مقاله انگلیسی |
---|---|---|
9148 | 2007 | 24 صفحه PDF |
Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)
Journal : Information Sciences, Volume 177, Issue 17, 1 September 2007, Pages 3453–3476
چکیده انگلیسی
In this paper, we propose an efficient method for mining all frequent inter-transaction patterns. The method consists of two phases. First, we devise two data structures: a dat-list, which stores the item information used to find frequent inter-transaction patterns; and an ITP-tree, which stores the discovered frequent inter-transaction patterns. In the second phase, we apply an algorithm, called ITP-Miner (Inter-Transaction Patterns Miner), to mine all frequent inter-transaction patterns. By using the ITP-tree, the algorithm requires only one database scan and can localize joining, pruning, and support counting to a small number of dat-lists. The experiment results show that the ITP-Miner algorithm outperforms the FITI (First Intra Then Inter) algorithm by one order of magnitude.
مقدمه انگلیسی
Association rule mining is an important problem in the data mining field [1], [2], [4], [5], [8], [10], [11], [12], [13], [15], [16], [20], [22], [25] and [27]. Traditional association analysis is intra-transactional because it focuses on association relationships among itemsets within a transaction. For example, an intra-transaction association rule might be: “If the stock prices of Microsoft and IBM go up, the price of Intel is likely to go up on the same day.” However, intra-transactional approaches cannot capture a rule like: “If the stock prices of Microsoft and IBM go up, the price of Intel is likely to go up two days later.” Inter-transaction association rule mining [6], [7], [14], [17], [18], [23] and [24] extends the association rules to describe association relationships among itemsets across several transactions. Many algorithms for mining inter-transaction association rules have been proposed. Lu et al. [17] and Feng et al. [6] applied inter-transaction association rule mining algorithms to the prediction of trends in meteorological and stock market data. Lu et al. [17] and [18] proposed the EH-Apriori algorithm, which uses the Apriori algorithm to discover frequent inter-transaction itemsets. To enhance their algorithm’s efficiency, the authors used a hashing technique to reduce the number of candidate itemsets of length two. Feng et al. [7] used templates to reduce the number of rules. More recently, Tung et al. [23] and [24] developed an algorithm, called FITI (First Intra Then Inter), which discovers frequent intra-transaction itemsets and uses them to generate frequent inter-transaction itemsets. All the algorithms for mining inter-transaction association rules developed thus far have been based on Apriori-like breadth-first search (BFS) approaches that search for frequent itemsets level by level. At each level, a database must be scanned once to determine the support for each candidate itemset. It has been shown that Apriori-like approaches [19] and [22] perform well in finding frequent intra-transaction itemsets when the itemsets are short. However, when mining long frequent itemsets, or using very small support thresholds, the performance of such algorithms often deteriorates dramatically. The reason is that a frequent itemset of length k implies the presence of 2k − 2 additional frequent sub itemsets, each of which must be examined. Moreover, since Apriori-like approaches may generate a large number of candidate patterns at each level, they are prone to memory shortage during the mining process. We observe that Apriori-like methods for finding frequent inter-transaction itemsets have the same drawbacks as those for finding frequent intra-transaction itemsets. Therefore, in this paper, we propose an efficient method for mining a complete set of frequent inter-transaction itemsets (patterns). The method consists of two phases. First, we find all frequent items. For each frequent item found, we construct a dat-list that records the item information required for finding the frequent inter-transaction patterns. Then, we devise a data structure, called an ITP-tree, to store the discovered frequent inter-transaction patterns. In the second phase, we propose an algorithm, called ITP-Miner, to efficiently find all frequent inter-transaction patterns in a depth-first search manner. By using the ITP-tree and dat-lists to mine the frequent inter-transaction patterns, the ITP-Miner algorithm requires only one database scan and can localize joining, pruning, and support counting to a small number of dat-lists. Therefore, it is more efficient than the FITI algorithm. The remainder of this paper is organized as follows. In Section 2, we describe the problem of mining frequent inter-transaction patterns. Section 3 introduces our proposed algorithm, ITP-Miner, for mining frequent inter-transaction patterns. We explain the basic concept of the algorithm and give an example to illustrate how it works. Section 4 describes the performance evaluation. Finally, we present the conclusions in Section 5.
نتیجه گیری انگلیسی
In this paper, we have proposed an efficient method for mining all frequent inter-transaction patterns. The method consists of two phases. First, we design a data structure, called a dat-list, to record the dimensional attribute information of a frequent item in a database. Then, we devise a data structure, called an ITP-tree, to store the frequent patterns found. Second, we propose an efficient algorithm, called ITP-Miner, to mine the frequent patterns in a depth-first search manner. By using the ITP-tree and dat-lists to mine frequent patterns, our proposed algorithm only requires one database scan and can localize joining, pruning, and support counting to a small number of dat-lists. Therefore, our proposed algorithm is more efficient than the FITI algorithm. The experiment results show that ITP-Miner outperforms the FITI algorithm by one order of magnitude and requires less main memory storage space. Although we have shown that the ITP-Miner algorithm can efficiently mine frequent inter-transaction patterns, there are still some issues to be addressed in future research. First, we may also extend the algorithm from 1-dimensional transaction databases to higher dimension databases. Second, without generalization, too many patterns may be mined and they may be too detailed. However, by generalizing with a concept hierarchy, we may be able to obtain patterns or rules that are more abstract and meaningful.