استفاده از یک رویکرد مبتنی بر طرح ریزی برای استخراج الگوهای معامله درونی مکرر
کد مقاله | سال انتشار | تعداد صفحات مقاله انگلیسی |
---|---|---|
9348 | 2013 | 8 صفحه PDF |
Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)
Journal : Expert Systems with Applications, Volume 38, Issue 9, September 2011, Pages 11024-11031
چکیده انگلیسی
In this paper, we propose an algorithm called PITP-Miner that utilizes a projection based approach to mine frequent inter-transaction patterns efficiently. The algorithm only searches for local frequent items in a projected database that stores potential local inter-transaction items and partitions the database into a set of smaller databases recursively. In addition, two pruning strategies are designed to further condense the partitioned databases and thus accelerate the algorithm. Our experiment results demonstrate that the proposed PITP-Miner algorithm outperforms the ITP-Miner and FITI algorithms in most cases.
مقدمه انگلیسی
Mining association rules from large databases have received considerable attention in recent years and numerous studies have been conducted (Agrawal et al., 1993, Han and Kamber, 2006, Lee et al., 2006 and Lee et al., 2007). Traditional association rule mining methods are intra-transactional in nature, so they only consider associations that occur within the same transaction. In contrast, an inter-transaction association rule can represent the associations of items within a transaction, as well as the associations of items across different transactions along a certain dimension (Lu, Feng, & Han, 2000). For example, an intra-transaction association rule might state: “When the prices of Intel and IBM go up, 85% of the time, the price of SUN will increase on the same day”. However, an inter-transaction can extend such a rule to: “When the prices of Intel and IBM go up, 85% of the time, the price of SUN will increase two days later”. Lu, Han, and Feng (1998) first used inter-transaction association rules to predict stock market movements, and Feng, Dillon, and Liu (2001) applied such rules to meteorological data study. Subsequently, Li, Feng, and Wong (2005) extended inter-transaction association rules to a more general form of association rules, called generalized multidimensional inter-transaction association rules. Several algorithms have been proposed for mining frequent inter-transaction patterns from large databases. For example, Lu et al. (2000) introduced the E/EH-Apriori (Extended/Extended Hash-based Apriori) algorithm, in which E-Apriori uses the Apriori property to discover frequent inter-transaction patterns and EH-Apriori uses a hashing technique to prune the number of candidates of length-2. Feng, Yu, Lu, and Han (2002) described a constraint-based inter-transaction association rules mining approach that uses several optimization techniques to enhance the EH-Apriori algorithm under rule templates. The following year, Tung, Lu, Han, and Feng (2003) proposed the FITI (First-Intra-Then-Inter) algorithm, which is implemented in two phases. First, the Apriori algorithm (Agrawal & Srikant, 1994) is executed to discover frequent intra-transaction patterns, which are then used to generate frequent inter-transaction patterns in the second phase. More recently, Lee and Wang (2007) presented an algorithm called ITP-Miner, which uses vertical data structure dat-lists and an ITP-tree to mine all frequent inter-transaction patterns in a depth-first search manner. It has been shown that ITP-Miner outperforms previous inter-transaction mining algorithms. The task of mining all frequent inter-transaction patterns in very large databases is quite challenging because the transaction boundaries are broken, and the search space increases exponentially when some of the mining parameters change. In this paper, we propose a projection-based approach called the PITP-Miner algorithm for efficient mining of frequent inter-transaction patterns in a large transaction database. The approach is based on a divide-and-conquer, pattern-growth principle, which means that the algorithm searches along a structure called a PITP-tree in a depth-first search manner. In a PITP-tree, each node represents a frequent inter-transaction pattern and an associated projected database that stores local potential inter-transaction items. In the algorithm, a projected database is recursively projected into a set of smaller projected sub-databases, after which frequent inter-transaction patterns are grown in each sub-database by searching only local frequent fragments. In addition, we introduce two pruning strategies, called ancestor node pruning and hash table pruning, to further condense the projected sub-databases and thus accelerate the algorithm. A performance study shows that, in most cases, the proposed PITP-Miner algorithm outperforms the ITP-Miner and FITI algorithms. The contribution of this paper is threefold. First, we propose an effective projection-based algorithm for mining frequent inter-transaction patterns efficiently. Second, we design two pruning strategies to reduce the search space and thus speed up the algorithm. Third, we demonstrate via experiments that the proposed algorithm outperforms the ITP-Miner and FITI algorithm in most cases. The remainder of this paper is organized as follows. Section 2 defines the frequent inter-transaction pattern mining problem and introduces some notations. In Section 3, we present the PITP-Miner algorithm. Section 4 describes the experiments and their performance results. Finally, we present our conclusions and indicate some future research directions in Section 5.
نتیجه گیری انگلیسی
The proposed projection-based PITP-Miner algorithm mines all frequent inter-transaction patterns so that a pattern can be used to describe associations across many different transactions. The algorithm employs a divide-and-conquer, pattern-growth strategy to search along a PITP-tree in a depth-first search manner. Since PITP-Miner localizes pattern extensions in projected databases and uses two pruning strategies to reduce the sizes of the projected databases, it is more efficient and scalable than the ITP-Miner and FITI algorithms. The experiment results demonstrate that the PITP-Miner algorithm outperforms the compared algorithms in most cases. Although we have shown that the PITP-Miner algorithm can mine frequent inter-transaction patterns efficiently, there are still some issues to be addressed in future research. First, we could extend the algorithm to mine generalized inter-transaction association rules (Li et al., 2005), which expand rule contexts from point-wise (e.g. two days later) to scope-wise contexts (e.g. within three days). We could also extend the algorithm from mining a transaction database with one domain attribute to mining multiple domain attributes. Finally, in this paper, we have focused on the inter-transaction mining problem where each transaction in a database contains an unordered itemset. In a future work, it would be interesting to extend PITP-Miner to mine inter-sequence patterns (Wang & Lee, 2009).