ISI article (English), no. 22242
Title

Temporal data mining with up-to-date pattern trees
Article code: 22242
Year of publication: 2011
Length of English article: 8 pages (PDF)
Source

Publisher: Elsevier - Science Direct

Journal : Expert Systems with Applications, Volume 38, Issue 12, November–December 2011, Pages 15143–15150

Keywords
Data mining, Temporal data mining, Up-to-date pattern, UDP-tree, UDP-growth

Abstract

Mining interesting and useful frequent patterns from large databases has attracted much attention in recent years. Among the mining approaches, finding temporal patterns and regularities is especially important because of its practical value. In the past, Hong et al. proposed up-to-date patterns, which are frequent within their up-to-date lifetime. Formally, an up-to-date pattern is a pair consisting of an itemset and its corresponding valid lifetime, within which the user-defined minimum support threshold must be satisfied. They also proposed an Apriori-like approach to find the up-to-date patterns. This paper proposes the up-to-date pattern tree (UDP tree), which keeps the up-to-date 1-patterns in a tree structure to reduce the number of database scans. It is similar to the FP-tree structure but more complex because of the requirements of up-to-date patterns. The UDP-growth mining approach is also designed to find the up-to-date patterns from the UDP tree. The experimental results show that the proposed approach performs better than the level-wise mining algorithm.

Introduction

Knowledge discovery in databases (KDD) aims to identify useful information in large databases and to provide automated analysis and solutions. It has attracted a significant amount of research. The approaches may be classified as working on transaction databases, temporal databases, relational databases, multimedia databases, and so on. In particular, finding association rules from transaction databases is the task most commonly seen in data mining (Agrawal et al., 1993a; Agrawal et al., 1993b; Agrawal and Srikant, 1994; Agrawal et al., 1997; Chen et al., 1996; Cheung et al., 1996; Mannila et al., 1994; Srikant and Agrawal, 1995). In the past, many algorithms for mining association rules from transactions were proposed, most of which were based on the Apriori algorithm (Agrawal et al., 1993a). It was proposed to discover the correlation relationships among items or itemsets in transactional databases and has been applied in many areas. When the percentage of transactions containing a candidate itemset is greater than or equal to a pre-defined minimum support threshold, the itemset is considered a frequent itemset. That is, some correlation relationships exist among the items it includes. The number of derived frequent itemsets is greatly influenced by the pre-defined minimum support threshold. Han et al. thus proposed the Frequent-Pattern-tree (FP-tree) structure for efficiently deriving the frequent itemsets without candidate generation (Han, Pei, & Yin, 2000). Only the frequent items were kept to compose the tree structure, which was thus condensed. A recursive mining procedure called FP-growth was executed to derive frequent patterns from the constructed FP-tree (Han et al., 2000). Han et al. showed that the approach performed better than Apriori-like approaches. Recently, temporal data mining has become an important topic that attracts many researchers.
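The frequent-itemset test described above can be illustrated with a minimal sketch. The function names `support` and `is_frequent` and the toy transactions are assumptions made for illustration; they do not come from the paper, and the code shows only the threshold test, not Apriori or FP-growth.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def is_frequent(itemset, transactions, min_support):
    """An itemset is frequent when its support reaches the threshold."""
    return support(itemset, transactions) >= min_support


# Hypothetical toy database of four transactions.
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"b", "d"},
    {"a", "b", "c", "d"},
]

print(support({"a", "c"}, transactions))           # 3 of 4 transactions
print(is_frequent({"b", "d"}, transactions, 0.5))  # 2 of 4 meets the 0.5 threshold
print(is_frequent({"c", "d"}, transactions, 0.5))  # 1 of 4 does not
```

Raising or lowering `min_support` changes how many itemsets pass the test, which is the sensitivity to the threshold noted in the text.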
Analyzing temporal data and discovering temporal patterns are the main concerns in temporal data mining. It typically reveals time-related correlations of itemsets from transactions. For instance, the sales of ice cream in summer and of mittens in winter should be higher than those in the other seasons. Some approaches for finding seasonal behaviors of specific items or itemsets were thus proposed (Roddick & Spiliopoulou, 2002). In addition to seasonal behaviors, there are many other kinds of knowledge in temporal data mining (Ale and Rossi, 2000; Chen et al., 1998; Li and Deogun, 2005; Li et al., 2003; Lee et al., 2002; Ozden et al., 1998; Verma et al., 2005). In the past, Hong et al. proposed the concept of up-to-date patterns, which are frequent within their up-to-date lifetime (Hong, Wu, & Wang, 2009). Formally, an up-to-date pattern is a pair consisting of an itemset and its corresponding valid lifetime, written as ({itemset}, 〈lifetime〉). The end value of the lifetime is the current time, and no other lifetime for the itemset may last longer than it. Note that an itemset that is not frequent over the entire database may still be a frequent up-to-date pattern, since its items may have occurred seldom in early transactions but constantly in recent ones. Hong et al. also proposed an algorithm to derive up-to-date patterns from transactions in a level-wise process (Hong et al., 2009). In this paper, we attempt to derive the up-to-date patterns without the Apriori-like generation of candidates. An up-to-date pattern tree (UDP tree) is first designed to keep the derived frequent up-to-date 1-patterns. It is similar to the FP-tree structure except that the corresponding transaction identifications (TIDs) are also kept. The up-to-date 1-patterns with their frequency and their valid lifetime are retained in the Header_Table as well. A UDP-growth mining approach is then proposed to derive the up-to-date patterns from the UDP tree.
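The definition of an up-to-date pattern can be made concrete with a brute-force sketch. It is not the UDP-tree or UDP-growth algorithm from the paper, nor the level-wise algorithm of Hong et al. It makes three assumptions for illustration: transactions are listed in time order with TIDs 1..n, a lifetime is a TID range ending at n, and support is the fraction of transactions in that range containing the itemset. The function name `up_to_date_lifetime` and the data are hypothetical.

```python
def up_to_date_lifetime(itemset, transactions, min_support):
    """Return the longest TID range (start, n) ending at the latest
    transaction in which `itemset` meets `min_support`, or None."""
    itemset = frozenset(itemset)
    n = len(transactions)
    contains = [itemset <= set(t) for t in transactions]
    remaining = sum(contains)  # occurrences inside the window [start, n]
    # Try the earliest start first, so the first hit is the longest lifetime.
    for start in range(1, n + 1):
        length = n - start + 1
        if remaining > 0 and remaining >= min_support * length:
            return (start, n)
        if contains[start - 1]:
            remaining -= 1  # transaction `start` leaves the window
    return None


# Hypothetical time-ordered transactions, TIDs 1..6.
transactions = [
    {"a", "b"},
    {"a", "b"},
    {"a"},
    {"a", "c"},
    {"b", "c"},
    {"a", "c"},
]

print(up_to_date_lifetime({"a"}, transactions, 0.75))
print(up_to_date_lifetime({"b"}, transactions, 0.75))
print(up_to_date_lifetime({"c"}, transactions, 0.75))
```

Here item c occurs in only 3 of the 6 transactions, so it is not frequent over the whole database at a threshold of 0.75, yet it is frequent within the recent range 〈3, 6〉. This is the situation the text describes: items that occurred seldom early on but constantly of late.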
Experimental results also show that the proposed approach for mining up-to-date patterns performs better than the Apriori-like up-to-date algorithm (Hong et al., 2009) in both execution time and the number of generated candidates. The remainder of this paper is organized as follows. Related work is reviewed in Section 2. The proposed UDP-tree construction algorithm and an example are described in Section 3. The UDP-growth mining algorithm and an example are presented in Section 4. Experimental results showing the performance of the proposed algorithms are provided in Section 5. Conclusions are given in Section 6.

Conclusion

In traditional data mining approaches, frequent itemsets are valid only for an entire database. That is, all the transactions in a database are considered when deriving the mined rules, which are not always the anticipated ones. A database may grow huge over time, and a decision made on recent data should be more significant than one made on the whole set of data. Although some approaches based on sliding windows have been proposed, they consider only the most recent transactions within a window of fixed length. Hong et al. thus proposed up-to-date patterns to avoid the problem of a fixed window size. In this paper, we have further designed the UDP tree, which preserves the up-to-date 1-patterns, to help mine up-to-date patterns efficiently. We have also proposed the UDP-growth mining algorithm, based on the UDP tree structure, to derive the up-to-date patterns easily. Experimental results show that the proposed approach for mining up-to-date patterns performs better than the Apriori-like up-to-date algorithm in both execution time and the number of generated candidates. In real applications, transactions may be frequently inserted into, deleted from, or modified in a database. In future work, we will therefore try to maintain the up-to-date patterns efficiently and effectively when a database changes rapidly. Other appropriate models for reducing the execution time on an updated database will be investigated as well.