داده کاوی زبانشناختی با درخت های FP فازی
|کد مقاله||سال انتشار||مقاله انگلیسی||ترجمه فارسی||تعداد کلمات|
|22181||2010||8 صفحه PDF||سفارش دهید||5890 کلمه|
Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)
Journal : Expert Systems with Applications, Volume 37, Issue 6, June 2010, Pages 4560–4567
Due to the increasing occurrence of very large databases, mining useful information and knowledge from transactions is evolving into an important research area. In the past, many algorithms were proposed for mining association rules, most of which were based on items with binary values. Transactions with quantitative values are, however, commonly seen in real-world applications. In this paper, the frequent fuzzy pattern tree (fuzzy FP-tree) is proposed for extracting frequent fuzzy itemsets from the transactions with quantitative values. When extending the FP-tree to handle fuzzy data, the processing becomes much more complex than the original since fuzzy intersection in each transaction has to be handled. The fuzzy FP-tree construction algorithm is thus designed, and the mining process based on the tree is presented. Experimental results on three different numbers of fuzzy regions also show the performance of the proposed approach.
Years of effort in data mining have produced a variety of efficient and effective techniques. Depending on the classes of the knowledge derived, the mining approaches may be classified as finding association rules, classification rules, clustering rules and sequential patterns (Agrawal & Srikant, 1995), among others. Especially, finding association rules in transaction databases is most commonly seen in data mining (Agrawal et al., 1993a, Agrawal et al., 1993b, Agrawal and Srikant, 1994, Chen et al., 1996 and Cheung et al., 1997). In the past, many algorithms for mining association rules from transactions were proposed. Most of the approaches were based on the Apriori algorithm ( Agrawal et al., 1993a), which generated and tested candidate itemsets level by level. This may cause iterative database scans and high computational costs. Han et al. thus proposed the Frequent-Pattern-tree (FP-tree) structure for efficiently mining association rules without generation of candidate itemsets ( Han, Pei, & Yin, 2000). The FP-tree was used to compress a database into a tree structure which stored only large items. It was condensed and complete for finding all the frequent patterns. The construction process was executed tuple by tuple, from the first transaction to the last one. After that, a recursive mining procedure called FP-growth was executed to derive frequent patterns from the FP-tree. In these years, the fuzzy-set theory (Zadeh, 1965) has been used more and more frequently in intelligent systems because of its simplicity and similarity to human reasoning (Kandel, 1992). Several fuzzy learning algorithms for inducing rules from given sets of data have been designed and used to good effect with specific domains (Hong and Chen, 1999 and Hong and Chen, 2000). As to fuzzy data mining, several approaches have been proposed. For example, Hong et al. proposed a fuzzy mining algorithm for managing quantitative data (Hong, Kuo, & Chi, 1999b). It was based on the Apriori algorithm. Basically, it first used membership functions to transform each quantitative value into a fuzzy set in linguistic terms. It then calculated the cardinality of each linguistic term on all the transaction data. The mining process based on the cardinalities was then performed to find linguistic frequent itemsets and association rules. Papadimitriou et al. proposed an approach based on FP-trees to find fuzzy association rules (Papadimitriou & Mavroudi, 2005). In their approach, each item in the transactions was transferred into only two fuzzy regions with individual fuzzy values. A threshold was set and a fuzzy region in a transaction would be removed if its fuzzy value was smaller than the threshold. In this process, only the local frequent fuzzy 1-itemsets kept in each transaction were used for mining. The fuzzy regions which were close to but below the threshold would provide no contribution at all to the mining. Thus, some fuzzy regions would not be frequent even the summation of its fuzzy values in the database was larger than or equal to the minimum support. Besides, the expression of fuzzy patterns with more fuzzy regions was straight. The approach did not use any fuzzy operation to combine fuzzy regions together. It made the mined fuzzy rules a little hard to understand. In this paper, we attempt to extend the FP-tree mining process for handling quantitative data from the global values of fuzzy regions. A new fuzzy FP-tree is thus proposed, which is a data structure keeping frequent fuzzy regions. Besides, fuzzy operations are considered in forming itemsets with more than one fuzzy region. For achieving this purpose, the proposed fuzzy FP-tree is a little more complex than the original one and than that proposed by Papadimitriou and Mavroudi (2005). Based on the proposed approach, the frequent itemsets are efficiently expressed and mined out in linguistic terms, which are more natural and understandable for human beings. The remainder of this paper is organized as follows. Related works are reviewed in Section 2. The notation used in the algorithm is explained in Section 3. The proposed fuzzy FP-tree construction algorithm is described in Section 4. An example to illustrate the proposed algorithm is given in Section 5. Experimental results for showing the performance of the proposed algorithm are provided in Section 6. Conclusions are finally given in Section 7.
نتیجه گیری انگلیسی
In this paper, we have proposed the fuzzy FP-tree construction algorithm for processing transaction data with quantitative values and for mining frequent fuzzy itemsets from the transactions. The fuzzy FP-tree structure is used to efficiently and effectively handle the quantitative data. When extending the FP tree to handle fuzzy data, the processing becomes much more complex than the original. For example, the membership value of a k -itemset (k>1)(k>1) in a transaction has to be derived from the membership values of the items contained in the itemset by fuzzy intersection. By the proposed fuzzy FP tree, the function can be easily achieved. The tree structure, however, becomes larger than the original since the order of fuzzy regions in a transaction must be maintained. Besides, the mining process from the tree also becomes complicated. The cost is, however, needed since more detailed knowledge is derived than that in only the binary format. Of course, as an alternative, the fuzzy regions can be converted back into crisp regions by the operation of αα-cut, and then processed in the traditional FP-tree approach. Experimental results also show that the number of fuzzy regions significantly affects the performance of the proposed algorithm. When the minimum support threshold is set lower, the larger number of fuzzy regions will generate less nodes and need less execution time. On the contrary, when the minimum support threshold is set higher, the result is inverse to the above. In this paper, we assume the database is static. In real-world applications, data may be dynamically inserted into a database. In the future, we will attempt to handle the maintenance problem of fuzzy data mining. How to further improve the fuzzy FP tree is another interesting topic.