دانلود مقاله ISI انگلیسی شماره 9154
ترجمه فارسی عنوان مقاله

الگوهای نوظهور جهشی با منفی نشان دادن در پایگاه داده های معامله - طبقه بندی و اکتشاف

عنوان انگلیسی
Jumping emerging patterns with negation in transaction databases – Classification and discovery
کد مقاله سال انتشار تعداد صفحات مقاله انگلیسی
9154 2007 16 صفحه PDF
منبع

Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)

Journal : Information Sciences, Volume 177, Issue 24, 15 December 2007, Pages 5675–5690

ترجمه کلمات کلیدی
پرش الگوی در حال ظهور - منفی نشان دادن - پایگاه داده های متناقض - پایگاه داده توسعه یافته - مجموعه راف - کاهش محلی - پایگاه داده معامله -
کلمات کلیدی انگلیسی
Jumping emerging pattern, Negation, Contradictory database, Extended database, Rough set, Local reduct, Transaction database,
پیش نمایش مقاله
پیش نمایش مقاله  الگوهای نوظهور جهشی با منفی نشان دادن در پایگاه داده های معامله - طبقه بندی و اکتشاف

چکیده انگلیسی

This paper examines jumping emerging patterns with negation (JEPNs), i.e. JEPs that can contain negated items. We analyze the basic relations between these patterns and classical JEPs in transaction databases and local reducts from the rough set theory. JEPNs provide an interesting type of knowledge and can be successfully used for classification purposes. By analogy to JEP-Classifier, we consider negJEP-Classifier and JEPN-Classifier and compare their accuracy. The results are contrasted with changes in rule set complexity. In connection with the problem of JEPN discovery, JEP-Producer and rough set methods are examined.

مقدمه انگلیسی

Patterns play an important role in knowledge discovery. In the field of transaction data analysis, they have been successfully applied to various practical problems, like association mining, classification or clustering. In contrast to many statistical tools or neural networks, they can be easily understood and interpreted by humans. Moreover, their simple structure has led to the recent development of efficient concise representations that help to manage large pattern collections. Positive patterns, that is, specific sets of items supported by transactions of a given dataset, are a very intuitive concept. Our paper considers one of its possible extensions, i.e. patterns with negation, which combines positive and negative aspects of data. Originally, negative relationships were introduced in [22], where a chi-square model was applied in order to estimate independence between two variables. As far as data mining is concerned, the vast majority of publications employ the idea of negation to formulate new interesting association rules. Therefore, many algorithms include variants of frequent itemset mining [11]. Since extending the definition of a pattern results in search space enlargement, special algorithmic strategies should be developed, e.g. more efficient pruning can be achieved by supplementing the support-confidence framework with additional measures of interestingness [28] and [1]. Another approach is to search for rules of a constrained form that often correspond to a specific interpretation. For example, in negative association rules [28] and confined association rules [1], only a complete antecedent or consequent can be negated, whereas in unexpected rules [16] and exceptional rules [12] negative items in antecedents are used to represent exceptions to the regular associations. Some other solutions make use of domain knowledge to formulate valuable rules [21] and [29]. Last but not least, the problem of mining frequent itemsets with negation can be addressed by means of concise data representations [4] and [13]. Our discussion focuses on a specific type of patterns called jumping emerging patterns (JEPs, [6] and [14]). Compared to many other similar propositions, JEPs seem to be a very accurate tool for classification problems. A JEP is an itemset which is supported in one database and absent from others. Let us consider the example in Table 1. Note that the patterns: cef, de are JEPs in class 0. Moreover, the latter is minimal, because neither of its proper subsets is a JEP. For the sake of brevity, we omit brackets and commas in the itemset notation, e.g. the pattern ab should be understood as {a,b}{a,b}.A JEP is an example of positive information. It indicates that a certain pattern is specific to some class. Negative knowledge can be modeled by negated items, which are treated as items that do not co-exist with regular ones in a given transaction. In terms of the formal data mining apparatus, let us represent the fact that a transaction does not contain an item a by the statement that it contains the respective negated item, denoted by View the MathML sourcea¯. In other words, we express negation in terms of existence. Now, let us consider the patterns: View the MathML sourceb¯df¯, View the MathML sourcea¯e. The first one is supported by only one transaction id2id2, while the second one by all the three transactions of class 0: id1id1, id2id2, id3id3. We call these patterns jumping emerging patterns with negation (JEPNs). Note that a JEP is a specific type of a JEPN that contains only positive items, a positive JEPN (posJEPN). Similarly, we can introduce a dual concept, a negative JEPN (negJEPN) that is built only out of negated items, e.g. View the MathML sourcea¯d¯. Both negJEPNs and JEPNs are similar to JEPs in their nature, as we demonstrate that these new types of patterns are still JEPs in specifically modified databases. This observation transfers many important features of JEPs to negJEPNs/JEPNs and allows us to use the same pattern discovery methods. Unfortunately, this task is often much harder than finding JEPs in the original database. In particular, JEPN mining requires analysis of a database with a doubled number of possible items and longer transactions. Thus, this problem has more dimensions, is more time consuming and, for many datasets, this kind of analysis proves unfeasible. As an alternative mining solution, we propose a method based on the notion of a local reduct in a decision table [3]. This concept derives from the rough set theory, which provides one of the most elegant descriptions of relational data. Basic ideas, proposed by Zdzislaw Pawlak in 1982 [17] and [18], triggered intensive research, which has resulted in the creation of many convenient tools for real data analysis, e.g reduct-based classifiers or discretizers [3], [23], [20] and [19]. It is worth mentioning that several relations between the rough set theory and JEPs have already been observed [26] and [25]. Here, we demonstrate a correspondence between JEPNs and local reducts in a binary relation database constructed for a transaction database. Taking advantage of this correspondence, we compare the efficiency of JEPN discovery by means of JEP and local reduct finding methods. In general, in various works that consider the idea of negation in transaction databases, classification issues are rarely raised, and tackled by means of associations [15] and [1]. Therefore, we believe that the role of negation in the field of emerging patterns is worthy of a comprehensive study. Also, the experimental results presented in this paper are a good premise to support the claim that negative knowledge can provide a strongly discriminative class description and should be perceived as a valuable classification approach. The text is organized as follows. Section 2 provides a formal background focusing on emerging patterns and the rough set theory. In Section 3, we demonstrate and prove relations between JEPNs, JEPs and local reducts. These relations are explained by means of brief examples. Then, in Section 4 we demonstrate how the new theorems can address the JEPN discovery problem. Selection and preparation of testing data are discussed in Section 5. A comparison of the efficiency of JEPN mining approaches is presented in Section 6. Section 7 contains classification results, which are contrasted in Section 8 with the complexity growth of respective rule sets. Classification issues are discussed in Section 9. The paper is concluded in Section 10.

نتیجه گیری انگلیسی

In the paper we have studied the role of negative knowledge in classification based on jumping emerging patterns (JEPs). For this purpose, the notions of a JEP with negation (JEPN) and a negative JEPN have been introduced. We have demonstrated that both of these are JEPs in specifically formulated databases, i.e. an extended and a contradictory database, respectively. As a consequence, these new types of patterns inherit all the advantageous features of JEPs, in particular the convexity of a pattern space. In addition to observations strictly related to JEPs, a correspondence between JEPNs and the rough set theory has been examined. We have proved that JEPNs in a transaction database refer to attribute sets from the positive region of a respective binary relational database. Moreover, it has been demonstrated that local reducts can be used directly for finding minimal JEPNs. With relation to the problem of finding JEPNs, methods related to JEPs and rough sets have been tested against real and randomly generated datasets. It appears that for datasets with a small number of items JEP-Producer is the most efficient solution. However, as far as larger itemspaces are concerned, the algorithms based on local reducts remain a good alternative. In order to test the discriminative power of new patterns, we have defined classifiers analogical to JEP-Classifier. Accuracy tests show that negJEPN-Classifier outperforms the solutions based on JEPs and JEPNs in 12 out of 13 cases. These results were accompanied by an insignificant growth in the number of rules. On the other hand, JEPN-Classifier behaves better than JEP-Classifier in only 7 cases and, for 2 datasets it was evidently the worst. Moreover, our tests have shown that the number of rules increases rapidly, when we allow successive numbers of negated items in JEPNs. Due to this fact, for some datasets, it may be unfeasible to build JEPN-Classifier or to use it efficiently. In conclusion, the classification experiments strongly suggest that negative knowledge can be a useful addition to solutions based on positive rules. We believe that our approach can also be easily extended to other types of emerging patterns, like strong JEPs [9] and chi-EPs [7].