TMiner Aspects: Crosscutting Concerns in the TMiner Component-Based Data Mining Framework
|Article code||Publication year||English article||Persian translation||Word count|
|22187||2010||7-page PDF||Available to order||4867 words|
Publisher: Elsevier - Science Direct
Journal : Expert Systems with Applications, Volume 37, Issue 9, September 2010, Pages 6675–6681
TMiner (Berzal, Cubero, & Jiménez, 2009) is a component-based data mining framework designed to support the whole KDD process and to facilitate the implementation of complex data mining scenarios. This paper shows how aspect-oriented programming techniques support some tasks whose implementation using conventional object-oriented programming would be extremely time-consuming and error-prone. In particular, we have successfully employed aspects in TMiner to evaluate and monitor the I/O performance of alternative data mining techniques. Aspects provide an unintrusive mechanism for this kind of performance analysis, since the source code of the system under analysis does not have to be modified. In fact, aspects let us probe a system implementation so that we can identify potential bottlenecks, detect redundant computations, and characterize system behavior. The paper also reflects lessons learned during the development of TMiner.
All programming methodologies provide some kind of support for separation of concerns, which entails breaking down a program into distinct parts that overlap in functionality as little as possible. The structured and object-oriented programming paradigms resort to procedures and classes, respectively, to encapsulate concerns into single entities and thus achieve some separation of concerns. However, some concerns defy these forms of encapsulation and lead to tangled, difficult-to-maintain code, since they cut across multiple modules in a program. Aspect-oriented programming overcomes this problem by enabling developers to express these crosscutting concerns separately (Kiczales et al., 1997).
In this paper, we employ aspect-oriented software development techniques to solve a common problem programmers face in the development of complex systems. In particular, we describe how aspects can be woven into a component-based data mining framework in order to support the fine-grained performance evaluation of data mining techniques. Aspects let developers dig into their system at leisure. Since aspects provide an unintrusive way to tuck probes into a system, developers do not have to tweak the underlying system implementation to enable system monitoring. As keen observers, they can study system performance without inadvertently introducing subtle errors or degrading actual system performance in a production environment (aspects can easily be removed once the performance evaluation has taken place).
Our paper is organized as follows. Section 2 introduces some of the fundamental concepts and terms behind aspect-oriented software development, including a subsection that describes how crosscutting concerns, or aspects, can be specified using the AspectJ extension to the Java programming language. Section 3 describes the component model and the architectural design of the data mining framework we have fitted with aspects to study the implementation of crosscutting concerns in complex systems. Section 4 presents a case study on the evaluation of the I/O performance of some well-known data mining techniques. Finally, Section 5 concludes our paper by summarizing the results of our study.
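To make the idea of an unintrusive probe concrete, the following is a minimal sketch in plain Java. It is not the paper's AspectJ code and does not use the TMiner API: it swaps in a `java.lang.reflect.Proxy` as a stand-in for an aspect, and the names `DatasetReader`, `ListDatasetReader`, `IoProbe`, and `ProbeDemo` are hypothetical, introduced only for illustration. The point it shows is that call counts for I/O operations can be collected without editing the class being observed.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical data-access interface (an assumption, not part of TMiner).
interface DatasetReader {
    void open();
    String nextRecord(); // returns null once the scan is finished
}

// Toy in-memory reader standing in for a real data source.
final class ListDatasetReader implements DatasetReader {
    private final List<String> records;
    private int position;

    ListDatasetReader(List<String> records) { this.records = records; }

    @Override public void open() { position = 0; }

    @Override public String nextRecord() {
        return position < records.size() ? records.get(position++) : null;
    }
}

// Counts calls per method name; the wrapped class is left untouched.
final class IoProbe implements InvocationHandler {
    private final Object target;
    private final Map<String, Integer> calls = new TreeMap<>();

    IoProbe(Object target) { this.target = target; }

    // Builds a proxy for the given interface that routes every call through this probe.
    <T> T proxy(Class<T> iface) {
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, this));
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        calls.merge(method.getName(), 1, Integer::sum);
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow the original exception unchanged
        }
    }

    int count(String methodName) { return calls.getOrDefault(methodName, 0); }
}

final class ProbeDemo {
    // Scans the whole dataset `passes` times, as a multi-pass learner might.
    static int scan(DatasetReader reader, int passes) {
        int seen = 0;
        for (int p = 0; p < passes; p++) {
            reader.open();
            while (reader.nextRecord() != null) {
                seen++;
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        IoProbe probe = new IoProbe(new ListDatasetReader(List.of("a", "b", "c")));
        DatasetReader reader = probe.proxy(DatasetReader.class);
        scan(reader, 2);
        System.out.println("scans=" + probe.count("open")
                + " reads=" + probe.count("nextRecord"));
    }
}
```

Unlike an aspect, a proxy only observes calls made through the wrapped reference, so the caller must be handed the proxy explicitly. An AspectJ pointcut would instead intercept the matching calls wherever they occur, which is what makes the aspect-based approach attractive for system-wide concerns.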
Conclusion
In this paper, we have described how aspect-oriented programming techniques can be used to provide elegant implementations of crosscutting concerns. While conventional structured and object-oriented techniques would lead to poorly structured systems, aspect-orientation provides a well-modularized way to specify system-wide concerns in a single place. We have also shown how AspectJ, an aspect-oriented extension for the Java programming language, can be used in real-world applications to provide fine-grained performance evaluation and monitoring capabilities. Moreover, the approach proposed in this paper does not need the underlying source code to be modified. This unintrusive technique avoids the inadvertent insertion of bugs into the system under evaluation. It also frees developers from the burden of introducing scattered code to do their performance evaluation and monitoring work.
Finally, we have described how our proposed approach can be employed to evaluate the I/O cost associated with some data mining techniques. In the experiments we have performed using the TMiner component-based data mining framework, we have witnessed how associative classifiers such as ART possess good scalability properties. In fact, the efficient association rule mining algorithms underlying ART make it orders of magnitude more efficient than alternative rule and decision list inducers, whose I/O requirements heavily constrain their use in real-world situations unless sampling is employed. Moreover, we have confirmed that the additional cost required by ART, when compared to decision tree learners such as C4.5, is reasonable if we take into account the desirable properties of the classification models it helps us obtain. This makes associative classifiers a viable alternative to standard decision tree learners, the most common classifiers in data mining tools nowadays.