This paper provides some practical guidelines for the design of data mining frameworks. It describes the rationale behind some of the key decisions that guided the design, development, and implementation of the TMiner component-based data mining framework. TMiner is a flexible framework that can be used as a stand-alone tool or integrated into larger business intelligence (BI) solutions. TMiner is a general-purpose component-based system designed to support the whole KDD process within a single framework and thus facilitate the implementation of complex data mining scenarios.
Traditional on-line transaction processing (OLTP) systems work with relatively small chunks of data at a time, whereas on-line analytical processing (OLAP) systems require the analysis of huge amounts of data (Chaudhuri & Dayal, 1997). It comes as no surprise that OLAP systems have very specific needs that conventional application frameworks do not properly address.
These needs have led to the development of data mining (Tan et al., 2006; Han & Kamber, 2006) and data warehousing (Widom, 1995; Kimball & Ross, 2002), which try to satisfy the expectations of the so-called knowledge workers (executives, managers, and analysts).
This paper describes the rationale behind some key design decisions that led to the development of a component-based data mining framework called TMiner. As we will see, TMiner can be used as a flexible stand-alone data mining tool, but it has also been designed so that it can be easily incorporated into larger business intelligence solutions.
It should be noted that the tools and techniques TMiner collects overlap somewhat with existing machine learning algorithm collections, such as Weka (Witten & Frank, 2005). However, TMiner is more than a mere collection of independent algorithms for data mining tasks that can be applied directly to prepared datasets or invoked from your own code.
Some open-source and commercial data mining libraries (Prudsys, 2008; Rapid-I, 2008) include facilities for integration into actual enterprise systems. TMiner also provides usage modes specifically designed for tight integration into larger solutions.
TMiner is a general-purpose component-based system designed to support the whole KDD process within a single framework and thus facilitate the implementation of complex data mining scenarios. In this sense, TMiner is designed to be useful in a wide variety of application domains, in sharp contrast to domain-specific data mining systems such as iKDD or SA. While the interactive knowledge discovery and data mining system, iKDD, was designed for particular bioinformatics-related problems (Etienne, Wachmann, & Zhang, 2006), Perttu Laurinen’s Smart Archive, SA, has been proposed for implementing data mining applications that work with data streams (Laurinen, Tuovinen, & Röning, 2005).
The rest of our paper is organized as follows. Section 2 describes the architectural design of the TMiner framework and its component model. Section 3 describes the facilities TMiner offers for different usage scenarios: the casual user who wants to perform simple data analysis tasks, the researcher who needs more thorough experimentation, and the systems integrator who needs to incorporate data mining features into final solutions. Finally, Section 4 concludes our paper with some comments on the current status of TMiner and our expectations for its future.