Interoperability of a Web-Aware Data Mining System
|Article code||Publication year||English article||Persian translation||Word count|
|22030||2002||12-page PDF||Available to order||Not calculated|
Publisher: Elsevier - Science Direct
Journal : Expert Systems with Applications, Volume 22, Issue 2, February 2002, Pages 135–146
The development of web-aware data mining systems has received a great deal of attention in recent years. It plays a key enabling role for competitive businesses in the e-commerce era. One of the challenges in developing web-aware data mining systems is to integrate and coordinate existing data mining applications in a seamless manner so that cost-effective systems can be developed without the need for costly proprietary products. In this paper we present an approach for developing an interoperable web-aware data mining system to achieve this purpose. The approach applies Remote Method Invocation and a high-level code wrapper from Java distributed object computing to address the issues of interoperability in heterogeneous environments, which include programming language, platform, and visual object model. The effectiveness of the proposed system is demonstrated through the integration and enhancement of two well-known standalone data mining tools, SOM_PAK and Nenet, and through runs on the iris data and air pollution data.
The study of data mining for discovering patterns, trends, and relationships in large sets of data has attracted great interest from both academia and industry, and is expected to expand rapidly (Berson et al., 1999; Li and Pan, 2000). Data mining has been regarded as a key enabling technology for uncovering business intelligence in organizations, which is essential in supporting diverse services. These services range from customer segmentation to credit card fraud detection, response analysis, database marketing, web mining, and many other areas. Depending on the nature of the services needed, many different data mining systems have been developed with a variety of methodologies, including neural networks, decision trees, rule induction, evolutionary computation, association rules, and sequential analysis (Berson et al., 1999; Thuraisingham, 2000). On the market there are a number of off-the-shelf systems that provide various data mining functions: classification, clustering, and prediction. The best known are IBM's Intelligent Miner, SAS's Enterprise Miner, Oracle's Darwin, and SGI's MineSet. These systems excel in their respective specialties; however, two concerns may have inhibited their wide application. The first is that they are expensive and resource-intensive in terms of system platforms, and hence are generally beyond the means of most organizations. The second is that each system has its own GUI with different algorithms and methodologies, which may require well-trained staff to operate. Besides these proprietary products there is a large variety of well-developed shareware, freeware, and even demonstration versions of data mining tools, which were mostly developed by members of the academic community and can be accessed freely from the Internet; see KDnuggets (1998) for a detailed list.
Each of these systems may have its own specific features in problem-solving capabilities and working environments; some are easy to operate and were developed as standalone applications in the PC environment; others are strong in functionality and computational efficiency and were developed in the Unix environment. For organizations that cannot afford the expensive proprietary products, one cost-effective way to develop a data mining system is to reuse specific features of these freely available tools. Thus there is a great need for an approach to integrating these tools. The most straightforward approach to integrating systems on different platforms is to adapt them to a web environment; however, such a solution does not come without difficulties. The main difficulty is that most of these free tools are either standalone applications or even legacy systems, which cannot be easily converted to a web environment. Of course one can always re-engineer the existing systems. However, such an undertaking is a non-trivial task because, in addition to the usual hassles, source code may be unavailable, several utilities may be needed and may be implemented in different programming languages, and different platforms may be involved (Brodie and Stonebraker, 1995; Law et al., 1998; Resnick, 1998; Saleh et al., 1999; Umar, 1997). These problems may be overcome if one can develop a web-aware interoperable data mining framework that is capable of functioning transparently in a heterogeneous and distributed environment and allows each component system to work in its original environment. Recent developments in interoperable architecture have paved the way towards system integration under such considerations (Potter et al., 2000; Wegner, 1996).
An interoperable architecture allows two or more software components to cooperate in a seamless manner despite heterogeneity in implementation languages, service interfaces, and deployment platforms (Wegner, 1996). In this study we apply the methodology of Java distributed object computing to propose an interoperable data mining framework and develop a system called iSOM, which is based on Kohonen's (1997) self-organizing map (SOM) neural network and is capable of integrating two ‘official’ standalone SOM utilities, SOM_PAK and Nenet, on different platforms. The former is a computationally powerful data mining tool that runs in both the Microsoft Windows and UNIX environments, whereas the latter works in the Windows environment and provides a penetrating view of data that are inherently high dimensional. This system addresses three major issues in system development: language interoperability, platform interoperability, and object model interoperability. The remainder of this paper is organized as follows. Section 2 reviews issues and models in developing interoperable systems and elaborates on the methodology of Java distributed object computing in dealing with the interoperability issues. Section 3 gives a brief review of the SOM network, approaches for identifying clusters in a trained SOM network, and two well-known SOM tools. Section 4 provides a framework for web-aware interoperable data mining systems. The architecture and system design of the proposed iSOM system are described in Section 5. In Section 6 the experimental results on the iris data and air pollution data are discussed. Section 7 concludes the paper and gives some future directions.
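To make the idea of wrapping a standalone tool behind Java distributed objects more concrete, the following is a minimal sketch and not the iSOM implementation. It assumes the legacy tool can be driven from the command line; the names `MiningService`, `CommandLineWrapper`, `MiningServer`, the registry name, and the argument layout are all hypothetical and do not reflect real SOM_PAK or Nenet options.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.List;

/** Hypothetical remote interface: what a web-facing client would call. */
interface MiningService extends Remote {
    String train(String dataFile, int xDim, int yDim) throws RemoteException;
}

/** Wraps a command-line tool so that it keeps running in its original environment. */
class CommandLineWrapper implements MiningService {
    private final String executable;

    CommandLineWrapper(String executable) {
        this.executable = executable;
    }

    /** Builds the command line; this argument layout is a placeholder, not a real tool's syntax. */
    List<String> buildCommand(String dataFile, int xDim, int yDim) {
        return List.of(executable, dataFile, Integer.toString(xDim), Integer.toString(yDim));
    }

    @Override
    public String train(String dataFile, int xDim, int yDim) throws RemoteException {
        try {
            // Run the wrapped tool as a child process and capture stdout and stderr together.
            Process process = new ProcessBuilder(buildCommand(dataFile, xDim, yDim))
                    .redirectErrorStream(true)
                    .start();
            String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            int exitCode = process.waitFor();
            return "exit=" + exitCode + "\n" + output;
        } catch (IOException e) {
            throw new RemoteException("wrapped tool could not be run", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RemoteException("interrupted while waiting for wrapped tool", e);
        }
    }
}

/** Publishes a wrapper in an RMI registry so that remote clients can look it up. */
class MiningServer {
    static Registry publish(MiningService service, int port) throws RemoteException {
        // Export the object on an anonymous port and bind its stub under an assumed name.
        MiningService stub = (MiningService) UnicastRemoteObject.exportObject(service, 0);
        Registry registry = LocateRegistry.createRegistry(port);
        registry.rebind("MiningService", stub);
        return registry;
    }
}
```

A client on another platform would obtain the stub with `LocateRegistry.getRegistry(host, port).lookup("MiningService")` and call `train` as if it were local. The wrapper hides the tool's language and platform behind one Java interface, which is the general role the text assigns to the code wrapper.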
Conclusion
We developed an interoperable data mining system, iSOM, in a heterogeneous environment. The system integrates and extends two existing standalone tools, SOM_PAK and Nenet. The design philosophy is based on cost-effectiveness, reuse of existing components, and rapid prototyping in system development. We applied the Java distributed object computing model to deal with issues of interoperability across heterogeneous languages, platforms, and visual object models. The resulting system combines the computational efficiency of SOM_PAK on UNIX with the vivid visualization of Nenet on Windows. It also rejuvenates the original tools with web awareness, concurrent processing, and load balancing. In addition, the iSOM system is built upon a two-level SOM_PAK network that allows one to identify the resulting clusters. The experiments conducted on mining the iris data and real-world air pollution data validate its usefulness. The overall benefit can be quite significant. The system may evolve over time by simply incorporating new components or adaptively modifying the current ones. Organizations may benefit from the mobile decision-making environment to improve their service efficiency. In future work, more functions for on-line interactive analysis in Nenet should be provided. This requires an efficient mechanism for supporting remote distributed invocation. The other direction is to turn the system components into mobile intelligent agents so that they can cooperate with each other autonomously and collaboratively to accomplish a data mining task.
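For readers unfamiliar with the SOM network on which both integrated tools are based, the sketch below shows one online training step of a generic textbook self-organizing map. It is not SOM_PAK's or Nenet's code: the class name `SomSketch`, the rectangular grid, the Gaussian neighborhood, and the parameters `alpha` and `sigma` are assumptions chosen for illustration.

```java
/** Illustrative SOM step on a rectangular grid stored row by row; not taken from SOM_PAK. */
final class SomSketch {
    private SomSketch() {}

    /** Returns the index of the unit whose weight vector is closest to x (squared Euclidean distance). */
    static int bestMatchingUnit(double[][] weights, double[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < weights.length; i++) {
            double dist = 0.0;
            for (int k = 0; k < x.length; k++) {
                double diff = x[k] - weights[i][k];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }

    /**
     * Moves every unit towards x, scaled by the learning rate alpha and a Gaussian
     * neighborhood of width sigma centred on the best matching unit. Returns that unit's index.
     */
    static int update(double[][] weights, int cols, double[] x, double alpha, double sigma) {
        int bmu = bestMatchingUnit(weights, x);
        int bmuRow = bmu / cols;
        int bmuCol = bmu % cols;
        for (int i = 0; i < weights.length; i++) {
            int dRow = i / cols - bmuRow;
            int dCol = i % cols - bmuCol;
            double gridDistSquared = dRow * dRow + dCol * dCol;
            double h = Math.exp(-gridDistSquared / (2.0 * sigma * sigma));
            for (int k = 0; k < x.length; k++) {
                weights[i][k] += alpha * h * (x[k] - weights[i][k]);
            }
        }
        return bmu;
    }
}
```

Repeating `update` over the training samples while shrinking `alpha` and `sigma` is the usual way such a map is trained; the trained weight vectors are then what a second-level clustering step would group.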