Persian title of the article (translated)

A service-oriented architecture for providing data mining services to non-expert data mining practitioners

English title

A service oriented architecture to provide data mining services for non-expert data miners

Article code	Publication year	Pages in English article
22289	2013	13-page PDF

Source

Publisher : Elsevier - ScienceDirect

Journal : Decision Support Systems, Volume 55, Issue 1, April 2013, Pages 399–411

Persian keywords (translated)

- Analytics service as a service - Knowledge discovery database - Data mining - Service-oriented architecture - Web services

English keywords

Analytics service, BI-as-a-Service, Knowledge discovery database, Data mining, Service-oriented architecture, Web Services

Article preview

English abstract

In today's competitive market, companies need to use knowledge discovery techniques to make better, more informed decisions. But these techniques are out of the reach of most users, as the knowledge discovery process requires a great deal of expertise. Additionally, business intelligence vendors are moving their systems to the cloud in order to provide services which offer companies cost savings, better performance and faster access to new applications. This work joins both facets. It describes a data mining service aimed at non-expert data miners which can be delivered as Software-as-a-Service. Its main advantage is that, once the user simply indicates where the data file is, the service itself is able to perform the whole process.

English introduction

In a market as competitive and global as today's, currently affected by a deep economic crisis, information is one of the main managerial assets, since its analysis helps in effective steering, as De Leeuw [35] pointed out 28 years ago. Regardless of the size of the company, decision making depends on accurate and reliable knowledge of what is affecting the business and on discovering new, useful information hidden in the data. As a result, business intelligence (BI) tools have been used more and more since the end of the nineties, although sector growth has slowed in the last few years as a consequence of the economic crisis [72]. Business intelligence tools, as is well known, encompass a wide range of techniques and technologies which are used to gather, provide access to and analyze data from the operational systems of the organization and from other external sources (for instance surveys, information from competitors or data from the web). Their aim is to offer decision makers a more comprehensive knowledge of the factors affecting their business and, in this way, to help them take more accurate and effective managerial actions.

Among the different elements which make up a BI environment [33], we consider four to be the most essential: the data warehouse (DW), On-Line Analytical Processing (OLAP) technology, reporting tools and data mining techniques. The DW is the integrated repository of the strategic information of the organization, which generally includes measurements, metrics and facts from the different business processes of the company (known as key performance indicators, or KPIs). These measurements are defined according to the different users' perspectives. OLAP technology meets managers' and business analysts' needs to quickly search and explore accurate, up-to-date, complete information from the DW, at both detailed and aggregated levels.
The reporting tools and, in particular, dashboards and scorecards aim to help analysts monitor and analyze the status of their KPIs and drill into detailed data to identify the root causes of problems and intervene while there is still time. Lastly, data mining techniques facilitate the exploration and analysis, by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and models which can be used directly in decision making (for instance, a model for preventing credit risk).

Nowadays the majority of large companies and corporations have, to a greater or lesser extent, a DW, and they use reporting and OLAP tools to extract and analyze the information which allows them to position themselves strategically in the market. However, although there are areas where data mining techniques are being used more and more, such as business [48], marketing [61], education [16], banking [46], health systems [78], and so on [52], their use is still not widespread. This is mainly because data mining projects need highly qualified professionals (expert data miners) to achieve useful business results in reasonable time. According to Fayyad et al. [20], these results must be non-trivial, valid, novel, potentially useful, and ultimately understandable patterns if they are to be used in decision making. One of the reasons why expert data miners are required is that the knowledge discovery in databases (KDD) process involves multiple stages [20] and, regrettably, each one involves a large number of decisions that have to be taken with little or no formal guidance. The lack of a theoretical framework that unifies different data mining tasks [77] explains why the KDD process is said to be as much an “art” as it is a “science” [45] and [70].
Except for some specific cases, business intelligence needs can be grouped into domain-specific solutions, for example retail banking [27], insurance risk assessment [63], discovering web access patterns [34] and [81], selective marketing campaigns [8] and [70], acquiring and retaining customers [26] and [32], and so on. Since the information which companies hold in their transactional systems and the questions they want answered have a lot in common, generic data mining models can be designed to satisfy the needs of all of them. One easy way to define these models is by means of templates, which specify the data set to be processed, the kind of result required (for instance a segmentation, a rule set or a predictive model), the pre-processing tasks to be carried out and the mining algorithms to be used. These templates would be defined by a data miner who is an expert in the business domain, and exploited by all the users who access the service proposed in this work. As far as we know, there is no service in the cloud which allows end-users to extract patterns and models by simply sending their data file, without having to carry out the tedious job of selecting attributes, pre-processing and configuring data mining algorithms. A service like this not only offers non-expert data miners a tool for analysis but also facilitates the work of expert data miners, who can use it to obtain initial patterns easily and quickly.

In short, our objective in this paper is to describe a software architecture which meets the need of non-expert data miners to extract useful and novel knowledge using data mining techniques, in order to obtain patterns which can be used in their business decision-making process. Our proposal follows a service-oriented architecture so that it can be easily configured and hosted on the web and deployed as an Analytic Software-as-a-Service.
Furthermore, a service-oriented architecture (SOA) implemented by means of Web Services facilitates its extension with new functionalities (services), developed by ourselves or by third parties (through an orchestration of services). An additional advantage of SOA is its layered design, which allows certain parts of the system to be improved without affecting the rest.

This paper is organized as follows. First, a preliminary section explains our interpretation of some concepts and terms used in the paper. Next, we review the context of BI-as-a-Service and enumerate some currently available on-demand tools. Likewise, we survey other published works with a specific focus on the knowledge discovery process and discuss them in relation to our proposal. After that, we describe the architecture of our service and discuss some details of its implementation. In Section 4, we present an application which uses the proposed data mining service, called E-learning Web Miner, which allows virtual course instructors to extract knowledge from the clickstream stored in the e-learning platform logs. Finally, we close by summarizing the contents of this paper and discussing our future work.
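
The introduction lists four things a template specifies: the data set to be processed, the kind of result required, the pre-processing tasks and the mining algorithms. As a minimal sketch of how such a record might look, the Python below models a template as an immutable data class. The class name `MiningTemplate`, its field names, the `missing_attributes` helper and the example values are all hypothetical; the excerpt does not show the service's actual template format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Tuple


class ResultKind(Enum):
    """Kinds of result named in the introduction."""
    SEGMENTATION = "segmentation"
    RULE_SET = "rule_set"
    PREDICTIVE_MODEL = "predictive_model"


@dataclass(frozen=True)
class MiningTemplate:
    """Hypothetical template record; every field name is illustrative."""
    name: str
    dataset_schema: Tuple[str, ...]       # attributes the data file must contain
    result_kind: ResultKind               # kind of result the template produces
    preprocessing: Tuple[str, ...] = ()   # ordered pre-processing task names
    algorithms: Tuple[str, ...] = ()      # mining algorithm names

    def missing_attributes(self, columns: Iterable[str]) -> List[str]:
        """Return the required attributes absent from an uploaded file's columns."""
        present = set(columns)
        return [a for a in self.dataset_schema if a not in present]


# Example template an expert data miner might define for an e-learning domain.
# The attribute, task and algorithm names are assumptions for illustration.
student_profiles = MiningTemplate(
    name="student_profiles",
    dataset_schema=("student_id", "sessions", "time_spent"),
    result_kind=ResultKind.SEGMENTATION,
    preprocessing=("drop_missing", "normalize"),
    algorithms=("k_means",),
)

missing = student_profiles.missing_attributes(["student_id", "sessions"])
```

Keeping the template declarative (names of tasks rather than code) is one way to let an expert define it once while end-users only supply a data file; a check such as `missing_attributes` could reject a file before any mining starts.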

English conclusion

The delivery of data mining as a service is an emerging necessity, above all for small and medium-sized organizations, which are the most constrained by the high cost of data mining software and the limited availability of expert data miners to use it. Until now, the tools deployed as BI-as-a-Service in the cloud have been conceived more for saving license costs than as products which can be used directly by end-users without data mining knowledge. To respond to this necessity, this paper describes the architecture of a data mining service for non-expert data miners which can be delivered as SaaS. Its main characteristic is that it is based on templates that answer certain previously defined questions. These templates gather the tasks of the KDD process to be carried out on the data set sent by the end-user. The templates are defined by the service administrator. The service is offered as a Web Service, which makes it easily accessible from any client application. Furthermore, because it is designed following a service-oriented architecture, it can easily be extended with other data mining algorithms and visualization tools, whether developed ad hoc or consumed from a provider on the Internet.

This paper also presents ElWM, a web application which uses the data mining service configured for an educational context; in particular, it helps instructors involved in virtual teaching to discover their students' profiles and their behavior in the course. The prototype of this web application has been successfully tested in two virtual courses taught at the University of Cantabria. In the instructors' opinion, the tool is very easy to use, and the information it returns is very useful for better understanding what is happening in the course and for taking action as soon as anomalous behaviors are detected.
Currently, our research is focused on the specification of new templates to be incorporated into the service and, consequently, on wrapping other data mining algorithms and visualization techniques. Security is another important aspect we are considering. Among other tasks, we plan to add encryption to communication messages and data files. Another challenging task is to adapt our architecture to the Open Grid Services Architecture (OGSA) [47], which represents an evolution toward a Grid system architecture based on Web Service concepts and technologies. Lastly, we will study and choose the most suitable cloud environment in which to deploy our solution, for example, Amazon.
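
The conclusion states that templates gather the KDD tasks to run on the user's data set and that new algorithms can be wrapped and added to the service. One possible way to picture this, purely as a sketch, is a registry that maps task names to functions, so that adding an algorithm means registering one more entry. Everything below is an assumption made for illustration: the registries, the `register` decorator, `run_template`, and the toy `mean_split` function, which merely stands in for a real mining algorithm and is not taken from the paper.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]

# Hypothetical registries: task name -> implementation.
PREPROCESSORS: Dict[str, Callable[[List[Row]], List[Row]]] = {}
ALGORITHMS: Dict[str, Callable[[List[Row]], object]] = {}


def register(registry: dict, name: str):
    """Decorator that stores a function in a registry under the given name."""
    def wrap(fn):
        registry[name] = fn
        return fn
    return wrap


@register(PREPROCESSORS, "drop_missing")
def drop_missing(rows: List[Row]) -> List[Row]:
    """Discard rows that contain a None value."""
    return [r for r in rows if all(v is not None for v in r.values())]


@register(ALGORITHMS, "mean_split")
def mean_split(rows: List[Row]) -> Dict[str, List[Row]]:
    """Toy stand-in for a mining algorithm: split rows around the mean of 'sessions'."""
    mean = sum(r["sessions"] for r in rows) / len(rows)
    return {
        "low": [r for r in rows if r["sessions"] < mean],
        "high": [r for r in rows if r["sessions"] >= mean],
    }


def run_template(preprocessing: List[str], algorithm: str, rows: List[Row]):
    """Apply the named pre-processing steps in order, then the named algorithm."""
    for step in preprocessing:
        rows = PREPROCESSORS[step](rows)
    return ALGORITHMS[algorithm](rows)


rows = [{"sessions": 2}, {"sessions": None}, {"sessions": 10}]
result = run_template(["drop_missing"], "mean_split", rows)
```

In this sketch the end-user supplies only the rows; the choice of steps and algorithm comes from the template, and a third-party algorithm would be added by registering another function rather than by changing `run_template`.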