Download English ISI article no. 20071
Persian translation of the article title

اکتشاف انبار داده های جامع با استخراج ارتباط قواعد واجد شرایط

English title
Comprehensive data warehouse exploration with qualified association-rule mining
Article code : 20071
Publication year : 2006
Length of English article : 20 pages (PDF)
Source

Publisher : Elsevier - Science Direct

Journal : Decision Support Systems, Volume 42, Issue 2, November 2006, Pages 859–878

Translated keywords (Persian)
انبار داده ها - داده کاوی - قوانین انجمن - مدل بعدی - سیستم های پایگاه داده - کشف دانش
English keywords
Data warehouse, Data mining, Association rules, Dimensional model, Database systems, Knowledge discovery
Article preview

English abstract

Data warehouses store data that explicitly and implicitly reflect customer patterns and trends, financial and business practices, strategies, know-how, and other valuable managerial information. In this paper, we suggest a novel way of acquiring more knowledge from corporate data warehouses. Association-rule mining, which captures co-occurrence patterns within data, has attracted considerable efforts from data warehousing researchers and practitioners alike. In this paper, we present a new data-mining method called qualified association rules. Qualified association rules capture correlations across the entire data warehouse, not just over an extracted and transformed portion of the data that is required when a standard data-mining tool is used.

English introduction

Data mining is defined as a process whose objective is to identify valid, novel, potentially useful and understandable correlations and patterns in existing data, using a broad spectrum of formalisms and techniques [9] and [23]. Mining transactional (operational) databases, which contain data related to current day-to-day organizational activities, could be of limited use in certain situations. However, the most appropriate and fertile source of data for meaningful and effective data mining is the corporate data warehouse, which contains all the information from the operational data sources that has analytical value. This information is integrated from multiple operational (and external) sources; it usually reflects a substantially longer history than the data in operational sources, and it is structured specifically for analytical purposes. The data stored in the data warehouse captures many different aspects of the business process across various functional areas such as manufacturing, distribution, sales, and marketing. This data explicitly and implicitly reflects customer patterns and trends, business practices, organizational strategies, financial conditions, know-how, and other knowledge of potentially great value to the organization. Unfortunately, many organizations often underutilize their already constructed data warehouses [12] and [13]. While some information and facts can be gleaned from the data warehouse directly, through the utilization of standard on-line analytical processing (OLAP), much more remains hidden as implicit patterns and trends. Standard OLAP tools perform their primary reporting function well when the criteria for aggregating and presenting data are specified explicitly and ahead of time.
However, it is the discovery of information based on implicit and previously unknown patterns that often yields important insights into the business and its customers, and may lead to unlocking the hidden potential of already collected information. Such discoveries require the use of data mining methods. One of the most important and successful data mining methods for finding new patterns and correlations is association-rule mining. Typically, if an organization wants to employ association-rule mining on its data warehouse data, it has to use a separate data-mining tool. Before the analysis is performed, the data must be retrieved from the database repository that stores the data warehouse, transformed to fit the requirements of the data-mining tool, and then stored in a separate repository. This is often a cumbersome and time-consuming process. In this paper we describe a direct approach to association-rule data mining within data warehouses that utilizes the query processing power of the data warehouse itself without using a separate data mining tool. In addition, our new approach is designed to answer a variety of questions based on the entire set of data stored in the data warehouse, in contrast to the regular association-rule methods, which are more suited for mining selected portions of the data warehouse. As we will show, the answers facilitated by our approach have the potential to greatly improve the insight and the actionability of the discovered knowledge.
This paper is organized as follows: in Section 2 we describe the concept of association-rule data mining and give an overview of the current limitations of association-rule data mining practices for data warehouses. Section 3 is the focal point of the paper. In it we first introduce and define the concept of qualified association rules. In 3.1 we discuss how qualified association rules broaden the scope and actionability of the discovered knowledge. In 3.2 we describe why existing methods cannot be feasibly used to find qualified association rules. In 3.3, 3.4 and 3.5 we offer details of our own method for finding qualified association rules. In Section 4 we describe an illustrative experimental performance study of mining real-world data that uses the new method we introduced. Finally, in Section 5 we offer conclusions.
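
As a rough illustration of what standard association-rule mining computes, the sketch below counts co-occurring item pairs and reports rules of the form "lhs implies rhs" together with their support and confidence, the two measures conventionally used in this field. The excerpt above does not define these measures or give any data, so the toy transactions, the thresholds, and the function name pair_rules are all assumptions made for this example; this is not the method of the paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each set holds the items of one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for single-item rules.

    support    = share of all transactions containing both items
    confidence = share of transactions containing lhs that also contain rhs
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        item_counts.update(basket)
        # Sorting gives every pair one canonical (a, b) ordering.
        pair_counts.update(combinations(sorted(basket), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


rules = pair_rules(transactions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Rules of this kind relate items only to other items. The introduction's point is that a data warehouse also holds non-item context that such rules ignore.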

English conclusion

In this paper, we presented a new data-mining framework, called qualified association rules, that is tightly integrated with database technology. We showed how qualified association rules can enable organizations to find new information within their data warehouses relatively easily, utilizing their existing technology. We were motivated by the observation that existing association-rule based approaches are capable of effectively mining data warehouse fact tables in conjunction with the item-related dimension only. We introduced qualified association rules as a method for mining fact tables as they relate to the attributes from both item and non-item dimensions. Standard association rules find coarser granularity correlations among items while qualified rules discover finer patterns. Both types of rules can provide valuable insight for the organization that owns the data. Combined information, extrapolated by examining both standard and qualified association rules, is much more likely to truly reveal the nature of the data stored in the corporate data warehouse and the knowledge captured in it.
The existing methods are suited to provide only partial information (standard rules) from data warehouses and, in most cases, require that the data be retrieved from the database repository and examined using separate software. Our method provides a more complete view of the data (both standard and qualified rules), while allowing the data to remain in the data warehouse and using the processing power of the database engine itself. State-of-the-art RDBMS query optimizers cannot handle mining qualified association rules directly. Our approach, using an external optimizer, leverages their strengths and works around their weaknesses, thus making the mining process effective and efficient.
Besides integrated analytical data collections, such as data warehouses or data marts, many other data repositories are organized in a dimensional-like fashion. For example, it is very common for regular operational databases and files to include tables that contain transactional records composed of a transaction identifier, quantifying data, and foreign keys to other tables involved in the transaction. Any data that is organized and stored in this way can be mined efficiently with qualified association rules, adding new and valuable insight, as we argued in this paper.
We have conducted experiments that implement our approach on real-life data, and the results support the viability of our integration approach as well as the appropriateness of qualified association rules. Our future work will include further performance studies with additional data sets, using different hardware platforms and various types of indexes.
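
The table layout described above (a transaction identifier, quantifying data, and foreign keys to other tables) can be sketched with an in-memory SQLite database. The example counts item pairs inside the database engine and groups them by an attribute of a non-item dimension, which is one plausible reading of a rule "qualified" by a non-item dimension. The schema, the table and column names (sales, store, region), and the data are hypothetical. The plain self-join query is only an illustration and does not reproduce the paper's algorithm or its external optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical non-item dimension.
CREATE TABLE store (store_id INTEGER PRIMARY KEY, region TEXT);
-- Hypothetical fact table: transaction id, item, foreign key, quantity.
CREATE TABLE sales (
    transaction_id INTEGER,
    product TEXT,
    store_id INTEGER REFERENCES store(store_id),
    quantity INTEGER
);
INSERT INTO store VALUES (1, 'North'), (2, 'South');
INSERT INTO sales VALUES
    (1, 'bread', 1, 1), (1, 'milk', 1, 2),
    (2, 'bread', 1, 1), (2, 'milk', 1, 1),
    (3, 'bread', 2, 1), (3, 'butter', 2, 1),
    (4, 'milk', 2, 1);
""")

# Count co-occurring item pairs per region inside the database engine.
PAIR_QUERY = """
SELECT st.region, a.product AS item_a, b.product AS item_b,
       COUNT(DISTINCT a.transaction_id) AS pair_count
FROM sales a
JOIN sales b
  ON a.transaction_id = b.transaction_id AND a.product < b.product
JOIN store st ON st.store_id = a.store_id
GROUP BY st.region, a.product, b.product
ORDER BY st.region, a.product, b.product
"""
rows = conn.execute(PAIR_QUERY).fetchall()

# Number of transactions per region, used as the support denominator.
totals = dict(conn.execute("""
SELECT st.region, COUNT(DISTINCT s.transaction_id)
FROM sales s JOIN store st ON st.store_id = s.store_id
GROUP BY st.region
"""))

# Support of each pair within its own region.
qualified_support = {
    (region, a, b): count / totals[region] for region, a, b, count in rows
}
for key, support in qualified_support.items():
    print(key, support)
```

In this toy data, bread and milk appear together in half of all transactions but in every North transaction, which illustrates the "finer patterns" the conclusion attributes to qualified rules.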