Business data mining - a machine learning perspective
|Article code|Publication year|English article|Persian translation|Word count|
|22029|2001|15-page PDF|Available to order|8,336 words|
Publisher: Elsevier - ScienceDirect
Journal : Information & Management, Volume 39, Issue 3, 20 December 2001, Pages 211–225
The objective of this paper is to inform the information systems (IS) manager and business analyst about the role of machine learning techniques in business data mining. Data mining is a fast growing application area in business. Machine learning techniques are used for data analysis and pattern discovery and thus can play a key role in the development of data mining applications. Understanding the strengths and weaknesses of these techniques in the context of business is useful in selecting an appropriate method for a specific application. The paper, therefore, provides an overview of machine learning techniques and discusses their strengths and weaknesses in the context of mining business data. A survey of data mining applications in business is provided to investigate the use of learning techniques. Rule induction (RI) was found to be most popular, followed by neural networks (NNs) and case-based reasoning (CBR). Most applications were found in financial areas, where prediction of the future was a dominant task category.
Data mining, also known as “knowledge discovery in databases”, is the process of discovering interesting patterns in databases that are useful in decision making. Data mining is a discipline of growing interest and importance, and an application area that can provide significant competitive advantage to an organization by exploiting the potential of large data warehouses.
The task of finding patterns in business data is not new. Traditionally, it was the responsibility of business analysts, who generally used statistical techniques. The scope of this activity, however, has recently changed. Widespread use of computers and networking technologies has created large electronic databases that store business transactions. Retailers, like Wal-Mart Stores, capture millions of sales transactions through their point-of-sale terminals. Transactions can be analyzed to identify buying patterns of individual customers as well as customer groups, and sales patterns of different stores. Intense competition is forcing companies to identify innovative ways to capture and enhance market shares while reducing cost. A better appreciation of the buying behavior of customers can enhance the effectiveness of target marketing practices.
Data warehousing technology has enabled companies to organize and store large volumes of business data in a form that can be analyzed, and the maturing of the “artificial intelligence” field has created a set of “machine learning” techniques that are useful in automating the tedious but crucial activity of discovering patterns in databases. These factors have changed the way that business data are analyzed and have given rise to data mining, which integrates machine learning, statistical analysis and visualization techniques with the intuition and knowledge of the business analyst to discover meaningful and interesting patterns in business data.
Data mining is a complex process involving multiple iterative steps. Fig. 1 gives an overview of this process.
The first step is the selection of data for analysis. Normally, historical data is used. The data set may be retrieved from a single source, such as a data warehouse, or may be extracted from several operational databases. The selected data set then undergoes cleaning and preprocessing. Lack of consistency across databases creates serious problems when data is extracted from multiple databases. The cleaning operation removes discrepancies and inconsistencies. Some mining techniques require data to be preprocessed to improve its quality. Examples include transformation of data from one scale to another, identification of predictive attributes in the data set, and reduction of the dimension of the data set through recomposition.
The data set is analyzed next to identify patterns, i.e. models that represent relationships among data. The model is then validated with new data sets to ensure its generalizability. It should be possible to translate the model into actionable business plans that are likely to help the organization achieve its goals. A model or pattern that satisfies these conditions becomes business knowledge. The steps in the mining process are performed iteratively until meaningful business knowledge is extracted.
A number of algorithms have been developed in domains such as machine learning, statistics, and visualization to identify patterns in data. Of these, statistical modeling approaches are the oldest. The data set must conform to rigid distribution criteria to employ statistical modeling methods. Pattern discovery algorithms based on machine learning techniques, however, impose fewer restrictions and produce patterns that are easy to understand. They are, therefore, finding wide popularity in data mining applications. Each technique has its own strengths and weaknesses. Understanding these in the context of business data mining is very useful in selecting an appropriate technique for a specific application.
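The process steps described above (selection, cleaning, pattern identification, validation on new data) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy loan records, the function names, and the choice of a one-attribute rule learner (often called 1R) as a stand-in for rule induction in general. The preprocessing step is reduced to normalising inconsistent casing, and the holdout split stands in for validation with new data sets. It is not the method of any particular tool discussed in the paper.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (income_band, has_prior_default, repaid_loan).
# The last field is the class to predict.
RAW = [
    ("high", "no", "yes"), ("high", "no", "yes"), ("low", "yes", "no"),
    ("low", "no", "yes"), ("HIGH", "no", "yes"), ("low", "yes", "no"),
    ("high", "yes", "no"), ("low", None, "no"), ("high", "no", "yes"),
    ("low", "yes", "no"),
]

def clean(records):
    """Cleaning: drop rows with missing values, normalise inconsistent casing."""
    return [tuple(v.lower() for v in r) for r in records if None not in r]

def induce_one_rule(records):
    """Pattern identification with a one-attribute rule learner (1R).

    For each attribute, map every observed value to its majority class,
    then keep the attribute whose rules make the fewest training errors.
    """
    n_attrs = len(records[0]) - 1
    best = None
    for a in range(n_attrs):
        counts = defaultdict(Counter)
        for r in records:
            counts[r[a]][r[-1]] += 1
        rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(rules[r[a]] != r[-1] for r in records)
        if best is None or errors < best[0]:
            best = (errors, a, rules)
    _, attr, rules = best
    return attr, rules

def accuracy(model, records, default="no"):
    """Validation: share of records whose class the rules predict correctly."""
    attr, rules = model
    hits = sum(rules.get(r[attr], default) == r[-1] for r in records)
    return hits / len(records)

data = clean(RAW)                    # selection + cleaning
train, holdout = data[:6], data[6:]  # keep some records back for validation
model = induce_one_rule(train)       # pattern identification
print(model, accuracy(model, holdout))
```

On this toy data the learner picks the prior-default attribute, and the resulting rules ("no prior default implies repaid", "prior default implies not repaid") read directly as statements a business analyst can check, which is the interpretability property the text attributes to machine learning patterns.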
The objective of this paper is to inform the information systems (IS) manager and the business analyst about the role of machine learning techniques in business data mining.
Conclusion
Table 3 shows the distribution of applications by machine learning techniques.
• RI has many interesting characteristics that make it an appropriate technique for developing data mining applications in business. It is robust in processing large data sets with high predictive accuracy and is well suited for classification and prediction tasks. The results are easy to explain. It has been extensively studied in the literature and is supported by tools that make it easier to implement applications.
• NN has poorer explanation capability, is less efficient in processing large data sets, and requires the user to possess substantial tool knowledge to set up and operate the system. This may explain why NN is less popular than RI in business data mining.
• CBR is highly useful in a domain that has a large number of examples but may suffer from the problem of incomplete or noisy data. It is still evolving as a machine learning technique. Due to its ability to work with noisy and missing data, its use in business data mining is likely to increase as the technology matures.
• GA and ILP are relatively new machine learning techniques that require extensive tool knowledge to set up and operate. GA works well with noisy data and is easy to integrate with other systems. Although ILP has several weaknesses, one of its strengths is its powerful modeling language that can model complex relationships.
Even though visualization is not a machine learning technique, we have listed it as a separate category because of its widespread use. Visualization plays a useful role by enabling the analyst to scan the raw data to identify patterns, detect outliers, and develop hypotheses, which are subsequently verified. Visualization also facilitates interpretation of results. While most of the applications surveyed use a single technique, some applications complementarily combine more than one technique.
A notable example in this direction uses NN, GA, and RI to mine classification rules from a database. GA is used to identify the most discriminating features of the data set. These features are used to train an NN. Rules are then extracted from the trained NN. Incorrect rules are revised using an explanation-based algorithm to improve the predictive accuracy of the system. This approach combines the robustness and search ability of GA with the high predictive accuracy of NN and the interpretability of rules to create a data mining system that outperforms systems based on a single technique. While developing such hybrid systems seems to be beneficial, more studies are required to understand the scope and limitations of these systems and to provide guidelines for developing them.
Table 4 shows the distribution of data mining applications by functional area. Finance and marketing lead other areas in application count. Two characteristics of these areas may explain the widespread use of data mining. First, computerization of transaction processing activities has created large databases ready to be mined. Second, these areas offer high potential payoff for data mining applications. The latter is an important consideration in developing data mining applications because of the huge investments required for such applications. Predicting the future is a dominant application category in finance. Whether it involves predicting the ability of a loan applicant to pay back the loan or the change in the stock price, any such information that may reduce the uncertainty in a financial decision is likely to result in substantial payoff. Marketing applications mostly target the customer with a view to understanding customer needs. This information will help develop products and services that better match customer expectations.
Table 5 shows the distribution of applications by problem category and functional area.
The counts in this table do not match those in Tables 3 and 4 because some applications that address two problem categories are counted twice. Classification and prediction are the dominant problem categories, accounting for a total of 62% of all applications. A large number of financial applications involve predicting the future. Thus, an important use of data mining applications in finance is to reduce the uncertainty in financial transactions. Association is a major problem category in marketing. Market basket analysis and product performance analysis are examples of applications in this category. These applications provide useful information for designing sales and marketing strategies. Web Mining applications seek to understand the browsing behavior of net surfers. This information is useful in enhancing Web page design. Detection is a dominant application category in telecommunications. Fraudulent use of telecommunication products is a major cause of loss of revenues to telecom service providers. Detecting fraudulent users can minimize these losses.
While RI may continue to be the dominant technique in business data mining for reasons discussed earlier, other techniques, especially ILP and GA, are expected to find wider use. ILP with its powerful modeling language and GA with its robust search technique offer some unique features that are valuable in many data mining applications. Since these techniques are difficult to implement, development of user friendly tools is necessary to enhance their use. We also expect to see a growth in hybrid applications that complementarily use more than one technique. The growth of Internet commerce is likely to stimulate development of Web Mining applications.
Data mining is a fast growing application area in business organizations. IS managers are often faced with data mining tasks but are overwhelmed with a plethora of techniques and toolkits.
This paper aims at helping them understand the role of machine learning techniques in mining business data. We discussed the strengths and weaknesses of each technique in terms of the data and operating characteristics. This knowledge is useful in selecting an appropriate technique(s) for a specific task. The survey of applications presented in this paper provides additional insight into the use of machine learning in business data mining. Interested readers may further explore applications in a specific area through the references listed in this paper. Understanding the scope and limitations of current data mining applications can be very useful in developing new applications.
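As an illustration of the association category mentioned in the conclusion, market basket analysis is commonly described in terms of the support and confidence of rules such as "customers who buy X also buy Y". The sketch below is a minimal, assumption-laden example: the baskets, the function names, and the threshold values are hypothetical, and it only enumerates rules between pairs of single items rather than implementing a full association rule miner.

```python
from itertools import combinations

# Hypothetical point-of-sale baskets, one set of items per transaction.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated share of baskets with the antecedent that also hold the consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

def pair_rules(baskets, min_support=0.3, min_confidence=0.7):
    """List single-item rules lhs -> rhs that pass both (assumed) thresholds."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, baskets)
            if s >= min_support:
                c = confidence({lhs}, {rhs}, baskets)
                if c >= min_confidence:
                    rules.append((lhs, rhs, s, c))
    return rules

rules = pair_rules(BASKETS)
for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

With these toy baskets only the bread and butter pair passes both thresholds, in both directions. The thresholds trade off the number of rules against their reliability, and in practice they would be chosen by the analyst for the application at hand.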