Data mining, an efficient method of business intelligence, is the process of extracting knowledge from large-scale data. As enterprises and their data grow, data mining becomes increasingly necessary as a way to make use of that data. However, most of the literature focuses only on the algorithms themselves. Few studies examine, from the perspective of the company manager, what conditions should be met before deciding to undertake data mining. This paper discusses the factors that affect a data mining project. Based on Bayesian risk, we build a model that takes the risk attitude of the top executive into account to help them decide whether or not to undertake data mining.
In recent years, data mining has been well researched and a number of algorithms have been proposed (Hope & Korb, 2004; Wu, 1999; Ying, 2005). To demonstrate the effectiveness of an algorithm, researchers test it in terms of accuracy, time cost, and space cost. Support, confidence, and lift are also used as measures of the interestingness of the rules or patterns in databases (Wang, Strong, & Guarascio, 1994). For example, predictive accuracy is usually used to measure the effectiveness of classification learners: one machine learner is accepted as superior to another if the difference in predictive accuracy passes a statistical significance test (Hope & Korb, 2004).
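As an illustration of the support, confidence, and lift measures mentioned above, the following minimal Python sketch computes them for a single association rule using their standard definitions. The function name `rule_metrics` and the toy transactions are hypothetical and chosen only for illustration; they are not taken from the cited works.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Standard definitions:
      support    = P(A and C)
      confidence = P(A and C) / P(A)
      lift       = confidence / P(C)
    """
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= set(t))         # contain A
    n_c = sum(1 for t in transactions if c <= set(t))         # contain C
    n_ac = sum(1 for t in transactions if (a | c) <= set(t))  # contain A and C
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical market-basket data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, confidence, lift = rule_metrics(transactions, {"bread"}, {"milk"})
print(support, confidence, lift)
```

A lift below 1 in this toy data would suggest that the two items co-occur less often than expected under independence, so the rule would be of limited interest.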
However, researchers seldom consider the algorithm from the perspective of the company. Although some papers discuss business issues, their aim is still to prove the effectiveness of the algorithm (Strobel & Hrycej, 2006). A manager might ask: given our limited resources and data quality, should my company apply data mining techniques to our data? What would I gain by launching a data mining project? Current research cannot answer these questions.
Data mining is an application-driven technique (Chen & Liu, 2005; Sim, 2003). It has been widely used in many applications, from tracking criminals to brokering information for supermarkets, and from developing community knowledge for a business to cross-selling and detecting customer churn. Application areas include marketing, financial investment, fraud detection, manufacturing and production, and network management. Data mining is also useful for sky survey cataloging, mapping the datasets of Venus, biosequence databases, and geoscience systems (Sim, 2003).
Although data mining is becoming more widely used, current research has not paid adequate attention to our “God”, that is, the people who will use the algorithms. Research papers seldom help managers decide whether to use data mining while taking into account the risk, the financial condition, the effect on the company after data mining is applied, and the data quality. In this paper, we try to build a mechanism to evaluate whether a company is qualified to launch a data mining project. A score based on Bayesian risk is used to measure whether or not the company should undertake data mining.
The remainder of this paper is organized as follows. Section 2 reviews the applications of data mining, the factors affecting data mining, and Bayesian analysis. Section 3 introduces how data quality affects data mining. Section 4 describes how to evaluate the human factors and financial factors that are important to the success of a data mining project. Section 5 discusses the importance of the support of top executives. Bayesian risk is presented in Section 6, followed by a case study in Section 7. Section 8 concludes the paper and points out our further work.
With the development of data mining, many companies are now hesitating over whether to use data mining analysis in their business decisions. In this paper, we discussed the factors that must be evaluated before the top manager of a company decides whether to start data mining. We proposed four important factors: data quality, human factors, financial budget, and support of the executives. After a preliminary evaluation of these conditions, we obtain the observation value X. Based on in-depth research or observation of these factors, we can compute the posterior probability. With the prior and posterior probabilities, we applied Bayesian analysis to obtain the decision criterion with the lowest Bayesian risk. With the model and the evaluation, managers can decide whether or not their company or organization is suited to using data mining analysis in its decisions.
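The decision rule summarized above can be sketched in a few lines of Python. This is a minimal illustration of a generic minimum-Bayes-risk decision, not the exact model of this paper: the two states ("ready", "not_ready"), the two actions ("launch", "wait"), and all prior, likelihood, and loss values are hypothetical assumptions chosen only to show the mechanics.

```python
def posterior(prior, likelihood):
    """Bayes' rule: P(state | X) from P(state) and P(X | state)."""
    joint = {s: prior[s] * likelihood[s] for s in prior}
    evidence = sum(joint.values())  # P(X)
    return {s: joint[s] / evidence for s in joint}


def expected_losses(post, loss):
    """Posterior expected loss (conditional Bayes risk) of each action."""
    return {a: sum(post[s] * loss[a][s] for s in post) for a in loss}


def bayes_decision(prior, likelihood, loss):
    """Pick the action with the lowest posterior expected loss."""
    post = posterior(prior, likelihood)
    risks = expected_losses(post, loss)
    best = min(risks, key=risks.get)
    return best, risks


# All numbers below are hypothetical assumptions.
prior = {"ready": 0.4, "not_ready": 0.6}       # belief before evaluation
likelihood = {"ready": 0.8, "not_ready": 0.3}  # P(observation X | state)
loss = {
    "launch": {"ready": 0.0, "not_ready": 10.0},  # failed project is costly
    "wait": {"ready": 4.0, "not_ready": 0.0},     # missed opportunity
}

action, risks = bayes_decision(prior, likelihood, loss)
print(action, risks)
```

In such a sketch, the risk attitude of the executive could be reflected in the loss table: a more risk-averse manager would assign a larger loss to launching a project when the company is not ready.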