A comparative analysis of data mining methods for bankruptcy prediction
|Article code||Publication year||English paper||Persian translation||Word count|
|22248||2012||10-page PDF||available to order||6390 words|
Publisher: Elsevier - Science Direct
Journal: Decision Support Systems, Volume 52, Issue 2, January 2012, Pages 464–473 (http://dx.doi.org/10.1016/j.dss.2011.10.007)
A great deal of research has been devoted to the prediction of bankruptcy, including applications of data mining. Neural networks, support vector machines, and other algorithms often fit data well, but because they lack comprehensibility, they are considered black box technologies. Conversely, decision trees are more comprehensible to human users. However, sometimes far too many rules result in another form of incomprehensibility. The number of rules obtained from decision tree algorithms can be controlled to some degree by setting different minimum support levels. This study applies a variety of data mining tools to bankruptcy data with the purpose of comparing accuracy and number of rules. For this data, decision trees were found to be relatively more accurate than neural networks and support vector machines, but they produced more rule nodes than desired. Adjustment of minimum support yielded more tractable rule sets.
Bankruptcy prediction has been a focus of study in business analytics because of the importance of accurate and timely strategic business decisions. Even though the accuracy of the prediction model is a very important criterion, understandability and transportability of the model are also important. The accurate prediction of bankruptcy has been a critical issue to shareholders, creditors, policy makers, and business managers. There is a wealth of research that has been applied to this field, both in finance and in other fields. Among the thousands of refereed journal articles, many recent studies have applied neural networks (NNs). Another popular approach is decision trees (DTs). Support vector machines (SVMs) have been proposed for smaller datasets with highly nonlinear relationships. The vast majority of studies in this domain have focused on NNs, and on how good they are compared to their statistical counterpart (i.e., logistic regression) at fitting data (fidelity). However, neural network models are black boxes, lacking transparency (seeing what the model is doing, or comprehensibility) and transportability (being able to easily deploy the model into a decision support system for new cases). We argue that decision trees (DTs) can be as accurate, and can provide the transparency and transportability that NNs are often criticized for lacking. The paper is organized as follows. Section 2 reviews previous research on bankruptcy prediction based on data mining methods. Section 3 describes data mining methodologies. Section 4 discusses the data collected, and Section 5 presents the data analysis and prediction model building methods as well as the results obtained from different data mining techniques. Section 6 gives our conclusions.
English Conclusion
Any particular set of data will fit different data mining models to different degrees. That is why it is conventional to apply logistic regression, neural networks, and decision trees to the same data. Neural network models often fit a particular data set very well, but they are neither transparent nor easily transportable. Decision tree models are expressed in easily understood terms. A common problem with decision trees is that they generate too many rules. This can be controlled by increasing the minimum support required for a rule. Our study demonstrated this point. For this data, the overall best fit (0.948 average accuracy) was obtained with a WEKA J48 decision tree model with minimum support of 2. However, that tree involved 46 leaves. Even setting the minimum support to 9 yielded a tree with 28 leaves involving 11 attributes (average accuracy dropping to 0.921). While that set of rules is transportable and transparent, it is bulky and complex. A decision tree obtained from a WEKA CART model with minimum support of 9 yielded a model we would argue is preferable, involving 12 leaves and 7 attributes, with some degradation in average testing accuracy (dropping to 0.897). The particular choice would depend upon user preferences. Our point is that there is a tradeoff between average accuracy and decision tree size that can be controlled through the minimum support parameter.
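The accuracy-versus-size tradeoff described above can be sketched in miniature. The following is not the paper's WEKA J48 or CART implementation; it is a toy one-feature, Gini-based binary tree on synthetic data, showing how a minimum-support threshold (the minimum number of samples each child node must retain) caps the number of leaves a tree can grow.

```python
# Toy sketch of the minimum-support effect (synthetic data, not the
# paper's bankruptcy dataset or WEKA models).

def gini(samples):
    """Gini impurity of a list of (value, label) pairs."""
    n = len(samples)
    counts = {}
    for _, lab in samples:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def count_leaves(samples, min_support):
    """Grow a greedy binary tree on one sorted feature; return leaf count."""
    labels = {lab for _, lab in samples}
    # Stop when the node is pure, or too small to split legally.
    if len(labels) == 1 or len(samples) < 2 * min_support:
        return 1
    xs = sorted(samples)
    best_i, best_cost = None, float("inf")
    # Only splits leaving >= min_support samples on each side are allowed.
    for i in range(min_support, len(xs) - min_support + 1):
        cost = i * gini(xs[:i]) + (len(xs) - i) * gini(xs[i:])
        if cost < best_cost:
            best_i, best_cost = i, cost
    return (count_leaves(xs[:best_i], min_support) +
            count_leaves(xs[best_i:], min_support))

# A step function with two mislabeled "noise" points.
data = [(x, 1 if x >= 12 else 0) for x in range(24)]
data[5], data[18] = (5, 1), (18, 0)   # flip two labels as noise

print(count_leaves(data, 1))   # low support: noise points get isolated
print(count_leaves(data, 4))   # higher support: fewer leaves, noise absorbed
```

With minimum support 1 the tree isolates each noisy point in its own leaf (perfect training fit, more leaves); with minimum support 4 those splits are illegal, so the tree stays smaller and misclassifies the two noisy points, mirroring the accuracy-for-compactness trade the study reports.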