Title

Variable precision rough set theory and data discretisation: an application to corporate failure prediction

Article code	Publication year	Length of English article
29479	2001	16 pages (PDF)

Source

Publisher: Elsevier - Science Direct

Journal: Omega, Volume 29, Issue 6, December 2001, Pages 561–576

Keywords

Data mining, Failure prediction, FUSINTER data discretisation, Rough set theory, Variable precision rough set theory

Abstract

Since the seminal work of Pawlak (International Journal of Computer and Information Sciences, 11 (1982) 341–356), rough set theory (RST) has evolved into a rule-based decision-making technique. To date, however, relatively little empirical research has been conducted on the efficacy of the rough set approach in the context of business and finance applications. This paper extends previous research by employing a development of RST, namely the variable precision rough sets (VPRS) model, in an experiment to discriminate between failed and non-failed UK companies. It also utilises the FUSINTER discretisation method, which negates the influence of an ‘expert’ opinion. The results of the VPRS analysis are compared to those generated by classical logit and multivariate discriminant analysis, together with more closely related non-parametric decision tree methods. It is concluded that VPRS is a promising addition to existing methods in that it is a practical tool, which generates explicit probabilistic rules from a given information system, with the rules offering the decision maker informative insights into classification problems.

Introduction

Since the nascence of computerisation, together with the evolution of Artificial Intelligence (AI), there has been an explosion in the application of advanced decision-making techniques to solving business problems [1], [2], [3], [4] and [5]. Following the pioneering study of Altman [6], who used multivariate discriminant analysis (MDA) to differentiate between failed and non-failed US firms, a large body of research has focused on corporate failure prediction (see [7], [8], [9] and [10] for literature reviews). The prediction of corporate failure continues to be viewed as a matter of considerable interest to both academics and practitioners (including credit and investment analysts), and has obvious importance for the stakeholders (investors, creditors, employees, managers) of a firm. This is evidenced by the recent application of neural networks (NNs), the recursive partitioning algorithm (RPA) and case-based reasoning to this issue [11], [12], [13], [14] and [15]. A key advantage of these contemporary methods over their traditional counterparts (such as MDA and logit analysis) is that they do not require pre-specification of a functional form, nor the adoption of restrictive assumptions concerning the distributions of model variables and errors [12], [16] and [10].

More recently, a further non-parametric technique, rough set theory (RST), which has its foundations in mathematical set theory, has been applied to decision problems [17] and [18]. RST was originated by Pawlak [19] and has been described as ‘a new mathematical tool to deal with vagueness and uncertainty. This approach seems to be of fundamental importance to AI and cognitive sciences, especially in the areas of machine learning, knowledge acquisition, decision analysis, knowledge discovery from databases, expert systems, decision support systems, inductive reasoning and pattern recognition ... One of the advantages of RST is that it does not need preliminary or additional information about data, such as probability distributions in statistics, basic probability assignment in the Dempster Shafer theory of evidence, or grade of membership or the value of possibility in fuzzy set theory’ [20, p. 89]. RST incorporates the use of indiscernibility (equivalence) relations to approximate sets of objects by upper and lower set approximations and, as noted by Slowinski and Zopounidis [21, p. 79], ‘it is a formal framework for discovering deterministic and non-deterministic rules from a given representation of knowledge ... [it] ... assumes knowledge representation in a decision table form which is a special case of an information system’.

Initial RST applications focused on medical diagnosis, drug research and process control [22] and [23], but more recently it has been extended to cover credit fraud detection, stock market rule-generation, market research, climate change and the development of expert systems for the NASA space centre [24], [25] and [20]. Slowinski and Zopounidis [21] also investigated the use of RST to assess the risk of a Greek bank's clients (firms) in terms of granting finance. Although they did not examine the predictive accuracy of the RST rules, they did conclude [21, p. 39] that (based on financial ratios and other firm-specific variables), RST ‘is a useful tool for discovering a preferential attitude of the decision maker in multi-attribute sorting problems’. More recently, Dimitras et al. [26, p. 278] reported that (on the basis of financial ratios) a rough set approach to predicting between failed and non-failed Greek firms ‘was generally better than those obtained by classical discriminant and logit models’.

A limitation of these studies is that the continuous data used to derive the rough set rules have been discretised (a requirement of RST) with the aid of a selected ‘expert’. Clearly, different experts may proffer different views, and the operational costs and complexities of using RST (and related techniques) will increase when there is over-reliance on an expert. In this context An et al. [27, p. 647] have stated that ‘It has to be emphasised ... that the question of how to optimally discretise the attribute (variable) values, is unsolved, and a subject of on-going research’. This paper therefore employs a new (and more objective) discretisation method, namely the FUSINTER technique. However, the motivation for data discretisation extends beyond the requirements of RST, to include discretising data of an imprecise quality (‘noisy’ data). The ability to formulate rules from interval data (via discretisation) may also facilitate a more informed understanding of the interaction of the characteristics of objects. In this context, it is of interest to note that, even with regard to traditional statistical estimators (logit/discriminant analysis), it has recently been advocated that continuous variables (financial ratios) should be rank-transformed to improve their distributional properties in a failure prediction setting [28].

A further RST innovation has been the development by Ziarko [29] of a variable precision rough sets (VPRS) model, which incorporates probabilistic decision rules. This is an important extension, since as noted by Kattan and Cooper [30, p. 468], when discussing computer based decision techniques in a corporate failure setting, ‘In real world decision making, the patterns of classes often overlap, suggesting that predictor information may be incomplete... This lack of information results in probabilistic decision making, where perfect prediction accuracy is not expected’. An et al. [27] applied VPRS (which they termed ‘Enhanced RST’) to generating probabilistic rules to predict the demand for water. Relative to the traditional rough set approach, VPRS has the additional desirable property of allowing for partial classification, compared to the complete classification required by RST. More specifically, when an object is classified using RST it is assumed that there is complete certainty that it is a correct classification. In contrast, VPRS facilitates a degree of confidence in classification, invoking a more informed analysis of the data, which is achieved through the use of a majority inclusion relation [29].

This paper extends previous work by providing an empirical exposition of VPRS, where we present the results of an experiment which applies VPRS rules to the corporate failure decision. In addition, we mitigate the impact of using the subjective views of an expert (as employed in previous studies) to discretise the data, by utilising the sophisticated FUSINTER discretisation technique, which is applied to a selection of attributes (variables) relating to companies’ financial and non-financial characteristics. The discretised data, in conjunction with other nominal attributes, are then used in this new VPRS framework to identify rules to classify companies in a failure setting. To facilitate a comparison of our experimental VPRS results with those of existing techniques, we present the predictive ability of classical statistical methods (logit analysis and MDA), together with two more closely related non-parametric decision-tree methods, RPA and the Elysee method, which utilises ordinal discriminant analysis (see [15] and [31], for an exposition of these methods). However, in the spirit of previous experimental research, and more particularly the previous failure prediction study of Frydman et al. [15, p. 239], who concluded that ‘we feel that the attributes of new techniques like RPA can be presented and evaluated in a rigorous framework without the necessity of proving its absolute superiority over existing procedures’, the comparative classification results are not meant to be definitive, but rather to illustrate the potential of VPRS. In this context, research on the criteria to select the most efficacious and parsimonious set of VPRS rules (for predictive purposes) is still in its infancy [27].

The remainder of the paper is organised as follows: the next section gives a brief exposition of the VPRS method and a discussion of the FUSINTER discretisation method. The results of the empirical experiments are then reported, including a discussion of the predictive ability of VPRS relative to other existing parametric and non-parametric methods.
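
The difference between the complete classification of RST and the majority inclusion of VPRS, as described in the introduction, can be illustrated with a small sketch. It is not the authors' implementation: the decision table, the attribute names (liquidity, gearing) and the function names are hypothetical, and β is assumed to be expressed as a minimum proportion of correct classification in (0.5, 1], which is one of the conventions used in the VPRS literature.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical decision table: (liquidity, gearing, outcome).
# Attribute names and values are invented purely for illustration.
TABLE = [
    ("low", "high", "failed"),
    ("low", "high", "failed"),
    ("low", "high", "failed"),
    ("low", "high", "non-failed"),
    ("high", "low", "non-failed"),
    ("high", "low", "non-failed"),
    ("high", "high", "failed"),
]


def equivalence_classes(table):
    """Group object indices that are indiscernible on the condition attributes."""
    classes = defaultdict(list)
    for index, row in enumerate(table):
        classes[row[:-1]].append(index)
    return list(classes.values())


def decision_share(table, cls, decision):
    """Proportion of objects in one equivalence class that carry `decision`."""
    hits = sum(1 for i in cls if table[i][-1] == decision)
    return Fraction(hits, len(cls))


def lower_approximation(table, decision, beta=Fraction(1)):
    """Objects whose class supports `decision` with proportion >= beta.

    beta == 1 gives the classical RST lower approximation (total inclusion);
    0.5 < beta < 1 gives a majority-inclusion (VPRS-style) relaxation.
    """
    result = set()
    for cls in equivalence_classes(table):
        if decision_share(table, cls, decision) >= beta:
            result.update(cls)
    return result


def upper_approximation(table, decision):
    """Classical upper approximation: classes containing at least one such object."""
    result = set()
    for cls in equivalence_classes(table):
        if decision_share(table, cls, decision) > 0:
            result.update(cls)
    return result


classical = lower_approximation(TABLE, "failed")
relaxed = lower_approximation(TABLE, "failed", beta=Fraction(7, 10))
possible = upper_approximation(TABLE, "failed")
```

In this toy table the classical lower approximation of "failed" contains only the last object, because the class with low liquidity and high gearing is mixed. With β = 0.7 that class is admitted as well, since three of its four objects failed, which corresponds to a probabilistic rule with confidence 0.75.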

Conclusion

This paper has provided an exposition of the variable precision rough sets (VPRS) model, which has been developed from rough set theory (RST) as originally formulated by Pawlak [19]. Based on our experimental analysis, the paper has added to the literature in terms of demonstrating the application of VPRS to an important business decision problem, corporate failure prediction. The results of the empirical analysis were encouraging. The use of the FUSINTER method for data discretisation (together with a least squares method of nearest rule calculation for classification) mitigated the need for input from a human expert, which may be deemed undesirable by potential users of VPRS, since the use of human expertise may be impractical, relatively costly and may introduce an unacceptable level of subjective bias into the analysis. Compared to classical statistical methods, and more closely related non-parametric decision tree techniques, the VPRS rules, when applied to FUSINTER discretised variables, were found to predict with a reasonable degree of accuracy in training and holdout samples. In addition, it was demonstrated that the logit models which incorporated the discretised variables outperformed those based on continuous variables. This may well stem from the fact that the FUSINTER method has the added desirable property of eliminating the influence of outliers. As is common with experimental research, our results perhaps raise as many questions as they answer. What is clear, however, is that VPRS, building on RST, offers an innovative approach to rule induction, knowledge discovery and management classification problems.

Moreover, in a failure prediction setting, VPRS has a number of desirable properties, in terms of information quality and the formulation of explicit rules (which can be assessed by the user for logical interpretation and consistency), and which can be utilised in expert systems (with NASA, for example, currently using applications in this field). As with decision tree techniques, ceteris paribus, a clear benefit to users of VPRS is the ability to interpret individual rules in a decision-making context (as opposed to interpreting coefficients in conventional statistical models). Hence VPRS generated rules are relatively simple, comprehensible and directly interpretable with reference to the decision domain. For example, users are not required to possess the technical knowledge and expertise associated with interpreting classical models. These VPRS characteristics are particularly useful to decision makers who are interested in interpreting the rules (based on factual cases) with direct reference to the outcomes they are familiar with, for example a bank's decision whether or not to grant credit, or in respect of auditors’ materiality judgements (see, e.g. [61]). Although classical statistical methods often rely (particularly in corporate failure studies) on relatively ‘crude’ methods of variable selection (e.g. stepwise procedures), and the selection of optimal stopping criteria for existing non-parametric decision tree techniques is still evolving, further research in respect of optimal VPRS β-reduct and rule selection criteria is clearly warranted. In addition, the VPRS results presented in this paper are experimental, and do not aim to be definitive in terms of demonstrating the superiority of the new technique over existing methods. Further research is therefore required on these issues, together with an investigation of the asymptotic and sampling properties of VPRS models.

Furthermore, the potential loss of information to the user when continuous data are discretised to facilitate RST and VPRS analysis is an additional issue which is worthy of further attention. It is hoped that the research presented in this paper will stimulate additional work on these important topics.
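
The conclusion's remark that discretisation removes the influence of outliers can be seen in a minimal sketch. The cut points below are hypothetical and fixed by hand; FUSINTER would derive them from the data in a supervised way, and that procedure is not reproduced here.

```python
from bisect import bisect_right

# Hypothetical cut points for a single financial ratio (an assumption for
# illustration; not values produced by FUSINTER or taken from the paper).
CUT_POINTS = [0.0, 0.5, 1.5]


def discretise(value, cut_points=CUT_POINTS):
    """Return the index of the interval that contains `value`."""
    return bisect_right(cut_points, value)


ratios = [-0.2, 0.3, 0.9, 2.1, 250.0]  # 250.0 is an extreme outlier
codes = [discretise(r) for r in ratios]
```

Here the outlier 250.0 receives the same interval code as 2.1, so a rule or a logit model built on the codes treats the two observations identically, whereas a model on the raw ratio could be dominated by the extreme value.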