Title
The predictive accuracy of artificial neural networks and multiple regression in the case of skewed data: exploration of some issues
Article code: 20009
Year of publication: 2000
Length of English article: 7 pages (PDF)
Source

Publisher: Elsevier - Science Direct

Journal: Expert Systems with Applications, Volume 19, Issue 2, August 2000, Pages 117–123

Keywords
Neural networks, Regression, Skewed data

Abstract

Business organizations can be viewed as information-processing units making decisions under varying conditions of uncertainty, complexity, and fuzziness in the causal links between performance and various organizational and environmental factors. The development and use of appropriate decision-making tools has, therefore, been an important activity of management researchers and practitioners. Artificial neural networks (ANNs) are turning out to be an important addition to an organization's decision-making tool kit. A host of studies has compared the efficacy of ANNs to that of multivariate statistical methods. Our paper contributes to this stream of research by comparing the relative performance of ANN and multiple regression when the data contain skewed variables. We report results for two separate data sets: one related to individual performance and the second to firm performance. The results are used to highlight some salient issues related to the use of ANN and multiple regression models in organizational decision-making.

Introduction

Increasing attention is being paid to the use of artificial neural networks (ANN) in managerial decision-making. As many managerial decision situations are fraught with variety, ambiguity and complexity (Mintzberg, Raisinghani & Theoret, 1976), ANNs are appealing as managerial decision-making aids precisely because of their expected effectiveness in such situations (Lippman, 1987). An important element in the life-cycle of an innovation (and ANN is an innovation in the field of organizational decision-making aids) is establishing the contingencies under which, and the contexts in which, the innovation is most effective. Hence, a stream of studies on ANNs has focused on delineating the boundaries of their usefulness (e.g. Duliba, 1991; Dutta and Shekar, 1988; Gorr et al., 1994; Marquez and Hill, 1993; Marquez et al., 1991; Sharda and Wilson, 1993). The studies reported in this paper add to this line of research. Using ‘real-world’ data, we conduct an empirical investigation into the relative efficacy of ANNs and multiple regression when the sample data are not normally distributed, i.e. they are skewed. The effect of this contingency on the behavior of ANN and multiple regression models needs to be investigated, given that the reliability of multivariate statistical methods requires that data be multivariate normal.
From an organizational effectiveness view, there is a need for studies that establish the pros and cons of any innovation. Corporations tend to have a pro-innovation bias, leading them to adopt ‘promising’ but untested innovations (Kimberly, 1985). For example, use of labels such as “Expert” and “Intelligent” has been shown to lead to complacency as well as unthinking dependence on such systems among users (Will, 1991). Further, the adoption of innovations can be disruptive because of its organization-wide consequences (Sviokla, 1990).
Thus, there is clearly a need for systematic investigations of the contexts and contingencies affecting the predictive accuracy of ANN models. A number of studies have focused on investigating the relative performance of statistical and ANN methods in forecasting. These studies can be differentiated from each other on two dimensions. The first is data type: some studies use actual data (i.e. data from the real world) and others use simulated data. The second dimension relates to differences in measures: studies have used measures that are either nominal/categorical or interval/ratio-scale. Fig. 1 provides some examples of studies in each of the cells.
Fig. 1. Categorization of ANN efficacy studies (Bansal et al., 1993; Fisher and McKusick, 1989; Salchenberger et al., 1979; Tam and Kiang, 1992).
There appears to be a preponderance of studies in cells one and four. If real-world data are used, the phenomenon is usually represented by a categorical variable (e.g. solvent/insolvent, bankrupt/not bankrupt, loan granted/denied, etc.). It is only when simulated data are used in the analyses that we observe the measures to be continuous/interval scale (cell four). The two studies reported in this paper fall in cell two. Both studies use real-world data and have variables that are measured on a ratio scale. While both studies are of organizational phenomena, they differ in their level of analysis. The first focuses on individual performance and the second on firm-level performance. In addition to differences in data type and variable measurement, our study differs from earlier ones by considering the additional dimension of variable skewness. Specifically, we investigate whether skewness in a sample's dependent variable affects the efficacy of ANN and multiple regression models. Analyses by Marquez et al. (1991), using simulated data, indicate that it does.
The reliability of any statistically derived result is known to be strongly dependent on the degree to which the sample distribution is multivariate normal. Formulae for tests of statistical significance of regression coefficients are based on this assumption (Tabachnick & Fidell, 1983). To the extent that this assumption is violated, generalization of statistics-based results beyond the sample will be highly suspect. Hence, data sets with skewed variables are particularly well suited for testing the relative efficacy of ANN and regression models. A large proportion of studies support the use of ANN-based reasoning to deal with unstructured or semi-structured decision situations (e.g. Dutta and Shekar, 1988; Gallant, 1988; Yoon and Swales, 1991). These (and other) studies comparing ANN performance to that of multivariate statistical methods have found ANNs to be better at prediction. However, Marquez et al. (1991) found in their simulation study that ANN-based systems perform better than regression techniques only when sample sizes are small and when variables are strongly correlated. Duliba (1991) found that an ANN model did not perform as well as regression when additional explanatory variables were introduced into the modeling. Gorr et al. (1994) noted that, for their data, although multiple regression was best overall, there were no statistically significant differences in predictive accuracy across four different models. Further, they observe that “Neither the stepwise regression nor the ANN benefited when additional model structures were incorporated” (p. 31). As skewness is a factor affecting the various models’ behavior, we compare the performance of both multiple regression and ANN by deliberately choosing samples characterized by highly skewed variables.
The paper is structured as follows: the two studies are reported next. The data set for the first study consists of a sample of MBA students, where the focus is on predicting the students’ graduating GPA. We conclude the paper by discussing the comparative performance of both ANN and regression models on the two data sets, suggesting guidelines for the use of ANNs for knowledge acquisition, and proposing future research directions.
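Since the introduction turns on whether a variable is skewed, a small illustration of how skewness might be quantified may help. The sketch below is not taken from the paper: it assumes the Fisher-Pearson moment coefficient g1 = m3 / m2^1.5 as the skewness measure, and the function name `sample_skewness` and the simulated samples are hypothetical. It also shows how a log transformation removes the skew of a log-normal sample, the kind of transformation a linear model would typically require.

```python
import math
import random


def sample_skewness(values):
    """Fisher-Pearson moment coefficient of skewness, g1 = m3 / m2**1.5.

    Assumption: this is one common definition of skewness; the paper does
    not state which measure its authors used.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


# Hypothetical illustrative samples (not the paper's data).
rng = random.Random(0)
symmetric = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # roughly normal
right_skewed = [math.exp(v) for v in symmetric]          # log-normal, long right tail
log_transformed = [math.log(v) for v in right_skewed]    # transformation undoes the skew

print("symmetric:      ", round(sample_skewness(symmetric), 3))
print("right-skewed:   ", round(sample_skewness(right_skewed), 3))
print("log-transformed:", round(sample_skewness(log_transformed), 3))
```

A value near zero suggests a roughly symmetric variable, while a clearly positive value indicates a long right tail.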

Conclusion

The two studies conducted here are typical of a wide class of phenomena studied in organizations. Study One is an instance of an organizational behavior phenomenon at the individual level of analysis. Study Two is a typical example of a phenomenon studied at the organization level of analysis. Most studies in management fall into one of these two levels. Seen in this perspective, the results of the studies reported here ought to be given careful consideration. The results indicate that ANNs are not consistently good at prediction. There is a need for systematic empirical studies that will help determine the types of problem situations where ANNs will yield superior predictions. One way to do this is to create data sets that vary systematically on three dimensions, (i) number of variables, (ii) noisiness of each variable, and (iii) sample size, and then investigate the variations in each method's predictive accuracy. The simulation data set of Marquez et al. (1991) was created using only one independent variable and hence does not capture the efficacy of ANNs for analyzing phenomena involving complex interacting factors.
An advantage possessed by ANNs is that they remove the guesswork involved in finding the right transformation. Linear statistical models require finding the right transformation for the variables. In contrast, ANNs' capacity for learning and self-transformation provides an alternative to the guesswork involved in identifying the distributions and transformations required in a linear model (Marquez et al., 1991, p. 129). Whether we want automated systems to do the learning is context dependent. For example, strategic decision-making (of which Study Two is a good instance) is one area where it may be prudent for learning to occur in managerial minds rather than in an ANN (or Expert System).
Strategic issues and problem situations are open-ended, ambiguous and equivocal, and not all cause-effect relationships pertaining to a situation are known. In such cases, the usefulness of ANNs and Expert Systems does not lie in providing accurate predictions. Rather, it lies in the potential for exploring ‘what-if’ scenarios by changing the number of factors affecting a decision situation and by varying the strength of relationships among the factors. The fact that neural networks do not provide details of the interrelationships between the nodes hinders understanding and learning by managers.
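The simulation design suggested in the conclusion (data sets that vary systematically in number of variables, noisiness, and sample size) could be organized roughly as follows. This is only a sketch under stated assumptions, not the authors' procedure: the linear data-generating process, the factor levels, the 70/30 holdout split, and all names (`make_dataset`, `holdout_mse`, `run_grid`, `mean_baseline`) are hypothetical. The mean-only baseline is a placeholder; a regression or ANN model would be supplied in its place as another `fit_predict` function.

```python
import itertools
import random


def make_dataset(n_vars, noise_sd, n_samples, rng):
    """Simulate one data set (assumed process: y = sum of true x plus noise).

    Assumption: 'noisiness of each variable' is modelled as Gaussian
    measurement noise of the same standard deviation on every observed x and on y.
    """
    rows = []
    for _ in range(n_samples):
        x_true = [rng.gauss(0.0, 1.0) for _ in range(n_vars)]
        y = sum(x_true) + rng.gauss(0.0, noise_sd)
        x_observed = [v + rng.gauss(0.0, noise_sd) for v in x_true]
        rows.append((x_observed, y))
    return rows


def mean_baseline(train, test):
    """Placeholder model: predicts the training mean of y for every test row."""
    mean_y = sum(y for _, y in train) / len(train)
    return [mean_y for _ in test]


def holdout_mse(data, fit_predict, train_fraction=0.7):
    """Fit on the first part of the data and return mean squared error on the rest."""
    cut = int(len(data) * train_fraction)
    train, test = data[:cut], data[cut:]
    predictions = fit_predict(train, test)
    return sum((p - y) ** 2 for p, (_, y) in zip(predictions, test)) / len(test)


def run_grid(fit_predict,
             n_vars_levels=(1, 3, 6),
             noise_levels=(0.5, 2.0),
             sample_sizes=(50, 200),
             seed=0):
    """Evaluate one method on every cell of the three-factor design."""
    rng = random.Random(seed)
    results = {}
    for cell in itertools.product(n_vars_levels, noise_levels, sample_sizes):
        n_vars, noise_sd, n_samples = cell
        data = make_dataset(n_vars, noise_sd, n_samples, rng)
        results[cell] = holdout_mse(data, fit_predict)
    return results


results = run_grid(mean_baseline)
for (n_vars, noise_sd, n_samples), mse in sorted(results.items()):
    print(f"vars={n_vars} noise={noise_sd} n={n_samples} holdout MSE={mse:.3f}")
```

Running the same grid once per method, with the same seed, would give each method identical data in every cell, so that differences in holdout error can be attributed to the method rather than to the sample.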