Data envelopment analysis based data preprocessing for maximum decisional efficiency linear case valuation models
|Article code||Publication year||English article||Persian translation||Word count|
|4586||2012||8-page PDF||Available to order||7140 words|
Publisher: Elsevier - ScienceDirect
Journal : Expert Systems with Applications, Volume 39, Issue 10, August 2012, Pages 9435–9442
In this paper, we use data envelopment analysis (DEA) to preprocess training data cases before the maximum decisional efficiency (MDE) principle is used to estimate discriminant function parameters. Using an example from the literature and simulated datasets, we compare the performance of the DEA-MDE procedure for parameter estimation with that of the traditional MDE procedure without data preprocessing. The results of our experiments indicate that the DEA-MDE procedure eliminates some inconsistencies caused by the MDE principle, provides results that are consistent with an ensemble of expert decisions, reduces the dimensionality of examples used in training datasets, and performs as well as or better than the MDE procedure on holdout sample tests. The DEA-MDE procedure appears to be sensitive to the class data distribution, and the best results are obtained when the class data distribution is exponential.
Data envelopment analysis (DEA) is a technique developed to measure the efficiency of decision-making units (DMUs) in a variety of settings. Since its introduction, the technique has been used in several industries, including manufacturing, banking, health care and services. Recently, DEA has been used for data-mining applications. Among the applications of DEA in data mining are data preprocessing in forecasting applications (Pendharkar, 2005; Pendharkar & Rodger, 2003), outlier detection (Banker & Chang, 2006), classification (Pendharkar, 2011; Seiford & Zhu, 1998; Troutt, Rai, & Zhang, 1996), cluster analysis (Po, Guh, & Yang, 2009) and inverse classification problems (Pendharkar, 2002). We do not know of any studies that have used DEA for data preprocessing in classification applications. Pendharkar (2005) used DEA based data preprocessing for forecasting applications where the predicted variable was continuous. In classification applications, the predicted variable is binary, and the application of DEA for data preprocessing requires a different approach. An application of DEA for data preprocessing in classification applications would be desirable for at least two reasons. First, data preprocessing would result in fewer records and less computational effort. Second, data preprocessing would eliminate trivial classification examples and outliers, resulting in a classification function that may generalize better because it overfits the training data less. There is substantial literature on data preprocessing (Kone & Karwan, 2011) for classification problems (Chen et al., 2010; Wang & Shi, 2008). Some of the reasons for data preprocessing are to improve scalability (Wang & Shi, 2008), reduce bias originating from class imbalance (Chen, Hsu, & Chang, 2010), and improve generalizability (Pendharkar, 2005).
Most DEA data mining applications consider single output, multiple input settings (Pendharkar, 2011; Seiford & Zhu, 1998; Troutt et al., 1996). Banker (1993) provides a statistical foundation for DEA under the single output, multiple input setting, where DEA estimators are shown to be maximum likelihood estimators (MLE) of non-parametric probability density functions. Banker (1993) argues that the primary difference between statistical MLE and DEA estimators is the assumption that the production frontier in DEA is a non-parametric, monotone increasing and concave function. When all the assumptions of a single output, multiple inputs and a concave, monotone increasing production function are satisfied, Banker (1993) showed that DEA estimators maximize the likelihood for a broad class of density functions, including exponential and half-normal distributions. When DEA is used for classification, DMUs with an efficiency score (ξ) of 1 are considered to lie on the classification boundary, or envelopment, that separates one class from another (Pendharkar, 2011). If we define the one-sided deviation term ν = (1 − ξ), then maximizing the likelihood of the probability density function for ν, α(ν), is equivalent to minimizing the sum of deviations if α(ν) is exponential, or to minimizing the sum of squared deviations if α(ν) is half-normal (Banker, 1993). Thus, computing the efficiencies of DMUs and removing the DMUs with low efficiency scores is equivalent to removing DMUs that lie in the tail of α(ν). As low efficiency DMUs are removed from the original dataset, the MLE estimate for the remaining dataset with fewer DMUs will be lower than that for the original dataset. The lowering of the MLE estimate may achieve better generalization due to the removal of outliers and trivial classification cases. However, care must be exercised not to eliminate too many DMUs, since the new model may then lose generalizability compared to the model built from the original dataset.
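The link between the deviation term ν = 1 − ξ and the likelihood can be illustrated with a minimal Python sketch. It assumes that efficiency scores have already been computed by some DEA model. The function names, the `cutoff` threshold, the `rate` and `sigma` parameters and the sample scores are all hypothetical choices made for illustration and are not taken from the paper.

```python
import math

def deviations(efficiencies):
    """One-sided deviations nu = 1 - xi for efficiency scores xi in (0, 1]."""
    return [1.0 - xi for xi in efficiencies]

def exponential_log_likelihood(nus, rate):
    """Log-likelihood of the deviations under alpha(nu) = rate * exp(-rate * nu).

    For a fixed rate this equals n*log(rate) - rate*sum(nu), so a larger
    likelihood corresponds to a smaller sum of deviations.
    """
    return len(nus) * math.log(rate) - rate * sum(nus)

def half_normal_log_likelihood(nus, sigma):
    """Log-likelihood of the deviations under a half-normal density with scale sigma.

    For a fixed sigma the only data-dependent term is -sum(nu^2) / (2*sigma^2),
    so a larger likelihood corresponds to a smaller sum of squared deviations.
    """
    n = len(nus)
    const = n * math.log(math.sqrt(2.0 / math.pi) / sigma)
    return const - sum(nu * nu for nu in nus) / (2.0 * sigma ** 2)

def screen_by_efficiency(efficiencies, cutoff):
    """Indices of DMUs whose efficiency is at least `cutoff` (assumed threshold)."""
    return [i for i, xi in enumerate(efficiencies) if xi >= cutoff]

# Illustrative efficiency scores only; these are not data from the paper.
scores = [1.0, 0.95, 0.9, 0.6, 0.3]
kept = screen_by_efficiency(scores, cutoff=0.5)
kept_nus = deviations([scores[i] for i in kept])

print(kept)  # [0, 1, 2, 3]
print(sum(deviations(scores)), sum(kept_nus))
```

In this sketch, dropping the DMU with ξ = 0.3 removes the largest deviation, which is the sense in which screening trims the tail of α(ν).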
Troutt (1995) proposed a related decisional efficiency based procedure for parameter estimation of certain optimization models. Troutt (1995) showed that these parameter estimation models can be formulated using the maximum decisional efficiency (MDE) principle. The MDE model was shown to be an MLE for a certain class of monotone increasing density functions. The primary difference between MDE and DEA is that in the former case a production function is specified with a finite number of parameters, whereas in the case of DEA the number of parameters to be estimated increases with the sample size (Banker, 1993). For finite sample sizes, DEA estimators would be biased and provide MLE estimates below the theoretical frontier suggested by the MLE model. Given that Troutt (1995) proved that the MDE model maximizes the likelihood of certain monotone increasing density functions, it can be assumed that DEA estimators would provide MLE estimates below the theoretical frontier suggested by the MDE model. To illustrate the utility of DEA for data preprocessing for MDE linear case valuation models, we consider two different scenarios. In the first scenario, we assume a classification problem where the classification data is generated by several decision-makers, which in the context of DEA may be considered as different decision-making processes. Given different decision-making processes, DEA is applied independently to screen the examples that are fed into the MDE data aggregation and parameter estimation process (Troutt, 1995, Troutt et al., 1997 and Troutt et al., 1997). The MDE principle aggregates data from different decision-making processes or decision-makers and generates a linear case valuation model that can be used for classification (Troutt et al., 1997 and Troutt et al., 1997). When DEA is used to preprocess data for the MDE model, we call our procedure DEA-MDE; otherwise the procedure is called MDE.
In the second scenario, we consider the classic classification problem where the training data comes from one source and represents only one decision-making process. In both scenarios, we compare the DEA-MDE results with the MDE results. The rest of the paper is organized as follows. In Section 2, we provide an overview of DEA based preprocessing and MDE principle based data aggregation for linear case valuation. In Section 3, we illustrate the application of the DEA-MDE procedure using a previously reported example of multiple decision-maker data, and compare it with the MDE procedure without data preprocessing and with a multiple decision-maker ensemble of what we call overlapping cases. In Section 4, using simulated datasets, we compare the DEA-MDE and MDE procedures for our second scenario. In Section 5, we conclude the paper with a summary and provide a few directions for future work.
Conclusion
In this paper, we have shown how DEA can be used to screen cases and remove inconsistencies that may arise in the MDE linear case valuation models. Using an example from the literature and simulated datasets, we illustrated an application of the DEA-MDE procedure. The results of our study indicate that the MDE procedure is greedy in maximizing the likelihood of the decisional efficiency score distribution of unobserved sample values and performs best when the group data distribution is exponential. The input-oriented DEA procedure appears to maximize the entropy of decision-making attributes. The advantages of maximum entropy over maximum likelihood and its connection to the minimum description length principle have been discussed in the literature (Feder, 1986). Since maximum entropy is a special case of the minimum description length principle (Feder, 1986), when both procedures are combined, the resulting DEA-MDE procedure appears to provide a solution to the dimensionality reduction problem (Fu, 1999), providing the best solution in terms of the number of decision-making attributes, the number of examples and the likelihood of the data distribution of unobserved sample values. The proposed DEA based screening procedure is general and may be applied to any classification procedure as long as two conditions are satisfied: the conditional monotonicity assumption holds, and the decision-making attribute vectors are non-negative. The conditional monotonicity assumption must be strictly satisfied for best results. In classification problems with negative attributes, a positive constant may be added to each of the decision-making attributes so that the resulting attributes are all non-negative and satisfy the second condition. Generally, the best results are obtained when the group data distribution is exponential.
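The shift that makes attributes non-negative can be sketched as follows. This is a minimal illustration under stated assumptions: the paper says only that a positive constant may be added, so shifting each attribute by the magnitude of its most negative value is our choice, and the function name and sample data are hypothetical.

```python
def shift_to_non_negative(cases):
    """Shift each attribute so that all of its values are non-negative.

    `cases` is a list of equal-length lists of numbers (one list per case).
    Assumption: the constant for an attribute is the magnitude of its most
    negative value, or 0.0 if the attribute has no negative values.
    Returns the shifted cases and the constant used for each attribute.
    """
    n_attrs = len(cases[0])
    shifts = []
    for j in range(n_attrs):
        col_min = min(row[j] for row in cases)
        shifts.append(-col_min if col_min < 0 else 0.0)
    shifted = [[row[j] + shifts[j] for j in range(n_attrs)] for row in cases]
    return shifted, shifts

# Illustrative data only; the second attribute contains negative values.
cases = [[1.0, -2.0], [3.0, 4.0], [0.5, -0.5]]
shifted, shifts = shift_to_non_negative(cases)
print(shifts)   # [0.0, 2.0]
print(shifted)  # [[1.0, 0.0], [3.0, 6.0], [0.5, 1.5]]
```

The returned `shifts` would need to be reused when transforming holdout cases, so that training and test attributes stay on the same scale.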
When these two conditions are satisfied, a comparison of the DEA-MDE procedure with the traditional MDE procedure indicates that DEA-MDE will always perform better than or equal to the MDE procedure. Our research can be extended in several ways. First, we only used accept class cases for our MDE procedure. It is possible to use reject class cases and create a minimum decisional inefficiency (MDI) estimation problem. Such an MDI problem will lead to a min-max linear programming problem, as opposed to the max-min linear programming MDE formulation (9), (10) and (11). It would be worth comparing the performance of the MDI and MDE formulations. Second, in cases where the costs of classification errors are asymmetric, it may be valuable to use two scoring functions (MDI and MDE), multiply each score by its associated error cost to evaluate the organizational value of a case, and pick the case with the lowest error cost. Third, it may be possible to use fuzzy logic to develop a stopping criterion for the procedure shown in Fig. 4. More specifically, if a fuzzy variable is used that measures the magnitude of the difference |ψOLD−ψNEW| as low or high, then the DEA-MDE procedure may continue data preprocessing as long as the fuzzy variable has a “low” value and stop when it has a “high” value. Finally, it may be possible to use new statistical or interactive procedures to determine the most appropriate cutoff value for δ. Comparing the benefits of different approaches to determining cutoff values may be an interesting future research undertaking.
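One way the fuzzy stopping idea might look in code is sketched below. This is a speculative illustration only: the linear ramp membership function, the `low_end` and `high_start` breakpoints, and the rule of stopping once "high" outweighs "low" are all our assumptions, and ψ stands for whatever quantity the procedure in Fig. 4 tracks between iterations.

```python
def membership_high(diff, low_end, high_start):
    """Degree in [0, 1] to which `diff` is 'high', using a linear ramp.

    Values at or below `low_end` are fully 'low' (membership 0.0) and values
    at or above `high_start` are fully 'high' (membership 1.0).
    """
    if diff <= low_end:
        return 0.0
    if diff >= high_start:
        return 1.0
    return (diff - low_end) / (high_start - low_end)

def should_stop(psi_old, psi_new, low_end=0.01, high_start=0.05):
    """Stop preprocessing when |psi_old - psi_new| is more 'high' than 'low'.

    The membership in 'low' is taken as the complement of 'high'.
    The default breakpoints are hypothetical.
    """
    high = membership_high(abs(psi_old - psi_new), low_end, high_start)
    low = 1.0 - high
    return high > low

# Illustrative values only.
print(should_stop(0.900, 0.895))  # False: small change, keep preprocessing
print(should_stop(0.900, 0.800))  # True: large change, stop
```

A graded membership rather than a hard threshold leaves room to combine this signal with other fuzzy rules, which is the kind of extension the text suggests.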