Sensitivity analysis for the predictor variables in MSAE regression
| Article code | Publication year | English article | Persian translation | Word count |
| --- | --- | --- | --- | --- |
| 25613 | 2002 | 19-page PDF | available to order | not calculated |
Publisher: Elsevier - Science Direct
Journal: Computational Statistics & Data Analysis, Volume 40, Issue 2, 28 August 2002, Pages 355–373
The minimum sum of absolute errors (MSAE) regression is more resistant than the least squares regression to outliers in the values of the response variable and to long-tailed error distributions. Because all observations are used to compute the least squares estimates of the parameters of the model, a small change in the value of the response or a predictor variable will affect the estimates. The MSAE estimates of the unknown parameters, in contrast, are completely determined by a subset of the observations; therefore, small changes in some of the values of the response or a predictor variable may not affect the MSAE estimates. The basic idea behind influence analysis is that a regression solution should be stable, that is, small changes in the data should not produce large changes in the results. This is Tukey's property of resistance. Deleting an observation is one way of introducing small changes in the data; however, this may not be a good way to judge the influence of an observation for the MSAE regression. In this paper, we develop a procedure to find an interval on each value of the predictor variables (leaving all other values undisturbed) within which the fitted MSAE regression model remains unchanged. These intervals show the extent of admissible changes in the values of the predictor variables to which the fitted MSAE model is resistant. This property may be called complete (or absolute) resistance.
The least squares regression continues to dominate the statistical literature. Its success is partially due to the fact that the theory is simple, well developed and documented, and computer programs to implement it are easily available. It is well known that the least squares result is very sensitive to outliers in the values of the response variable and the predictor variables. To overcome some of its drawbacks, a number of diagnostic techniques and robust regression procedures have been proposed (Beckman and Cook, 1983). The diagnostic techniques identify outliers and influential observations, whereas the robust regression procedures accommodate the possibility of outliers. One of the simple robust alternatives to the least squares regression is the minimum sum of absolute errors (MSAE) regression, which is less sensitive to outliers in the values of the response variable. The MSAE estimators of the parameters of the model are the maximum likelihood estimators whenever the errors are mutually independent and follow a Laplace distribution. It is useful to observe that the MSAE regression is to the least squares regression what the sample median is to the sample mean. For example, both the sample mean and the least squares estimators are determined and influenced by all the observations, whereas the sample median and the MSAE estimators are determined by only a subset of the observations. Just as the value of the sample median is unaffected if the magnitude of an observation changes such that it remains on the same side of (either above or below) the sample median, a similar result holds true for the MSAE regression (Narula and Wellington, 1985). That is, the fitted MSAE regression model remains unchanged if the values of the response variable associated with the non-zero residuals change such that these observations remain on the same side of the fitted model.
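The median analogy above is easy to check numerically. The following sketch (with made-up data, not from the paper) shows that moving an observation around on its own side of the median leaves the median untouched:

```python
# The sample median is fixed by the middle observation, so changing another
# observation's magnitude while it stays on the same side of the median
# leaves the median unchanged -- the analogue of MSAE resistance.
from statistics import median

data = [1, 2, 3, 4, 100]
print(median(data))    # 3

data[4] = 1000         # the outlier grows, but stays above the median
print(median(data))    # still 3

data[3] = 3.5          # moved, but still on the same side of the median
print(median(data))    # still 3
```

The sample mean, by contrast, changes with every one of these perturbations, just as every least squares estimate does.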
This is very unlike the least squares regression where any change in the value of the response variable of any observation changes the values of the least squares estimates of the parameters. For the simple linear regression model, it is well known (Sposito et al., 1978) that the minimum sum of absolute errors regression line can be chosen so that it passes through at least two observations. We shall call these observations with zero residuals the defining (or basic) observations; and the others the non-defining (or non-basic) observations. Recently, Narula et al. (1993) have shown that if the value of a response (or predictor) variable for a non-defining observation lies within a certain interval, then the fitted MSAE regression line remains unchanged. In this paper, our objective is to extend the work of Narula et al. (1993) to multiple linear regression, i.e., to present a procedure to determine the intervals for the values of the predictor variables and the response variable for the non-defining observations within which the fitted MSAE multiple regression model remains unchanged. Throughout the analysis, for an observation of interest, we may change the value of only one predictor variable or the response variable. No other values in the data set may be changed. Because the MSAE multiple linear regression can be formulated and solved as a linear programming problem, we use some results from linear programming to construct these intervals. The rest of the paper is organized as follows: in Section 2, we motivate the discussion with an example that led us to this research. In Section 3, we present some preliminary results. In Section 4, we propose a procedure to compute these intervals and illustrate it with an example in Section 5. We conclude the paper with a few remarks in Section 6.
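As the text notes, the MSAE regression can be formulated and solved as a linear programming problem: split each residual into nonnegative parts e+ and e- and minimize their sum subject to Xb + e+ - e- = y. Below is a minimal sketch of this standard formulation using SciPy's `linprog`; the five-point data set is an illustrative assumption, not taken from the paper. Four of the points lie exactly on y = x, so the fitted line passes through (at least two of) them, and they are the defining observations; the outlier is non-defining.

```python
# Fit a simple MSAE (L1) regression line by linear programming.
import numpy as np
from scipy.optimize import linprog

def msae_fit(x, y):
    """Fit y = b0 + b1*x minimizing the sum of absolute errors.

    LP variables: [b0, b1, e_plus (n of them), e_minus (n of them)],
    subject to X @ b + e_plus - e_minus = y with e_plus, e_minus >= 0.
    """
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    p = X.shape[1]
    c = np.concatenate([np.zeros(p), np.ones(2 * n)])    # cost: sum of e+ and e-
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)  # coefficients are free
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p], res.fun

# Four points on y = x plus one outlier in the response.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 1.0, 2.0, 3.0, 20.0])
beta, sae = msae_fit(x, y)
print(beta)  # close to [0, 1]: the outlier does not pull the fitted line
print(sae)   # total absolute error, 16, all of it from the outlier
```

Least squares on the same data would tilt the line substantially toward the outlier; the MSAE fit ignores it entirely, which is the resistance property the paper quantifies.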
Conclusion (English)
The fitted minimum sum of absolute errors regression model remains unchanged if the value of a predictor variable for a non-defining observation lies within a certain interval and the values of the other observations are not changed. This analysis is quite useful because it brings to the attention of the analyst those values of the variables that have wide intervals. In such cases, one can be confident that if there was no more than a modest error in the measurement of this value, then the model was not affected. It also highlights those values of the variables that have a very narrow interval, or whose original value lies very close to the lower or the upper bound of the interval. However, even if these values were outside the interval, the change that would occur might be small. It may be useful to ensure that these values were taken, recorded and transmitted without mistakes. In short, this analysis allows the user to examine a data set and the fitted MSAE model more critically and carefully, and thus have more confidence in the final results and conclusions. The calculations to determine these intervals are quite simple and straightforward, and can be performed quite easily using a spreadsheet program like Excel. However, to compute these intervals one needs to fit the MSAE regression model to the data set. Computer programs to fit the MSAE regression model to a data set are included in statistical computer packages like SAS, S-PLUS, and IMSL. The interested reader may also refer to Narula and Wellington (1982), Narula (1987), Narula et al. (1999), and Portnoy and Koenker (1997). It may be noted that constructing intervals on the values of the predictor variables corresponds to sensitivity analysis on the technical coefficients in a linear programming problem. For a general linear programming problem, this cannot be accomplished easily. However, because the MSAE regression problem has a very special structure, it was possible to develop closed-form formulae to compute the intervals on the values of the predictor and the response variables such that the fitted MSAE model remains unchanged.
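The resistance property for predictor values can be checked empirically by refitting after a perturbation. The sketch below uses an illustrative data set and a small, arbitrarily chosen perturbation (both are assumptions, not values from the paper); it perturbs one predictor value of the non-defining observation and confirms that the fitted MSAE line does not move.

```python
# Perturb one predictor value of a non-defining observation and verify
# that the fitted MSAE (L1) regression line is unchanged.
import numpy as np
from scipy.optimize import linprog

def msae_fit(x, y):
    """Fit y = b0 + b1*x minimizing the sum of absolute errors, via LP."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    p = X.shape[1]
    c = np.concatenate([np.zeros(p), np.ones(2 * n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 1.0, 2.0, 3.0, 20.0])
beta0 = msae_fit(x, y)        # fitted line y = x; last observation has residual 16

x_perturbed = x.copy()
x_perturbed[4] = 4.3          # change one predictor value only; all else untouched
beta1 = msae_fit(x_perturbed, y)

print(np.allclose(beta0, beta1, atol=1e-5))  # True: the fit is unchanged
```

The paper's contribution is to compute the full admissible interval for each such value in closed form, rather than probing one perturbation at a time as this sketch does.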