English ISI article no. 24257
English title
Robust bandwidth selection in semiparametric partly linear regression models: Monte Carlo study and influential analysis
Article code: 24257
Year of publication: 2008
Number of pages (English article): 21 (PDF)
Source

Publisher: Elsevier - ScienceDirect

Journal: Computational Statistics & Data Analysis, Volume 52, Issue 5, 20 January 2008, Pages 2808–2828

Keywords
Asymptotic properties, Bandwidth selectors, Kernel weights, Partly linear models, Robust estimation, Smoothing techniques

Abstract

In this paper, under a semiparametric partly linear regression model with fixed design, we introduce a family of robust procedures to select the bandwidth parameter. The robust plug-in proposal is based on nonparametric robust estimates of the νth derivatives and, under mild conditions, it converges to the optimal bandwidth. A robust cross-validation bandwidth is also considered, and the performance of the different proposals is compared through a Monte Carlo study. We define an empirical influence measure for data-driven bandwidth selectors and use it to study their sensitivity. It appears that the robust selector compares favorably with its classical competitor, despite the need to select a pilot bandwidth when considering plug-in bandwidths. Moreover, the plug-in procedure seems to be less sensitive than cross-validation, in particular when several outliers are introduced. When combined with the three-step procedure proposed by Bianco and Boente [2004. Robust estimators in semiparametric partly linear regression models. J. Statist. Plann. Inference 122, 229–252], the robust selectors lead to robust data-driven estimates of both the regression function and the regression parameter.

Introduction

Partly linear models have become an important tool for modelling biometric data, since they combine the flexibility of nonparametric models with the simple interpretation of linear ones. These models assume that we observe a response $y_i \in \mathbb{R}$ and covariates or design points $(\mathbf{x}_i^{\mathrm T}, t_i)^{\mathrm T} \in \mathbb{R}^{p+1}$ satisfying

$$y_i = \mathbf{x}_i^{\mathrm T}\boldsymbol{\beta} + g(t_i) + \varepsilon_i, \qquad 1 \le i \le n, \tag{1}$$

where the errors $\varepsilon_i$ are independent, and independent of $(\mathbf{x}_i^{\mathrm T}, t_i)^{\mathrm T}$. The semiparametric nature of model (1) offers more flexibility than the standard linear model when modelling a complicated relationship between the response variable and one of the covariates. At the same time, it keeps a simple functional form in the other covariates, avoiding the “curse of dimensionality” of nonparametric regression. In many situations it seems reasonable to suppose that a relationship between the covariates $\mathbf{x}$ and $t$ exists, so, as in Speckman (1988), Linton (1995) and Aneiros-Pérez and Quintela del Río (2002), we assume that for $1 \le j \le p$

$$x_{ij} = \varphi_j(t_i) + \eta_{ij}, \qquad 1 \le i \le n, \tag{2}$$

where the errors $\eta_{ij}$ are independent. Moreover, the design points $t_i$ are assumed to be fixed. Several authors have considered the semiparametric model (1); see, for instance, Denby (1986), Rice (1986), Robinson (1988), Speckman (1988) and Härdle et al. (2000), among others. All these estimators, like most nonparametric estimators, depend on a smoothing parameter that must be chosen by the practitioner. As is well known, large bandwidths produce estimators with small variance but high bias, while small values produce more wiggly curves. This bias-variance trade-off has led to several proposals for selecting the smoothing parameter, such as cross-validation procedures and plug-in methods.
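To make models (1) and (2) concrete, the Python sketch below simulates data from them with p = 1 and computes a classical least-squares estimate of β from partial residuals, in the spirit of Speckman (1988), using local means (Nadaraya-Watson weights). The Epanechnikov kernel, the functions g(t) = cos(2πt) and φ_1(t) = sin(2πt), the noise levels and the bandwidth h = 0.1 are illustrative assumptions, not choices taken from the paper.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def local_mean_weights(t, h):
    """n x n matrix whose i-th row holds the local-mean (Nadaraya-Watson) weights at t_i."""
    k = epanechnikov((t[:, None] - t[None, :]) / h)
    return k / k.sum(axis=1, keepdims=True)


def partial_residual_ls(y, x, t, h):
    """Least-squares estimate of beta from partial residuals, then g by smoothing."""
    w = local_mean_weights(t, h)
    x_tilde = x - w @ x            # part of x not explained by t
    y_tilde = y - w @ y            # part of y not explained by t
    beta, *_ = np.linalg.lstsq(x_tilde, y_tilde, rcond=None)
    g_hat = w @ (y - x @ beta)     # local mean of y - x^T beta
    return beta, g_hat


def g(s):
    return np.cos(2 * np.pi * s)   # hypothetical g in model (1)


def phi(s):
    return np.sin(2 * np.pi * s)   # hypothetical phi_1 in model (2)


rng = np.random.default_rng(0)
n, h = 200, 0.1
t = (np.arange(1, n + 1) - 0.5) / n                   # fixed, equally spaced design in [0, 1]
x = (phi(t) + 0.5 * rng.standard_normal(n))[:, None]  # x_i1 = phi_1(t_i) + eta_i1
beta_true = np.array([2.0])
y = x @ beta_true + g(t) + 0.5 * rng.standard_normal(n)

beta_hat, g_hat = partial_residual_ls(y, x, t, h)
print("beta_hat =", beta_hat)
```

Changing `h` and plotting `g_hat` against `t` is a quick way to see the bias-variance trade-off described above.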
Linton (1995), using local polynomial regression estimators, obtained an asymptotic expression for the optimal bandwidth, in the sense that it minimizes a second-order approximation of the mean square error of the least squares estimate $\hat{\boldsymbol\beta}_{LS}(h)$ of $\boldsymbol\beta$. This expression depends on the regression function being estimated and on unknown parameters, such as the standard deviation of the errors. More precisely, for any $\mathbf{c} \in \mathbb{R}^p$, let $\sigma^2 = \sigma_\varepsilon^2\, \mathbf{c}^{\mathrm T}\Sigma_\eta^{-1}\mathbf{c}$ be the asymptotic variance of $U = \mathbf{c}^{\mathrm T} n^{1/2}(\hat{\boldsymbol\beta}_{LS}(h) - \boldsymbol\beta)$, and let $n\,\mathrm{MSE}(h) = E\,U^2/\sigma^2$ be its standardized mean square error. For the sake of simplicity, assume that the smoothing procedure uses local means and that the design points are almost uniform, i.e., $\{t_i\}_{i=1}^n$ are fixed design points in $[0,1]$ with $0 \le t_1 \le \cdots \le t_n \le 1$, $t_0 = 0$, $t_{n+1} = 1$ and

$$\max_{1 \le i \le n+1} \left| (t_i - t_{i-1}) - \tfrac{1}{n} \right| = O(n^{-\delta}) \quad \text{for some } \delta > 1.$$

Then, under general conditions, for $\nu \ge 2$,

$$\mathrm{MSE}(h) = n^{-1}\left\{ 1 + (nh)^{-1}A_2 + o(n^{-2\mu}) + \left( n^{1/2}h^{2\nu}A_1 + o(n^{-\mu}) \right)^2 \right\},$$

where $\mu = (4\nu - 1)/(2(4\nu + 1))$, $\boldsymbol\varphi^{(\nu)}(t) = (\varphi_1^{(\nu)}(t), \ldots, \varphi_p^{(\nu)}(t))^{\mathrm T}$, $\alpha_\nu(K) = \int u^\nu K(u)\,du$, $K^{*}(u) = (K \ast K)(u) - 2K(u)$ with $\ast$ denoting convolution, and

$$A_1 = \alpha_\nu^2(K)\,(\nu!)^{-2}\,\sigma^{-1}\,\mathbf{c}^{\mathrm T}\Sigma_\eta^{-1}\int_0^1 g^{(\nu)}(t)\,\boldsymbol\varphi^{(\nu)}(t)\,dt, \qquad A_2 = \int K^{*2}(u)\,du.$$

The leading $h$-dependent terms of $n\,\mathrm{MSE}(h)$ are $(nh)^{-1}A_2 + n h^{4\nu}A_1^2$, and setting their derivative to zero gives $h^{4\nu+1} = A_2/(4\nu A_1^2 n^2)$. Therefore, the optimal bandwidth, in the sense of minimizing the asymptotic $\mathrm{MSE}(h)$, is $h_{\mathrm{opt}} = A_0 n^{-\pi}$, with $\pi = 2/(4\nu + 1)$ and $A_0 = \left(A_2/(4\nu A_1^2)\right)^{\pi/2}$, i.e.,

$$A_0 = \left( \frac{\int K^{*2}(u)\,du}{4\nu \left\{ \sigma^{-1}\mathbf{c}^{\mathrm T}\Sigma_\eta^{-1}\,\alpha_\nu^2(K)\,(\nu!)^{-2}\int_0^1 g^{(\nu)}(t)\,\boldsymbol\varphi^{(\nu)}(t)\,dt \right\}^2} \right)^{\pi/2}. \tag{3}$$

Linton (1995) considered a plug-in approach to estimate the optimal bandwidth and showed that it converges to the optimal one, while Aneiros-Pérez and Quintela del Río (2002) studied the case of dependent errors.
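Formula (3) is easy to evaluate once the kernel and model constants are known. The sketch below computes $A_2$ and $\alpha_\nu(K)$ numerically for the Epanechnikov kernel with $\nu = 2$, plugs in hypothetical model quantities for $p = 1$ ($c = 1$, $\Sigma_\eta = \sigma_\eta^2$, $\sigma_\varepsilon = \sigma_\eta = 0.5$, $g(t) = e^t$, $\varphi_1(t) = t^3$, so that $\int_0^1 g''\varphi_1''\,dt = 6$), and checks on a grid that $h_{\mathrm{opt}} = A_0 n^{-\pi}$ minimizes the leading terms $(nh)^{-1}A_2 + n h^{4\nu}A_1^2$. All of these inputs are assumptions for illustration; in practice $g^{(\nu)}$, $\boldsymbol\varphi^{(\nu)}$ and $\sigma_\varepsilon$ are unknown and must be estimated, which is exactly the job of a plug-in selector.

```python
import math

import numpy as np


def epanechnikov(u):
    """Second-order kernel K on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def trapezoid(f, x):
    """Trapezoidal rule for samples f on the grid x."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))


# --- Kernel constants (grid is symmetric, odd-length and contains 0) -------
u = np.linspace(-2.0, 2.0, 4001)
du = u[1] - u[0]
k = epanechnikov(u)
k_conv = np.convolve(k, k, mode="same") * du  # (K*K)(u) on the same grid
k_star = k_conv - 2.0 * k                     # K^*(u) = (K*K)(u) - 2K(u)
A2 = trapezoid(k_star**2, u)                  # A_2 = int K^*(u)^2 du

nu = 2
alpha_nu = trapezoid(u**nu * k, u)            # alpha_nu(K); equals 1/5 here

# --- Hypothetical model constants (p = 1, c = 1) ---------------------------
sigma_eps, sigma_eta = 0.5, 0.5
sigma = sigma_eps / sigma_eta                 # sigma^2 = sigma_eps^2 c' Sigma_eta^{-1} c
tt = np.linspace(0.0, 1.0, 10001)
g_nu = np.exp(tt)                             # g''(t) for g(t) = exp(t)
phi_nu = 6.0 * tt                             # phi''(t) for phi(t) = t^3
integral = trapezoid(g_nu * phi_nu, tt)       # int_0^1 g'' phi'' dt (= 6 exactly)
A1 = alpha_nu**2 / math.factorial(nu) ** 2 / sigma / sigma_eta**2 * integral

# --- Optimal bandwidth from equation (3) -----------------------------------
pi_exp = 2.0 / (4 * nu + 1)                   # the exponent called pi in the text
A0 = (A2 / (4 * nu * A1**2)) ** (pi_exp / 2)
n = 200
h_opt = A0 * n ** (-pi_exp)


def leading_terms(h):
    """h-dependent leading terms of n * MSE(h)."""
    return A2 / (n * h) + n * h ** (4 * nu) * A1**2


h_grid = np.linspace(0.5 * h_opt, 2.0 * h_opt, 20001)
h_best = h_grid[np.argmin(leading_terms(h_grid))]
print(f"A2 = {A2:.4f}, A1 = {A1:.4f}, h_opt = {h_opt:.4f}, grid minimiser = {h_best:.4f}")
```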
It is well known that, both in linear and in nonparametric regression, least squares estimators can be seriously affected by anomalous data. The same holds for partly linear models, where large values of the response variable $y_i$ can cause a peak in the estimate of the smooth function $g$ in the neighborhood of $t_i$. Moreover, as in linear regression, large values of the response $y_i$ combined with high-leverage points $\mathbf{x}_i$ can cause the classical estimates of the regression parameter $\boldsymbol\beta$ to break down. To overcome this problem, Bianco and Boente (2004) considered a three-step robust estimate of the regression parameter and the regression function. In addition, in the nonparametric regression setting, i.e., when $\boldsymbol\beta = \mathbf{0}$, the sensitivity of classical bandwidth selectors to anomalous data has been discussed by several authors, such as Leung et al. (1993), Wang and Scott (1994), Boente et al. (1997), Cantoni and Ronchetti (2001) and Leung (2005). In this paper, we consider a robust plug-in bandwidth selector under the partly linear model (1), which converges to the optimal bandwidth and leads to robust data-driven estimates of the regression function $g$ and the regression parameter $\boldsymbol\beta$. We derive an expression analogous to (3) for the optimal bandwidth of the three-step estimator introduced in Bianco and Boente (2004). As with its linear counterpart, this expression depends on the derivatives of the functions $g$ and $\boldsymbol\varphi$. In Section 2, we review some of the proposals for robustly estimating the derivatives of the regression function under a nonparametric regression model. The robust plug-in bandwidth selector for the partly linear model is introduced in Section 3, together with a robust cross-validation procedure. In Section 4, the small-sample behavior of the classical and resistant selectors is compared through a Monte Carlo study under normality and under contamination.
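The three-step idea can be sketched in Python as follows. This is a simplified illustration under assumptions of ours, not Bianco and Boente's (2004) estimator: step 1 uses local Huber M-smoothers (Epanechnikov weights, local MAD scale, Huber constant 1.345) of $y$ and of each column of $\mathbf{x}$ on $t$; step 2 uses a Huber regression fitted by iteratively reweighted least squares; step 3 sets $\hat g = \hat\varphi_0 - \hat{\boldsymbol\varphi}^{\mathrm T}\hat{\boldsymbol\beta}$, which follows from (1) and (2). Note that a Huber regression bounds the effect of large residuals but not of high-leverage $\mathbf{x}_i$. The simulated design, with 10 gross errors in $y$ placed at the points with the largest $\eta_i$, is also hypothetical.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def huber_weights(r, c=1.345):
    """IRLS weights psi(r)/r for Huber's psi with tuning constant c."""
    a = np.abs(r)
    return np.where(a <= c, 1.0, c / np.maximum(a, 1e-12))


def local_m_smoother(t, v, h, n_iter=50):
    """Kernel-weighted local Huber M-estimate of location of v at each t_i."""
    fit = np.empty(len(v))
    for i, ti in enumerate(t):
        w = epanechnikov((t - ti) / h)
        nb = w > 0
        m = np.median(v[nb])                                # robust starting value
        s = 1.4826 * np.median(np.abs(v[nb] - m)) + 1e-12   # local MAD scale
        for _ in range(n_iter):
            omega = w[nb] * huber_weights((v[nb] - m) / s)
            m = np.sum(omega * v[nb]) / np.sum(omega)
        fit[i] = m
    return fit


def huber_regression(z, r, n_iter=100):
    """Huber M-estimate of the slopes in r ~ z (no intercept), via IRLS."""
    beta = np.linalg.lstsq(z, r, rcond=None)[0]
    for _ in range(n_iter):
        res = r - z @ beta
        s = 1.4826 * np.median(np.abs(res - np.median(res))) + 1e-12
        zw = z * huber_weights(res / s)[:, None]
        beta = np.linalg.solve(z.T @ zw, zw.T @ r)          # weighted normal equations
    return beta


def three_step(y, x, t, h):
    # Step 1: robust smooths of y and of every column of x on t.
    phi0 = local_m_smoother(t, y, h)
    phi = np.column_stack([local_m_smoother(t, x[:, j], h) for j in range(x.shape[1])])
    # Step 2: robust regression of the y-residuals on the x-residuals.
    beta = huber_regression(x - phi, y - phi0)
    # Step 3: g(t) = phi0(t) - phi(t)^T beta.
    return beta, phi0 - phi @ beta


def classical(y, x, t, h):
    """Non-robust comparison: partial-residual least squares with local means."""
    w = epanechnikov((t[:, None] - t[None, :]) / h)
    w /= w.sum(axis=1, keepdims=True)
    return np.linalg.lstsq(x - w @ x, y - w @ y, rcond=None)[0]


rng = np.random.default_rng(1)
n, h = 200, 0.1
t = (np.arange(1, n + 1) - 0.5) / n
eta = 0.5 * rng.standard_normal(n)
x = (np.sin(2 * np.pi * t) + eta)[:, None]
y = 2.0 * x[:, 0] + np.cos(2 * np.pi * t) + 0.5 * rng.standard_normal(n)
idx = np.argsort(eta)[-10:]   # the 10 points with the largest eta ...
y[idx] += 20.0                # ... receive gross errors in the response

beta_cls = classical(y, x, t, h)
beta_rob, g_rob = three_step(y, x, t, h)
print("classical:", beta_cls, "three-step sketch:", beta_rob)
```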
Finally, in Section 5 an empirical influence measure for the bandwidth selector is introduced. We use this measure to study the sensitivity of the proposed plug-in and cross-validation selectors on some generated examples.
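The exact definition of the empirical influence measure is not given in this excerpt. As a hypothetical stand-in, the Python sketch below measures the change in the selected bandwidth when one response $y_{i_0}$ is replaced by an outlying value $y_0$, in the nonparametric setting $\boldsymbol\beta = \mathbf{0}$. It compares classical least-squares leave-one-out cross-validation for a local-mean smoother with a simple robust variant (leave-one-out local medians and absolute loss). This robust criterion, the test function $\sin(2\pi t)$, the noise level and the bandwidth grid are assumptions for illustration, not the criteria studied in the paper.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def ls_cv(t, y, h):
    """Classical least-squares leave-one-out CV score for a local-mean smoother."""
    w = epanechnikov((t[:, None] - t[None, :]) / h)
    np.fill_diagonal(w, 0.0)                  # leave observation i out
    fit = (w @ y) / w.sum(axis=1)
    return np.mean((y - fit) ** 2)


def robust_cv(t, y, h):
    """Assumed robust variant: leave-one-out local medians with absolute loss."""
    resid = np.empty_like(y)
    for i in range(len(t)):
        nb = np.abs(t - t[i]) <= h
        nb[i] = False
        resid[i] = y[i] - np.median(y[nb])
    return np.mean(np.abs(resid))


def select_bandwidth(criterion, t, y, grid):
    scores = [criterion(t, y, h) for h in grid]
    return grid[int(np.argmin(scores))]


def empirical_influence(criterion, t, y, grid, i0, y0):
    """Hypothetical measure: change in the selected h when y[i0] is replaced by y0."""
    y_cont = y.copy()
    y_cont[i0] = y0
    return select_bandwidth(criterion, t, y_cont, grid) - select_bandwidth(criterion, t, y, grid)


rng = np.random.default_rng(2)
n = 100
t = (np.arange(1, n + 1) - 0.5) / n
y = np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal(n)
grid = np.linspace(0.03, 0.5, 48)             # candidate bandwidths, step 0.01
i0 = n // 2                                   # contaminate a point near t = 0.5

for y0 in (5.0, 20.0, 50.0):
    ei_cls = empirical_influence(ls_cv, t, y, grid, i0, y0)
    ei_rob = empirical_influence(robust_cv, t, y, grid, i0, y0)
    print(f"y0 = {y0:5.1f}: EI(LS-CV) = {ei_cls:+.3f}, EI(robust CV) = {ei_rob:+.3f}")
```

With a single large outlier, the least-squares criterion tends to move toward larger bandwidths (oversmoothing), because the outlier enters every leave-one-out fit in its neighborhood, whereas replacing one value can shift a local median by at most one order statistic.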

Conclusion

Selection of the smoothing parameter is an important step in any nonparametric analysis, even when robust estimates are used. The classical procedures based on least squares cross-validation or on a plug-in rule turn out to be non-robust, since they lead to over- or undersmoothing, as noted for nonparametric regression by Leung et al. (1993), Wang and Scott (1994), Boente et al. (1997), Cantoni and Ronchetti (2001) and Leung (2005). The same conclusions hold under a partly linear regression model. Our proposals tend to overcome the sensitivity of the classical selectors by using robust estimators of the derivatives of the regression function, or a robust cross-validation criterion, under a partly linear regression model. Defining the influence function of the smoothing parameter is still an open problem. We introduced an empirical influence measure that allows one to evaluate, on a given data set, the sensitivity of a bandwidth selector to anomalous data. It turns out that, under a partly linear model, the classical plug-in bandwidth defined in Linton (1995) is not robust, since it leads to unbounded empirical influence functions. On the other hand, our proposals have bounded empirical influence even when several outliers are introduced. For the model and the contaminations considered, the best performance in all cases is attained by the plug-in rules, even though all of them are influenced by multiple outliers. In particular, when dealing with more than one outlier, the differentiating approach leads to smaller influence functions than the approach based on polynomials.