A special class of semiparametric regression models that are flexible, overcome (or at least reduce) the “curse of dimensionality”, and allow an easy interpretation of the effect of each explanatory variable on the response variable was proposed by Engle et al. (1986). These models, known as partial linear regression (PLR) models, assume that the regression function is the sum of a linear and a nonparametric component, that is,
$$Y_i = X_i^{T}\beta + m(T_i) + \varepsilon_i \qquad (i = 1, \ldots, n), \tag{1}$$
where $X_i = (X_{i1}, \ldots, X_{id_0})^{T}$ and $T_i = (T_{i1}, \ldots, T_{id_1})^{T}$ ($d_0 \geqslant 1$ and $d_1 \geqslant 1$) are vectors of explanatory variables, $\beta = (\beta_1, \ldots, \beta_{d_0})^{T}$ is a vector of unknown real parameters, $m$ is an unknown smooth real function, and $\{\varepsilon_i\}$ are random errors satisfying
$$E(\varepsilon_i \mid X_i, T_i) = 0 \qquad (i = 1, \ldots, n). \tag{2}$$
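To make the roles of $X_i$, $T_i$, $\beta$, $m$ and $\varepsilon_i$ in (1) and (2) concrete, the following Python sketch draws one sample from a PLR model. All specific choices are illustrative assumptions, not the designs used in this paper: uniform $T_i$, Gaussian $X_i$ shifted by the mean of $T_i$ (so the two sets of regressors are dependent), a particular smooth $m$, and stationary AR(1) errors generated independently of the regressors so that (2) holds. The helper `simulate_plr` is hypothetical.

```python
import numpy as np

def simulate_plr(n, beta, m, d1, rho=0.5, sigma=1.0, seed=None):
    """Draw (Y, X, T, eps) from Y_i = X_i' beta + m(T_i) + eps_i, i = 1, ..., n.

    The errors are a stationary AR(1) process generated independently of (X, T),
    so E(eps_i | X_i, T_i) = 0 as in (2).
    """
    rng = np.random.default_rng(seed)
    d0 = len(beta)
    T = rng.uniform(0.0, 1.0, size=(n, d1))                        # random design for T_i
    X = rng.normal(size=(n, d0)) + T.mean(axis=1, keepdims=True)   # X_i may depend on T_i
    innov = rng.normal(scale=sigma, size=n)                        # i.i.d. innovations
    eps = np.empty(n)
    eps[0] = innov[0] / np.sqrt(1.0 - rho ** 2)                    # start in the stationary law
    for i in range(1, n):
        eps[i] = rho * eps[i - 1] + innov[i]
    Y = X @ beta + m(T) + eps
    return Y, X, T, eps

def m(T):
    """Example smooth function of d1 = 2 variables (an assumption for this sketch)."""
    return np.sin(2 * np.pi * T[:, 0]) + T[:, 1] ** 2

beta = np.array([2.0, -1.0, 0.5])                                  # d0 = 3
Y, X, T, eps = simulate_plr(1000, beta, m, d1=2, rho=0.6, seed=1)
```

Because the errors are built from draws that never touch $X_i$ or $T_i$, condition (2) holds by construction, while the AR(1) recursion makes them serially dependent, which is the situation the prewhitening approach targets.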
The PLR model has been studied extensively for i.i.d. data (see, for example, Heckman, 1986, and Rice, 1986, for spline-type estimators; Speckman, 1988, and Robinson, 1988, for kernel-type estimators; Linton, 1995, for local polynomial-type estimators; and Bianco and Boente, 2004, for robust estimation) as well as for dependent data (see, for example, Engle et al., 1986, for spline-type estimators; and Gao, 1995, and Aneiros-Pérez et al., 2004, for kernel-type estimators). In general, these works propose different estimators for $\beta$ and/or $m$ in (1) and study their consistency and asymptotic normality. As is usual in nonparametric and semiparametric estimation, these estimators depend on smoothing parameters, or bandwidths, which must be selected from the observations, so automatic procedures for choosing them are needed. Despite the extensive literature on PLR models, few works address this crucial problem of bandwidth selection. Among them are Linton (1995) and Aneiros-Pérez (2002), who propose plug-in selectors for i.i.d. and dependent data, respectively, and Aneiros-Pérez and Quintela-del-Río (2001a), whose selector is based on cross-validation ideas and focuses on dependent data. See also Liang (2006) for some ideas on the strategy of bandwidth selection. In addition, PLR models have proven useful in many fields of applied science, such as economics, environmental studies and medicine (see Härdle et al., 2000, for a monograph on the PLR model and its applications).
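As an illustration of the kernel-type estimators cited above, the following sketch implements the partial-residual idea associated with Speckman (1988) for a scalar $T_i$: with a Nadaraya–Watson smoother matrix $S$ built on $T$, one sets $\tilde X = (I - S)X$ and $\tilde Y = (I - S)Y$, estimates $\hat\beta = (\tilde X^{T}\tilde X)^{-1}\tilde X^{T}\tilde Y$, and then $\hat m = S(Y - X\hat\beta)$. The Gaussian kernel, the fixed bandwidth `h` (which in practice would have to be selected from the data, as discussed above), the simulated design, and the helper names are assumptions made only for this example.

```python
import numpy as np

def nw_smoother_matrix(t, h):
    """Nadaraya-Watson smoother matrix S (n x n) for a scalar covariate t.

    Row i holds the Gaussian-kernel weights used to estimate E(. | T = t_i).
    """
    diff = (t[:, None] - t[None, :]) / h
    K = np.exp(-0.5 * diff ** 2)
    return K / K.sum(axis=1, keepdims=True)

def speckman_plr(y, X, t, h):
    """Speckman-type estimates of beta and of m(t_i) in Y = X beta + m(T) + eps."""
    S = nw_smoother_matrix(t, h)
    I = np.eye(len(y))
    X_tilde = (I - S) @ X              # remove the part of X explained by T
    y_tilde = (I - S) @ y              # remove the part of Y explained by T
    beta_hat, *_ = np.linalg.lstsq(X_tilde, y_tilde, rcond=None)
    m_hat = S @ (y - X @ beta_hat)     # smooth the partial residuals against T
    return beta_hat, m_hat

# Simulated example with d0 = 2 and d1 = 1; the first regressor is correlated with T.
rng = np.random.default_rng(0)
n = 400
t = rng.uniform(0.0, 1.0, n)
X = np.column_stack([t + rng.normal(size=n), rng.normal(size=n)])
beta = np.array([1.5, -0.5])
y = X @ beta + np.sin(2 * np.pi * t) + 0.3 * rng.normal(size=n)

beta_hat, m_hat = speckman_plr(y, X, t, h=0.05)
print(beta_hat)
```

No intercept column is included in `X`: since each row of $S$ sums to one, a constant column would be annihilated by $I - S$, and the level of the regression function is absorbed by $m$.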
This paper deals with the estimation of the nonparametric component $m$ of the PLR model (1), assuming both a random design on $\{(X_i, T_i)\}$ and mixing conditions on $\{(Y_i, X_i, T_i)\}$. For pure nonparametric regression models with random design (that is, model (1) where $\beta$ is known and the $T_i$ are random vectors), results have been reported by Masry (1996a) and Xiao et al. (2003), among others. Masry obtained the asymptotic normality of the (conventional) local polynomial estimator, while Xiao et al. obtained that of a local polynomial estimator applied to a prewhitening transformation of the dependent variable, this transformation being estimated from the data. In short, the estimator proposed by Xiao et al. takes into account the correlation structure of the error process and is asymptotically more efficient than the one studied by Masry. Given this result, we estimate $m$ in model (1) following the procedure proposed by Xiao et al., but applied to PLR models instead of pure nonparametric models. As we will show, the asymptotic result proven by Xiao et al. also holds for our estimator.
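The prewhitening idea can be sketched in a simplified setting. Assuming, only for this example, that the errors follow an AR(1) process $\varepsilon_i = \rho\,\varepsilon_{i-1} + e_i$ with i.i.d. innovations $e_i$, one has $Y_i - \rho\,\varepsilon_{i-1} = X_i^{T}\beta + m(T_i) + e_i$, so regressing the transformed response on $T_i$ removes the serial correlation from the errors. The sketch below replaces the unknown $\beta$, $\rho$ and $\varepsilon_{i-1}$ by a pilot Speckman-type fit, estimates $\rho$ by least squares on consecutive pilot residuals, and applies a local linear smoother to the transformed response. It is a hypothetical simplification (scalar $T_i$, AR(1) errors, Gaussian kernel, fixed bandwidths `h_pilot` and `h`), not necessarily the exact estimator constructed in Section 2.

```python
import numpy as np

def nw_matrix(t, h):
    """Nadaraya-Watson smoother matrix with a Gaussian kernel (scalar t)."""
    d = (t[:, None] - t[None, :]) / h
    K = np.exp(-0.5 * d ** 2)
    return K / K.sum(axis=1, keepdims=True)

def local_linear(t_eval, t, z, h):
    """Local linear estimate of E(z | T = t0) at each t0 in t_eval (Gaussian kernel)."""
    out = np.empty(len(t_eval))
    for k, t0 in enumerate(t_eval):
        u = t - t0
        w = np.exp(-0.5 * (u / h) ** 2)
        s0, s1, s2 = w.sum(), (w * u).sum(), (w * u ** 2).sum()
        r0, r1 = (w * z).sum(), (w * u * z).sum()
        out[k] = (s2 * r0 - s1 * r1) / (s0 * s2 - s1 ** 2)  # intercept of the local line
    return out

def prewhitened_plr(y, X, t, h_pilot, h):
    """Hypothetical AR(1) prewhitening sketch for the nonparametric part of a PLR model."""
    n = len(y)
    # Step 1: pilot Speckman-type fit, ignoring the error correlation.
    S = nw_matrix(t, h_pilot)
    I = np.eye(n)
    beta_hat, *_ = np.linalg.lstsq((I - S) @ X, (I - S) @ y, rcond=None)
    partial = y - X @ beta_hat             # estimates m(T_i) + eps_i
    resid = partial - S @ partial          # pilot estimates of eps_i
    # Step 2: least-squares AR(1) coefficient from consecutive pilot residuals.
    rho_hat = (resid[1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])
    # Step 3: Z_i = Y_i - X_i' beta_hat - rho_hat * eps_hat_{i-1}, for i = 2, ..., n.
    z = partial[1:] - rho_hat * resid[:-1]
    # Step 4: local linear regression of Z_i on T_i.
    m_hat = local_linear(t, t[1:], z, h)
    return beta_hat, rho_hat, m_hat, partial

# Simulated example: scalar T, d0 = 2, AR(1) errors with rho = 0.7.
rng = np.random.default_rng(42)
n, rho = 500, 0.7
t = rng.uniform(0.0, 1.0, n)
X = np.column_stack([t + rng.normal(size=n), rng.normal(size=n)])
beta = np.array([1.5, -0.5])
innov = rng.normal(scale=0.3, size=n)
eps = np.empty(n)
eps[0] = innov[0] / np.sqrt(1.0 - rho ** 2)
for i in range(1, n):
    eps[i] = rho * eps[i - 1] + innov[i]
m_true = np.sin(2 * np.pi * t)
y = X @ beta + m_true + eps

beta_hat, rho_hat, m_hat, partial = prewhitened_plr(y, X, t, h_pilot=0.05, h=0.05)
m_conv = local_linear(t, t, partial, 0.05)   # conventional: smooth Y - X beta_hat directly
mse_pre = np.mean((m_hat - m_true) ** 2)
mse_conv = np.mean((m_conv - m_true) ** 2)
print(beta_hat, rho_hat, mse_pre, mse_conv)
```

Comparing `mse_pre` with `mse_conv` on one simulated sample gives only a rough check of the efficiency gain discussed above; a meaningful comparison would average over many replications and use data-driven bandwidths.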
The construction of the estimator is presented in Section 2, while the asymptotic results and the conditions under which they are obtained are given in Section 3. In Section 4, the finite-sample behavior of the estimator is illustrated through a simulation study, and an application to a real data set is presented in Section 5 (all computations were performed in R). Concluding remarks appear in Section 6. Finally, Section 7 contains the proofs of our theorems, and the Appendix gives some lemmas used in those proofs.
In this paper, we have constructed and studied a local polynomial-type estimator of the nonparametric component of a PLR model. The design was random, and mixing conditions were assumed on the response and explanatory variables. The estimator works on a prewhitening transformation of the model, this transformation being based on the dependence structure of the random errors. We obtained the asymptotic normality of the estimator, from which we conclude that (i) the presence of a linear component does not change the asymptotic distribution of the nonparametric estimator, and (ii) when the errors of the model are autocorrelated, the proposed estimator is asymptotically more efficient than the conventional one (which works on the original dependent variable of the model). The two estimators (proposed and conventional) were compared in a simulation study, in which the new estimator performed better from both the curve estimation and the point estimation perspectives. Finally, both the usefulness of the PLR model and the competitiveness of the prewhitening transformation were illustrated by an application to a financial time series.