Many existing methods for functional regression are based on the minimization of an $L_2$ norm of the residuals and are therefore sensitive to atypical observations, which may affect the predictive power and/or the smoothness of the resulting estimate. A robust version of a spline-based estimate is presented, which has the form of an MM estimate, where the $L_2$ loss is replaced by a bounded loss function. The estimate can be computed by a fast iterative algorithm. The proposed approach is compared, with favorable results, to the one based on $L_2$ and to both classical and robust Partial Least Squares through an example with high-dimensional real data and a simulation study.
We consider the analysis of data described by a linear functional regression model. That is, our data are independent identically distributed (i.i.d.) pairs $(X_i, y_i)$, $i = 1, \dots, n$, where $y_i \in \mathbb{R}$ and $X_i(\cdot)$ are random functions defined on an interval $I$, such that
$$y_i = \alpha_0 + \int_I \alpha(t)\, X_i(t)\, dt + e_i, \qquad i = 1, \dots, n, \tag{1}$$
where the number $\alpha_0$ and the function $\alpha(t)$ are unknown, and $\{e_i\}$ are i.i.d. random errors independent of $\{X_i\}$. In practice one actually observes, at given points $t_1 < \dots < t_p$ in $I$, the values $x_{ij} = X_i(t_j)$. Henceforth we shall denote $X = [x_{ij}] \in \mathbb{R}^{n \times p}$ and $y = [y_i] \in \mathbb{R}^n$.
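As a small illustration of how the integral in model (1) relates to the discretized observations $x_{ij} = X_i(t_j)$, the following sketch approximates $\int_I \alpha(t) X_i(t)\,dt$ by the trapezoidal rule on the grid $t_1 < \dots < t_p$. The quadrature rule, the helper name `integral_term`, and the toy choices $\alpha(t) = 1$, $X(t) = t$, $\alpha_0 = 2$ on $I = [0, 1]$ are assumptions made for illustration only; they are not taken from the article.

```python
def integral_term(alpha_vals, x_vals, t):
    """Trapezoidal approximation of the integral of alpha(t) * X(t) over the grid t.

    alpha_vals[j] and x_vals[j] are the values alpha(t_j) and X(t_j).
    """
    total = 0.0
    for j in range(len(t) - 1):
        f_left = alpha_vals[j] * x_vals[j]
        f_right = alpha_vals[j + 1] * x_vals[j + 1]
        total += 0.5 * (f_left + f_right) * (t[j + 1] - t[j])
    return total


# Toy check on I = [0, 1] (hypothetical values): alpha(t) = 1 and X(t) = t,
# so the exact integral is 1/2.
p = 101
t = [j / (p - 1) for j in range(p)]
alpha_vals = [1.0] * p
x_vals = list(t)
alpha0 = 2.0

# Noise-free response of model (1) for this single curve.
y_noiseless = alpha0 + integral_term(alpha_vals, x_vals, t)
```

With a fine grid the integral term is, up to quadrature error, a weighted inner product of the row $(x_{i1}, \dots, x_{ip})$ with the vector $(\alpha(t_1), \dots, \alpha(t_p))$, which is why the model can be handled as a regression on the $n \times p$ matrix $X$.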
These data sets are often high-dimensional, in many cases with $p \gg n$. The functional framework makes it possible to exploit qualitative assumptions such as the smoothness of the underlying curves. This type of regression model was first considered in Ramsay and Dalzell (1991). Ramsay and Silverman (2002, 2005) and Ferraty and Vieu (2006) present several case studies demonstrating the advantages of these models. Among recent applications, Goldsmith et al. (2010) present an application to diffusion tensor imaging (DTI) tractography, and Delaigle et al. (2009) deal with a meteorological application. Cardot et al. (2005, 2006) present the theory and applications of quantile regression for functional data.
One of the most important approaches for the estimation of $\alpha_0$ and $\alpha$ is regularization through penalized least squares after expanding in some basis such as splines: see Ramsay and Dalzell (1991), Eilers and Marx (1996), Marx and Eilers (1999), and Cardot et al. (2003). Crambes et al. (2009) proposed a smoothing splines approach extending previous work by Cardot et al. (2007). They show that the rates of convergence of their estimators are optimal in the sense that they are minimax over large classes of distributions of $X_i$ and of functions $\alpha$. In practice their approach reduces to an easy-to-implement procedure. Recently, Wang et al. (2012) proposed a spline-based nonparametric transformation model for functional regression.
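The general idea of penalized least squares can be sketched on the discretized data. The code below is a deliberately simplified stand-in, not the smoothing-splines estimator of Crambes et al. (2009): it assumes an equally spaced grid, approximates the integral by a Riemann sum with step $h$, and penalizes the squared second differences of $(\alpha(t_1), \dots, \alpha(t_p))$ instead of using a spline basis. The function name `penalized_ls`, the penalty, and all numerical values are hypothetical choices for illustration.

```python
import numpy as np


def penalized_ls(X, y, t, lam):
    """Minimize ||y - a0 - h * X @ a||^2 + lam * ||D2 @ a||^2 over (a0, a).

    Assumes an equally spaced grid t; D2 is the second-difference operator,
    so the penalty discourages rough coefficient curves.
    """
    n, p = X.shape
    h = t[1] - t[0]                            # grid step (Riemann-sum weight)
    Z = np.hstack([np.ones((n, 1)), h * X])    # intercept column + weighted curves
    D2 = np.diff(np.eye(p), n=2, axis=0)       # (p - 2) x p second differences
    P = np.zeros((p + 1, p + 1))
    P[1:, 1:] = D2.T @ D2                      # the intercept is not penalized
    coef = np.linalg.solve(Z.T @ Z + lam * P, Z.T @ y)
    return coef[0], coef[1:]


# Toy noise-free data (hypothetical): alpha is linear in t, so its second
# differences vanish and the penalty does not bias the solution.
rng = np.random.default_rng(0)
n, p = 60, 10
t = np.linspace(0.0, 1.0, p)
X = rng.normal(size=(n, p))
alpha_true = 1.0 + 2.0 * t
a0_true = 0.5
y = a0_true + (t[1] - t[0]) * X @ alpha_true

a0_hat, alpha_hat = penalized_ls(X, y, t, lam=1.0)
```

The residual term in this objective is a squared ($L_2$) criterion, so a single aberrant $y_i$ or curve can pull the whole fit, which is the sensitivity that motivates a robust alternative.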
Most approaches to functional regression are based on minimizing some $L_2$ norm and are therefore sensitive to outliers, which calls for the development of robust methods. There are numerous articles on robust methods for functional data. In particular, Crambes et al. (2008) propose a robust estimator for nonparametric models, and Gervini (submitted for publication) deals with robust regression between two stochastic processes. However, we are not aware of any robust approach for model (1). The purpose of this article is to propose a robust version of the estimator proposed by Crambes et al. (2009), based on the approach of MM estimation (Yohai, 1987).
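To make the contrast between a squared loss and a bounded loss concrete, the sketch below compares $r^2$ with Tukey's bisquare function, a bounded loss commonly used in MM estimation. This section does not say which bounded loss the proposed estimator uses, so the bisquare family and the tuning constant `c = 4.685` are assumptions chosen only for illustration.

```python
def squared_loss(r):
    """Unbounded L2 loss: grows without limit as |r| grows."""
    return r * r


def bisquare_rho(r, c=4.685):
    """Tukey bisquare loss scaled to [0, 1]; c is an illustrative tuning constant.

    Residuals with |r| >= c all receive the same maximal loss of 1,
    so a gross outlier cannot dominate the criterion.
    """
    u = r / c
    if abs(u) >= 1.0:
        return 1.0
    return 1.0 - (1.0 - u * u) ** 3


# Compare the two losses on small, moderate, and gross (standardized) residuals.
residuals = [0.0, 1.0, 3.0, 10.0, 100.0]
for r in residuals:
    print(r, squared_loss(r), round(bisquare_rho(r), 4))
```

A residual of 100 contributes 10000 to a squared-loss criterion but only 1 to the bisquare criterion, the same as a residual of 10. In an MM estimate the residuals are first standardized by a robust scale estimate before the bounded loss is applied.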
Section 2 describes the proposed estimator, whose advantages are demonstrated in Sections 3 and 4 through its performance on real and simulated data sets, respectively. The computing times of the different estimators are compared in Section 5. Finally, Section 6 contains the conclusions of the study.
In the vessel data example, the MM approach showed a better predictive performance than $L_2$ and both versions of PLS. In the simulations, MM showed a reasonable efficiency (compared to $L_2$ and PLS) for normal data, and was in general more robust than R-PLS for contaminated data. In the trade-off between efficiency and robustness, it seems that MM(2) is the estimator of choice.