Smoothing splines estimators in functional linear regression with errors-in-variables
| Article code | Year | English article | Word count |
|---|---|---|---|
| 24221 | 2007 | 17-page PDF | 8,320 |
Publisher: Elsevier - ScienceDirect
Journal : Computational Statistics & Data Analysis, Volume 51, Issue 10, 15 June 2007, Pages 4832–4848
The total least squares method is generalized to the functional linear model. A smoothing splines estimator of the functional coefficient of the model is first proposed for covariates without noise, and an asymptotic result for this estimator is obtained. This estimator is then adapted to the case where the covariates are noisy, and an upper bound on its rate of convergence is derived. The estimation procedure is evaluated by means of simulations.
A very common problem in statistics is to explain the effect of a covariate on a response (the variable of interest). While the covariate is usually considered as a vector of scalars, in many current applications (for instance in climatology, remote sensing, linguistics, etc.) the data come from the observation of a continuous phenomenon over time or space: see Ramsay and Silverman (2002) or Ferraty and Vieu (2006) for examples. The increasing performance of measurement instruments now makes it possible to collect these data on dense grids, so they can no longer be considered as variables taking values in $\mathbb{R}^p$. This has made it necessary to develop ad hoc techniques for this kind of data, which have been popularized under the name of functional data analysis and have been studied in depth in recent years (for a theoretical and practical overview of functional data analysis, we refer to the books by Bosq, 2000, Ramsay and Silverman, 1997, Ramsay and Silverman, 2002 and Ferraty and Vieu, 2006). Our study takes place in this framework of functional data analysis, in the context of the regression estimation problem mentioned above. We thus consider the case of a functional covariate and a scalar response. More precisely, we first consider observations $(X_i, Y_i)_{i=1,\dots,n}$, where the $X_i$'s are real functions defined on an interval $I$ of $\mathbb{R}$ and assumed to be square integrable over $I$. As is usual in the literature, we then work in the separable real Hilbert space $L^2(I)$ of functions $f$ defined on $I$ such that $\int_I f(t)^2\,dt$ is finite. This space is endowed with its usual inner product $\langle \cdot,\cdot \rangle$, defined by $\langle f,g\rangle = \int_I f(t)\,g(t)\,dt$ for $f,g \in L^2(I)$, and the associated norm is denoted by $\|\cdot\|_{L^2}$.
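To make this inner product concrete, the following minimal sketch approximates $\langle f,g\rangle$ and $\|f\|_{L^2}$ numerically. It assumes $I=[0,1]$ and a midpoint Riemann sum on $m$ equispaced nodes $(2j-1)/(2m)$; the quadrature rule, the default $m$ and the helper names `l2_inner` and `l2_norm` are illustrative assumptions, not part of the paper.

```python
import math
from typing import Callable

Func = Callable[[float], float]


def l2_inner(f: Func, g: Func, m: int = 1000) -> float:
    """Approximate <f, g> = integral over [0, 1] of f(t) g(t) dt.

    Midpoint rule with m equispaced nodes t_j = (2j - 1) / (2m), j = 1..m
    (an assumed quadrature, chosen for illustration).
    """
    h = 1.0 / m
    return h * sum(f((j - 0.5) * h) * g((j - 0.5) * h) for j in range(1, m + 1))


def l2_norm(f: Func, m: int = 1000) -> float:
    """Approximate the induced norm ||f||_{L2} = sqrt(<f, f>)."""
    return math.sqrt(l2_inner(f, f, m))


if __name__ == "__main__":
    # <t, 1> = 1/2; the midpoint rule is exact for linear integrands.
    print(l2_inner(lambda t: t, lambda t: 1.0))
    # ||sin(2 pi t)|| = sqrt(1/2), about 0.7071.
    print(l2_norm(lambda t: math.sin(2 * math.pi * t)))
```

For twice differentiable integrands the midpoint-rule error decreases like $1/m^2$; for example, with $m$ nodes the approximation of $\langle t, t\rangle = 1/3$ is exactly $1/3 - 1/(12m^2)$.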
The model we consider to describe the link between the covariates $X_i$ and the responses $Y_i$ is a linear model introduced in Ramsay and Dalzell (1991) and defined by

$$Y_i = \int_I \alpha(t)\,X_i(t)\,dt + \varepsilon_i, \qquad i=1,\dots,n, \tag{1}$$

where $\alpha \in L^2(I)$ is an unknown functional parameter and $\varepsilon_i$, $i=1,\dots,n$, are i.i.d. real random variables satisfying $E(\varepsilon_i)=0$ and $E(\varepsilon_i^2)=\sigma_\varepsilon^2$. The functional parameter $\alpha$ has been estimated in various ways in the literature: see Ramsay and Silverman (1997), Marx and Eilers (1999) and Cardot et al. (1999, 2003). Here, our final goal is to estimate $\alpha$ when $X_i(t)$ is corrupted by some unobservable error. Before going further, note that the curves $X_i$ can be generated in different ways. One possibility is a fixed design, that is, $X_1,\dots,X_n$ are fixed, non-random functions. Examples are experiments in chemical or engineering applications, where $X_i$ corresponds to functional responses obtained under various predetermined experimental conditions (see for instance Cuevas et al., 2002). In other applications one may assume a random design, where $X_1,\dots,X_n$ are an i.i.d. sample. In either case, $Y_1,\dots,Y_n$ are independent, and expectations always refer only to the probability distribution induced by the random variables $\varepsilon_1,\dots,\varepsilon_n$. In the random design case, they must therefore be interpreted formally as conditional expectations given $X_1,\dots,X_n$. This implies for instance that $E(\varepsilon_i \mid X_i)=0$ and $E(\varepsilon_i^2 \mid X_i)=\sigma_\varepsilon^2$. What precedes implicitly assumes that the curves $X_i$ are observed without error (in model (1), all the errors are confined to the variable $Y_i$ through $\varepsilon_i$). Unfortunately, this assumption is hardly realistic in practice: many sources of error (instrument errors, human errors, etc.) prevent $X_1,\dots,X_n$ from being known exactly.
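Model (1) can be simulated directly once a design and a functional parameter are fixed. The sketch below is a hypothetical illustration, not the paper's simulation setup: it assumes $I=[0,1]$, a random design in which each $X_i$ is a finite sine expansion with independent Gaussian coefficients, Gaussian errors $\varepsilon_i$, and a midpoint rule for the integral. The helper names `integral_0_1`, `make_curve` and `simulate_model1` are invented for this example.

```python
import math
import random
from typing import Callable, List, Tuple

Curve = Callable[[float], float]


def integral_0_1(h: Curve, m: int = 2000) -> float:
    """Midpoint-rule approximation of the integral of h over I = [0, 1]."""
    step = 1.0 / m
    return step * sum(h((j + 0.5) * step) for j in range(m))


def make_curve(coefs: List[float]) -> Curve:
    """Hypothetical design curve X(t) = sum_k coefs[k-1] * sqrt(2) * sin(k * pi * t)."""
    def x(t: float) -> float:
        return sum(c * math.sqrt(2.0) * math.sin(k * math.pi * t)
                   for k, c in enumerate(coefs, start=1))
    return x


def simulate_model1(alpha: Curve, n: int, sigma_eps: float, n_terms: int = 5,
                    seed: int = 0) -> List[Tuple[List[float], Curve, float]]:
    """Draw (coefficients, X_i, Y_i) with Y_i = int alpha(t) X_i(t) dt + eps_i."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # Assumed random design: coefficient k is N(0, 1/k^2), independent across k.
        coefs = [rng.gauss(0.0, 1.0 / k) for k in range(1, n_terms + 1)]
        x = make_curve(coefs)
        # The integrand is evaluated immediately, so capturing x here is safe.
        y = integral_0_1(lambda t: alpha(t) * x(t)) + rng.gauss(0.0, sigma_eps)
        data.append((coefs, x, y))
    return data


def alpha_example(t: float) -> float:
    """Hypothetical functional parameter alpha(t) = sqrt(2) sin(pi t)."""
    return math.sqrt(2.0) * math.sin(math.pi * t)


if __name__ == "__main__":
    for coefs, _, y in simulate_model1(alpha_example, n=3, sigma_eps=0.1, seed=42):
        print(f"first coefficient {coefs[0]:+.3f} -> Y = {y:+.3f}")
```

Because $\sqrt{2}\sin(k\pi t)$, $k \ge 1$, are orthonormal in $L^2([0,1])$, choosing $\alpha(t)=\sqrt{2}\sin(\pi t)$ makes $\int_I \alpha X_i$ equal to the first coefficient of $X_i$, which gives a convenient sanity check when $\sigma_\varepsilon = 0$.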
Furthermore, in practice the whole curves are not available, so we suppose in the following that the curves are observed at $p$ equispaced discretization points $t_1 < \dots < t_p$ belonging to $I$. Taking from now on $I=[0,1]$ to simplify the notation, we thus have $t_1 = 1/(2p)$ and $t_j - t_{j-1} = 1/p$ for all $j=2,\dots,p$. We therefore observe discrete noisy trajectories

$$W_i(t_j) = X_i(t_j) + \delta_{ij}, \qquad i=1,\dots,n,\ j=1,\dots,p, \tag{2}$$

where $(\delta_{ij})_{i=1,\dots,n,\ j=1,\dots,p}$ is a sequence of independent real random variables such that, for all $i=1,\dots,n$ and $j=1,\dots,p$,

$$E(\delta_{ij}) = 0 \quad \text{and} \quad E(\delta_{ij}^2) = \sigma_\delta^2.$$

The noise components $\delta_{ij}$ are not discrete realizations of a continuous-time "random noise" stochastic process; they must be interpreted as random measurement errors at the finite discretization points (see e.g. Cardot, 2000 and Chiou et al., 2003 for similar points of view).

The errors-in-variables linear model has already been studied in many ways when the covariate takes values in $\mathbb{R}$ or $\mathbb{R}^p$, that is, when it is univariate or multivariate. For instance, the maximum likelihood method has been applied in this context (see Fuller, 1987), and asymptotic results have been obtained (see for example Gleser, 1981). Because this problem is strongly linked to that of solving linear systems

$$Ax \approx b,$$

where $x\in\mathbb{R}^p$ is unknown, $b \in \mathbb{R}^n$ and $A$ is an $n\times p$ matrix, numerical approaches have also been proposed. One of the best known is the total least squares (TLS) method (see for example Golub and Van Loan, 1980 and Van Huffel and Vandewalle, 1991). Coming back to model (1), very little work has been done in the errors-in-variables case: in a recent work, Chiou et al. (2003) proposed a two-step approach that first smooths the noisy trajectories to obtain denoised curves and then builds functional estimators. The point of view adopted here is quite different: it extends the TLS approach to the functional linear model.

Let us describe our formal framework for errors-in-variables, which is inspired by what is done in the literature. We introduce a discretized version of the inner product $\langle\cdot,\cdot\rangle$, denoted by $\langle\cdot,\cdot\rangle_p$ and defined for $f,g\in L^2(I)$ by

$$\langle f,g\rangle_p = \frac{1}{p}\sum_{j=1}^{p} f(t_j)\,g(t_j).$$

This approximation of $\langle\cdot,\cdot\rangle$ by $\langle\cdot,\cdot\rangle_p$ is valid only if $p$ is large enough, which we assume from now on. In this context of discretized curves, relation (1) becomes

$$Y_i = \frac{1}{p}\sum_{j=1}^{p} \alpha(t_j)\,X_i(t_j) + \varepsilon_i, \qquad i=1,\dots,n. \tag{3}$$

Finally, the problem is to estimate $\alpha$ using the data $(W_i(t_j), Y_i)_{i=1,\dots,n,\ j=1,\dots,p}$, where $W_1(t_j),\dots,W_n(t_j)$ are noisy observations of $X_1(t_j),\dots,X_n(t_j)$ for $j=1,\dots,p$.

The generalization of the TLS method to the case where $X_i$ is a functional random variable is presented in Section 3. As in the multivariate case, the TLS method consists in modifying a (penalized) least squares estimator of $\alpha$ designed for non-noisy observations: see Marx and Eilers (1999) and Cardot et al. (2003) for estimators of this kind based on B-splines with two different penalties. Here, we introduce another estimator based on smoothing splines which, as far as we know, has not been studied previously in the literature. Convergence results for this estimator in the non-noisy case are given in Section 2; they serve as a basis for the convergence results for the TLS estimator given in Section 3. A more detailed study of the asymptotic behavior of the smoothing splines estimator will be the subject of a forthcoming work. In Section 4, the convergence results for the TLS estimator are discussed.
Section 5 presents numerical simulations evaluating our estimation procedure. Finally, Section 6 contains the proofs of our results.
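The classical TLS method for systems $Ax \approx b$, which this work extends to the functional setting, can be illustrated in its simplest form: a scalar covariate ($p=1$, so $A$ is an $n \times 1$ column $a$) and no intercept. TLS then minimizes $\sum_i (b_i - x a_i)^2/(1+x^2)$, the sum of squared orthogonal distances. Setting the derivative to zero gives $S_{ab}x^2 + (S_{aa}-S_{bb})x - S_{ab} = 0$, with $S_{aa}=\sum_i a_i^2$, $S_{bb}=\sum_i b_i^2$ and $S_{ab}=\sum_i a_i b_i$. The sketch below is a minimal illustration under stated assumptions. It is the classical univariate TLS, not the smoothing splines functional TLS estimator of Section 3. It assumes equal error variances in $a$ and $b$, as classical TLS does. The data generator and all helper names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def ols_slope(a: List[float], b: List[float]) -> float:
    """Ordinary least squares solution of a*x ~ b (no intercept): S_ab / S_aa."""
    return sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)


def tls_slope(a: List[float], b: List[float]) -> float:
    """Total least squares solution of a*x ~ b for p = 1 (no intercept).

    Returns the root of S_ab x^2 + (S_aa - S_bb) x - S_ab = 0 that minimizes
    sum_i (b_i - x a_i)^2 / (1 + x^2).
    """
    s_aa = sum(ai * ai for ai in a)
    s_bb = sum(bi * bi for bi in b)
    s_ab = sum(ai * bi for ai, bi in zip(a, b))
    if s_ab == 0.0:
        raise ValueError("TLS slope is not uniquely defined when S_ab == 0")
    d = s_bb - s_aa
    r = math.sqrt(d * d + 4.0 * s_ab * s_ab)
    # Two algebraically equal forms of the same root; pick the one that
    # avoids cancellation between d and r.
    if d >= 0.0:
        return (d + r) / (2.0 * s_ab)
    return 2.0 * s_ab / (r - d)


def simulate_eiv(n: int, x_true: float, sigma: float,
                 seed: int = 0) -> Tuple[List[float], List[float]]:
    """Hypothetical errors-in-variables data: a_i = u_i + delta_i, b_i = x_true u_i + eps_i,
    with u_i ~ N(0, 1) unobserved and delta_i, eps_i ~ N(0, sigma^2) i.i.d."""
    rng = random.Random(seed)
    a, b = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # true, unobserved covariate value
        a.append(u + rng.gauss(0.0, sigma))
        b.append(x_true * u + rng.gauss(0.0, sigma))
    return a, b


if __name__ == "__main__":
    a, b = simulate_eiv(n=20000, x_true=2.0, sigma=0.5, seed=1)
    print(f"OLS: {ols_slope(a, b):.3f}  TLS: {tls_slope(a, b):.3f}  (true slope 2.0)")
```

With these settings, OLS is attenuated towards $2 \cdot 1/(1+0.25) = 1.6$ because the noise in $a$ inflates $S_{aa}$, whereas the TLS estimate should stay close to the true slope $2$. In the multivariate case, the same TLS solution is usually obtained from the singular value decomposition of the augmented matrix $[A\ b]$.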