Distance-based local linear regression for functional predictors
Article code | Year of publication | Length of the English article |
---|---|---|
24297 | 2010 | 9 pages (PDF) |
Publisher : Elsevier - Science Direct
Journal : Computational Statistics & Data Analysis, Volume 54, Issue 2, 1 February 2010, Pages 429–437
Abstract
The problem of nonparametrically predicting a scalar response variable from a functional predictor is considered. A sample of pairs (functional predictor and response) is observed. When predicting the response for a new functional predictor value, a semi-metric is used to compute the distances between the new and the previously observed functional predictors. Then each pair in the original sample is weighted according to a decreasing function of these distances. A Weighted (Linear) Distance-Based Regression is fitted, where the weights are as above and the distances are given by a possibly different semi-metric. This approach can be extended to nonparametric predictions from other kinds of explanatory variables (e.g., data of mixed type) in a natural way.
Introduction
Observing and storing complete functions as the results of random experiments is now possible thanks to the development of real-time measurement instruments and data storage resources. For instance, continuous-time clinical monitoring is common practice today. Functional Data Analysis (FDA) deals with the statistical description and modeling of samples of random functions. Functional versions of a wide range of statistical tools (from exploratory and descriptive data analysis to linear models to multivariate techniques) have recently been developed. See Ramsay and Silverman (2005) for a general perspective on FDA and Ferraty and Vieu (2006) for a nonparametric approach. Special issues recently dedicated to this topic by several journals (Davidian et al., 2004; González-Manteiga and Vieu, 2007; Valderrama, 2007) bear witness to the interest in this topic in the statistics community. Other recent papers on FDA are Park et al. (2009), Ferraty and Vieu (2009), Aguilera et al. (2008) and Zheng (2008).
In this paper we consider the problem of predicting a scalar response using a functional predictor. As an example, consider the Spectrometric Data described in Chapter 2 of Ferraty and Vieu (2006). This dataset includes information about 215 samples of chopped meat. For each of them, the function $\chi$, relating absorbance to wavelength, has been recorded at 100 wavelength values in the range 850–1050 nm. An additional response variable is observed: $y$, the fat content of the sample obtained by analytical chemical processing. Given that obtaining a spectrometric curve is less expensive than determining the fat content by chemical analysis, it is important to predict the fat content $y$ from the spectrometric curve $\chi$. In Section 4 the Spectrometric Data are used to illustrate the methods we propose in this work, together with another example on air pollution.
In technical terms, the problem is stated as follows. Let $(\boldsymbol{\chi}, Y)$ be a random element where the first component $\boldsymbol{\chi}$ is a random element of a functional space (typically a real function from $[a,b] \subseteq \mathbb{R}$ to $\mathbb{R}$) and $Y$ is a real random variable. We consider the problem of predicting the scalar response variable $y$ from the functional predictor $\chi$. We assume that we are given $n$ i.i.d. observations $(\chi_i, y_i)$, $i = 1, \dots, n$, from $(\boldsymbol{\chi}, Y)$ as a training set. Let $m(\chi) = E(Y \mid \boldsymbol{\chi} = \chi)$ be the regression function. Then an estimate of $m(\chi)$ is a good prediction of $y$. The linear functional regression model, considered in Ramsay and Silverman (2005), assumes that
$$m(\chi) = \alpha + \int_a^b \chi(t)\,\beta(t)\,dt, \quad \text{and} \quad y_i = m(\chi_i) + \varepsilon_i,$$
with $\varepsilon_i$ having zero expectation. The parameter $\beta$ is a function and $\alpha \in \mathbb{R}$. These authors propose to estimate $\beta$ and $\alpha$ by penalized least squares:
$$\min_{\alpha,\beta} \sum_{i=1}^{n} \left( y_i - \alpha - \int_a^b \chi_i(t)\,\beta(t)\,dt \right)^2 + \lambda \int_a^b \big( L(\beta)(t) \big)^2\,dt,$$
where $L(\beta)$ is a linear differential operator whose penalty discourages overly rough functions $\beta$, and $\lambda > 0$ acts as a smoothing parameter. Ferraty and Vieu (2006) consider this linear regression as a parametric model because only a finite number of functional elements is required to describe it (in this case only one is needed: $\beta$). They consider a nonparametric functional regression model where few regularity assumptions are made on the regression function $m(\chi)$.
They propose the following kernel estimator for $m(\chi)$:
$$\hat{m}_K(\chi) = \frac{\sum_{i=1}^{n} K\big(\delta(\chi,\chi_i)/h\big)\, y_i}{\sum_{i=1}^{n} K\big(\delta(\chi,\chi_i)/h\big)} = \sum_{i=1}^{n} w_i(\chi)\, y_i,$$
where $w_i(\chi) = K\big(\delta(\chi,\chi_i)/h\big) / \sum_{j=1}^{n} K\big(\delta(\chi,\chi_j)/h\big)$, $K$ is a kernel function with support $[0,1]$, the bandwidth $h$ is the smoothing parameter (depending on $n$), and $\delta(\cdot,\cdot)$ is a semi-metric ($\delta(\chi,\chi) = 0$, $\delta(\chi,\gamma) = \delta(\gamma,\chi)$, $\delta(\chi,\gamma) \le \delta(\chi,\psi) + \delta(\psi,\gamma)$) in the functional space $F = \{\chi : [a,b] \to \mathbb{R}\}$ to which the data $\chi_i$ belong. Examples of semi-metrics in $F$ are the $L^2$ distances between derivatives,
$$d_r^{\mathrm{deriv}}(\chi,\gamma) = \left( \int_a^b \big( \chi^{(r)}(t) - \gamma^{(r)}(t) \big)^2\,dt \right)^{1/2},$$
and the $L^2$ distance in the space of the first $q$ functional principal components of the functional dataset $\chi_i$, $i = 1, \dots, n$:
$$d_q^{\mathrm{PCA}}(\chi,\gamma) = \left( \sum_{k=1}^{q} \big( \psi_k^{\chi} - \psi_k^{\gamma} \big)^2 \right)^{1/2},$$
where $\psi_k^{\chi}$ is the score of the function $\chi$ in the $k$th principal component. See Chapters 8 and 9 in Ramsay and Silverman (2005) or Chapter 3 in Ferraty and Vieu (2006) for more information about functional principal component analysis. In Ferraty and Vieu (2006) it is proved that $\hat{m}_K(\chi)$ is a consistent estimator (in the sense of almost complete convergence) of $m(\chi)$ under regularity conditions on $m$, $\boldsymbol{\chi}$ (involving small-ball probabilities), $Y$ and $K$. Moreover, Ferraty et al. (2007) prove the mean square convergence and find the asymptotic distribution of $\hat{m}_K(\chi)$. The book of Ferraty and Vieu (2006) lists several interesting open problems concerning nonparametric functional regression. In particular, their Open Question 5 addresses the transfer of local polynomial regression ideas to an infinite-dimensional setting in order to extend the estimator $\hat{m}_K(\chi)$, which is a kind of Nadaraya–Watson regression estimator. A first answer to this question is given in Baíllo and Grané (2009).
They propose a natural extension of the finite-dimensional local linear regression, by solving the problem
$$\min_{\alpha,\beta} \sum_{i=1}^{n} w_i(\chi) \left( y_i - \alpha - \int_a^b \big( \chi_i(t) - \chi(t) \big)\,\beta(t)\,dt \right)^2,$$
where the local weights $w_i(\chi) = K\big(\|\chi - \chi_i\|/h\big) / \sum_{j=1}^{n} K\big(\|\chi - \chi_j\|/h\big)$ are defined by means of $L^2$ distances ($\|\chi\|^2 = \int_a^b \chi^2(t)\,dt$; it is assumed that all the functions are in $L^2([a,b])$). Their estimator of $m(\chi)$ is $\hat{m}_{LL}(\chi) = \hat{\alpha}$. Closely related approaches can be seen in Berlinet et al. (2007) and Barrientos-Marin (2007). In this work we give an alternative response to the same open question. Our proposal rests on Distance-Based Regression (DBR), a prediction tool based on inter-individual distances that includes both Ordinary and Weighted Least Squares (OLS, WLS) as particular cases. Section 2 presents the required formulas. In Section 3 we introduce our proposal, Local Linear Distance-Based Regression, and in Section 4 we apply it to two datasets: the Spectrometric Data mentioned above and another one arising from air pollution measurements. Section 5 contains some concluding remarks.
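The kernel (Nadaraya–Watson type) estimator of Ferraty and Vieu (2006) and the two example semi-metrics recalled in this introduction can be illustrated with a small numerical sketch. It is not the authors' code. It assumes that every curve is observed on a common grid `t`, approximates derivatives by finite differences and integrals by the trapezoidal rule, and uses the quadratic kernel $K(u) = 1 - u^2$ on $[0,1]$ as one possible choice. The function names, the synthetic data and the bandwidth rule are all hypothetical.

```python
import numpy as np

def _trapezoid(values, t):
    """Trapezoidal rule along the last axis (the grid t need not be uniform)."""
    return 0.5 * ((values[..., 1:] + values[..., :-1]) * np.diff(t)).sum(axis=-1)

def deriv_semimetric(A, B, t, r=1):
    """Approximate L2 distance between r-th derivatives of sampled curves.

    A has shape (m, p), B has shape (n, p); rows are curves on the grid t.
    Returns an (m, n) matrix of distances.
    """
    for _ in range(r):
        A = np.gradient(A, t, axis=1)  # finite-difference derivative
        B = np.gradient(B, t, axis=1)
    sq_diff = (A[:, None, :] - B[None, :, :]) ** 2
    return np.sqrt(_trapezoid(sq_diff, t))

def pca_semimetric(A, B, X_train, q=2):
    """Euclidean distance between the first q principal component scores.

    Scores come from an SVD of the centered, discretized training curves;
    quadrature weights are ignored (a simplifying assumption).
    """
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    basis = Vt[:q].T
    SA, SB = (A - mean) @ basis, (B - mean) @ basis
    return np.linalg.norm(SA[:, None, :] - SB[None, :, :], axis=2)

def kernel(u):
    """Quadratic kernel supported on [0, 1] (an assumed choice of K)."""
    return np.where((u >= 0) & (u <= 1), 1.0 - u ** 2, 0.0)

def kernel_estimator(D, y, h):
    """Weighted average of y with weights K(D / h), one prediction per row of D.

    Returns NaN for a row when no training curve lies within distance h.
    """
    K = kernel(D / h)
    denom = K.sum(axis=1)
    out = np.full(D.shape[0], np.nan)
    ok = denom > 0
    out[ok] = (K[ok] @ y) / denom[ok]
    return out

# Synthetic illustration (not the Spectrometric Data).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
a, b = rng.normal(size=40), rng.normal(size=40)
X = a[:, None] * np.sin(2 * np.pi * t) + b[:, None] * t
y = a ** 2 + 0.1 * rng.normal(size=40)

D = deriv_semimetric(X[:5], X[5:], t, r=1)  # new curves vs. training curves
h = np.quantile(D, 0.3)                     # ad hoc bandwidth for the demo
print(kernel_estimator(D, y[5:], h))
```

Swapping `deriv_semimetric` for `pca_semimetric` changes only the distance matrix `D`; the estimator is untouched, which reflects how the method depends on the curves only through the chosen semi-metric. In practice the bandwidth `h` would be selected by a data-driven criterion such as cross-validation rather than by the fixed quantile used here.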
Conclusion
For the problem of regression with a functional predictor and a scalar response, we have presented the local linear DBR estimator of $m(\chi)$, a nonparametric method based on DBR. The method is very flexible and includes local polynomial regression for real predictor variables as a particular case. Moreover, it gives good results in practice. We therefore consider this proposal a satisfactory answer to Open Question 5 in Ferraty and Vieu (2006).