Estimation in linear regression with measurement errors subject to single-index distortion
Article code | Publication year | Pages in English article |
---|---|---|
24637 | 2013 | 18 pages (PDF) |

Publisher : Elsevier - Science Direct
Journal : Computational Statistics & Data Analysis, Volume 59, March 2013, Pages 103–120
Abstract
In this paper, we consider statistical inference for linear regression models when neither the response nor the predictors can be directly observed, but are measured with errors in a multiplicative fashion and distorted as single index models of observable confounding variables. We propose a semiparametric profile least squares estimation procedure to estimate the single index. Then we develop a global weighted least squares estimation procedure for parameters of linear regression models via the varying coefficient models. Asymptotic properties of the proposed estimators are established. The results combined with consistent estimators for the asymptotic variance can be employed to test whether the targeted parameters in the single index and linear regression models are significant. Finite-sample performance of the proposed estimators is assessed by simulation experiments. The proposed methods are also applied to a dataset from a Pima Indian diabetes data study.
Introduction
In many applications, variables cannot be observed directly because of contamination. Data of this type are common in many disciplines, such as health science and medical research. Ignoring measurement error in the covariates can cause large, sometimes serious, bias in the estimated regression coefficients. The goal of measurement error modeling is to correct such bias, and attaining this goal requires considerable care. Accordingly, measurement error models have been widely studied and have received great attention in the literature. Fuller (1987) is a comprehensive survey covering many linear measurement error models. Carroll et al. (2006) systematically summarized recent research developments in nonlinear and semiparametric measurement error models.

In this paper, we consider the problem of estimating a $(p+1)$-vector of parameters $\beta_0$ from the linear regression model

$$Y = X^{\tau}\beta_0 + \varepsilon, \tag{1}$$

where "$\tau$" denotes the transpose operation throughout this paper, $Y$ is a univariate response, and $X = (X_0, X_1, \ldots, X_p)^{\tau}$ is a predictor vector with $X_0 \equiv 1$ for the intercept. $\beta_0 = (\beta_{00}, \beta_{01}, \ldots, \beta_{0p})^{\tau}$ is an unknown $(p+1)$-dimensional parameter vector in $\mathbb{R}^{p+1}$, and $\varepsilon$ is the model error satisfying $E(\varepsilon) = 0$ and $E(\varepsilon^2) < \infty$. Our interest in this paper is to estimate $\beta_0$ when both the response and the predictors are observed with measurement errors through certain multiplicative distorting functions. Specifically:

$$\tilde{Y} = \phi(\theta_0^{\tau}U)\,Y, \quad \tilde{X}_1 = \psi_1(\theta_0^{\tau}U)\,X_1, \quad \ldots, \quad \tilde{X}_p = \psi_p(\theta_0^{\tau}U)\,X_p, \tag{2}$$

$$E\{\phi(\theta_0^{\tau}U)\} = 1, \quad E\{\psi_1(\theta_0^{\tau}U)\} = 1, \quad \ldots, \quad E\{\psi_p(\theta_0^{\tau}U)\} = 1, \tag{3}$$

where $(Y, X_1, \ldots, X_p) \perp\!\!\!\perp U$ and $\perp\!\!\!\perp$ indicates independence. $\phi(\cdot)$ and $\psi_r(\cdot)$ are unknown continuous distorting functions, and $U$ is an observed continuous confounding variable.
$\theta_0 = (\theta_{01}, \ldots, \theta_{0q})^{\tau}$ is a $q$-vector of parameters in $\mathbb{R}^q$ satisfying $\|\theta_0\| = 1$, where $\|\cdot\|$ stands for the Euclidean norm. The constraint $\|\theta_0\| = 1$ is used to identify the single index $\theta_0$, because $\phi(\cdot)$ and $\psi_r(\cdot)$ are all unknown and only the orientation of $\theta_0$ is identifiable (Zhu and Xue, 2006). Conditions (3) are the identifiability conditions on $\phi(\theta_0^{\tau}U)$ and $\psi_r(\theta_0^{\tau}U)$ suggested by Nguyen and Şentürk (2008). They ensure that the distorting effect vanishes on average, namely $E(\tilde{Y}) = E(Y)$ and $E(\tilde{X}_r) = E(X_r)$.

The above scenario is common in practice because of distortion from the effects of the confounding variable. For example, Kaysen et al. (2002) collected data on hemodialysis patients involved in medical studies and realized that the fibrinogen level and serum transferrin level should be divided by the body mass index (BMI). This adjustment by division implies a multiplicative relationship between the unobserved primary variables and the confounding variable. Nevertheless, precise knowledge of the relationship between the confounding variable and the primary variables is rarely available in practice. Naively dividing by the confounding variable to recover the original response $Y$ and predictors $X_r$ may cause large bias or lead to model misspecification. Thus, Şentürk and Müller (2005) introduced a more flexible model, covariate adjusted regression (CAR), in which unknown continuous distortion functions $\phi(\cdot)$ and $\psi_r(\cdot)$ are allowed from a practical point of view.

In many applications, more than one confounding variable may simultaneously affect the primary variables of interest. One example is the Pima Indian diabetes data. In this dataset, Nguyen and Şentürk (2008) found that body mass index (BMI) and triceps skin fold thickness (SFT) are two potential distorting covariates affecting plasma glucose concentration (GLU) and diastolic blood pressure (DBP).
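The effect of the distortion model (1) to (3) can be illustrated with a small simulation. The sketch below is not taken from the paper: the design (two uniform confounders, `theta0 = (0.6, 0.8)`, one predictor) and the linear distorting functions `phi` and `psi1` are hypothetical choices made only so that conditions (3) hold exactly. It checks that the distorted response keeps the mean of the original response, and compares ordinary least squares on the unobservable data with ordinary least squares applied naively to the distorted data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical setup (an assumption for illustration, not from the paper):
# q = 2 confounders, uniform on [0, 1], and a unit-norm single index.
theta0 = np.array([0.6, 0.8])
U = rng.uniform(0.0, 1.0, size=(n, 2))
index = U @ theta0  # theta0' U, with mean 0.6*0.5 + 0.8*0.5 = 0.7

# Hypothetical distorting functions, centred so that E{phi} = E{psi1} = 1.
def phi(t):
    return 1.0 + 0.5 * (t - 0.7)

def psi1(t):
    return 1.0 - 0.5 * (t - 0.7)

# Model (1) with p = 1: Y = beta00 + beta01 * X1 + eps, independent of U.
beta0 = np.array([1.0, 2.0])
X1 = rng.normal(2.0, 1.0, size=n)
eps = rng.normal(0.0, 0.5, size=n)
Y = beta0[0] + beta0[1] * X1 + eps

# Model (2): only the multiplicatively distorted versions are observed.
Y_tilde = phi(index) * Y
X1_tilde = psi1(index) * X1

def ols(x, y):
    """Ordinary least squares of y on an intercept and x."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

beta_oracle = ols(X1, Y)              # uses the unobservable (X1, Y)
beta_naive = ols(X1_tilde, Y_tilde)   # ignores the distortion

print("mean(Y), mean(Y_tilde):", Y.mean(), Y_tilde.mean())
print("oracle OLS:", beta_oracle)
print("naive OLS :", beta_naive)
```

In this design the two sample means agree closely, as conditions (3) imply, while the naive slope is pulled away from the true value of 2, which is the bias that motivates a covariate-adjusted estimator.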
To examine the underlying relationship between GLU and DBP, they considered the single index distortions (2) and modeled GLU and DBP with a linear regression model.

To eliminate the effect of the distortions, Şentürk and Müller (2005, 2006, 2009) transformed the distorted response and distorted predictors via a connection to a varying coefficient regression and applied a binning method similar to that proposed by Fan and Zhang (2000) for longitudinal data. This binning method is designed for models with a linear structure, such as linear regression models (Şentürk and Müller, 2005, 2006), generalized linear models (Şentürk and Müller, 2009) and partial linear single index models (Zhang et al., 2012a). For nonlinear models, the transformation to varying coefficient models may not work well and may lead to non-identifiability of some parameters (Cui et al., 2009). As a remedy, Cui et al. (2009) proposed a direct plug-in method using the calibrated variables $\hat{Y} = \tilde{Y}/\hat{\phi}$ and $\hat{X}_r = \tilde{X}_r/\hat{\psi}_r$, where $\hat{\phi}$ and $\hat{\psi}_r$ are traditional nonparametric kernel smoothing estimators. Any further estimation is then based on the calibrated quantities.

The primary goal is to estimate $\beta_0$ in the linear regression model (1) and uncover the true relationship between $Y$ and $X$. To achieve this goal, Nguyen and Şentürk (2008) extended the transformation method to an adaptive varying single index coefficient model (Xia and Li, 1999). They used a hybrid backfitting algorithm to simultaneously estimate the unknown single index and the varying coefficient functions. The final estimator of $\beta_0$ is a weighted average of the estimated coefficient functions. However, Nguyen and Şentürk (2008) did not provide theoretical justification for their approach.
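The connection to a varying coefficient regression, and the reason a weighted average of coefficient functions recovers $\beta_0$, can be sketched directly from (1) to (3). The derivation below is an illustration rather than a quotation from the paper. The coefficient functions $\alpha_0, \alpha_r$ are names introduced here, and it assumes $\psi_r(\cdot) \neq 0$, $E(X_r) \neq 0$, and that $\varepsilon$ has mean zero given $(X, U)$.

```latex
\begin{align*}
\tilde{Y}
  &= \phi(\theta_0^{\tau}U)\Big(\beta_{00} + \sum_{r=1}^{p}\beta_{0r}X_r + \varepsilon\Big) \\
  &= \alpha_0(\theta_0^{\tau}U) + \sum_{r=1}^{p}\alpha_r(\theta_0^{\tau}U)\,\tilde{X}_r
     + \phi(\theta_0^{\tau}U)\,\varepsilon, \\
\alpha_0(t) &= \beta_{00}\,\phi(t), \qquad
\alpha_r(t) = \beta_{0r}\,\frac{\phi(t)}{\psi_r(t)}, \quad r = 1,\ldots,p. \\[6pt]
E\{\alpha_0(\theta_0^{\tau}U)\} &= \beta_{00}\,E\{\phi(\theta_0^{\tau}U)\} = \beta_{00}, \\
E\{\alpha_r(\theta_0^{\tau}U)\,\tilde{X}_r\}
  &= \beta_{0r}\,E\{\phi(\theta_0^{\tau}U)\,X_r\}
   = \beta_{0r}\,E\{\phi(\theta_0^{\tau}U)\}\,E(X_r)
   = \beta_{0r}\,E(\tilde{X}_r), \\
\beta_{0r} &= \frac{E\{\alpha_r(\theta_0^{\tau}U)\,\tilde{X}_r\}}{E(\tilde{X}_r)}.
\end{align*}
```

The second line uses $X_r = \tilde{X}_r/\psi_r(\theta_0^{\tau}U)$ from (2), and the expectation steps use the independence of $(Y, X_1, \ldots, X_p)$ and $U$ together with conditions (3). Replacing the expectations by sample averages and $\alpha_r$ by an estimate gives a weighted-average estimator of the kind described above.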
Another issue concerns the hybrid backfitting algorithm they adopted. This algorithm needs to take derivatives with respect to the single index $\theta_0$ when updating it (see p. 818 of Nguyen and Şentürk, 2008). Nevertheless, as noted in Zhu and Xue (2006) and Zhu et al. (2010), the restriction $\|\theta_0\| = 1$ leads to a non-differentiability problem, because $\theta_0$ lies on the boundary of the unit ball. What the right hybrid backfitting algorithm for estimating the single index $\theta_0$ should be in this situation is much less well understood.

In this paper, we propose a different estimation procedure in the multivariate covariate adjusted setting. We use the popular "delete-one-component" method to overcome the non-differentiability difficulty and propose a semiparametric profile least squares method to estimate the single index $\theta_0$. Next, we establish a connection to varying coefficient models. Unlike the binning method used in Şentürk and Müller (2005, 2006, 2009), a global weighted least squares method is adopted to estimate these varying coefficient functions, and an estimator of $\beta_0$ is then constructed from the estimated varying coefficient functions. The asymptotic normality of the parameter estimators of interest is also obtained. Furthermore, we propose consistent estimators of the asymptotic variance in order to construct a test statistic for testing whether the targeted parameters $\theta_0$ and $\beta_0$ are significant.

A simulation study is conducted to examine the performance of the proposed procedures with moderate sample sizes. In this simulation, we also compare our method with some existing methods, such as the binning method (Şentürk and Müller, 2005, 2006), the direct plug-in method (Cui et al., 2009) and the dimension reduction based method (Zhang et al., 2012b).
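The "delete-one-component" idea can be sketched as a reparametrization: one component of $\theta$ is written as a function of the remaining $q-1$ free components, so the unit-norm constraint holds by construction and derivatives can be taken with respect to the free components only. The code below is a minimal illustration of that reparametrization and its Jacobian, not the paper's estimation procedure. The function names are hypothetical, indices are 0-based, and it assumes the deleted component is positive.

```python
import numpy as np

def theta_from_reduced(theta_reduced, r):
    """Rebuild a unit vector theta from its q-1 free components.

    The deleted component (position r, 0-based) is set to
    sqrt(1 - ||theta_reduced||^2), so ||theta|| = 1 by construction.
    Assumes the deleted component is positive and ||theta_reduced|| < 1.
    """
    theta_reduced = np.asarray(theta_reduced, dtype=float)
    sq_norm = float(theta_reduced @ theta_reduced)
    if sq_norm >= 1.0:
        raise ValueError("free components must have Euclidean norm below 1")
    return np.insert(theta_reduced, r, np.sqrt(1.0 - sq_norm))

def jacobian(theta_reduced, r):
    """q x (q-1) Jacobian of theta with respect to the free components.

    It is the identity matrix with one extra row inserted at position r,
    equal to -theta_reduced / sqrt(1 - ||theta_reduced||^2).
    """
    theta_reduced = np.asarray(theta_reduced, dtype=float)
    sq_norm = float(theta_reduced @ theta_reduced)
    if sq_norm >= 1.0:
        raise ValueError("free components must have Euclidean norm below 1")
    extra_row = -theta_reduced / np.sqrt(1.0 - sq_norm)
    return np.insert(np.eye(theta_reduced.size), r, extra_row, axis=0)

# Example: q = 2, delete the second component (r = 1).
theta = theta_from_reduced([0.6], r=1)
J = jacobian([0.6], r=1)
print(theta, np.linalg.norm(theta))
print(J)
```

With this parametrization, a least squares criterion in $\theta$ becomes a smooth function of the free components inside the unit ball, and the chain rule with `jacobian` gives its gradient.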
Revisiting the Pima Indian diabetes data yields a more reasonable explanation than that of Nguyen and Şentürk (2008).

The remainder of the paper is organized as follows. In Section 2, we propose the semiparametric profile least squares estimation procedure for the single index $\theta_0$, and further introduce a global weighted least squares estimation procedure for the parameters $\beta_0$. The asymptotic properties of the proposed estimators are investigated in this section. The estimators of the asymptotic variance are also given in Section 2. In Section 3, we report the results of a simulation study. In Section 4, we present the results of our statistical analysis of the Pima Indian diabetes data. All the technical proofs of the asymptotic results are given in the Appendix.