Bounding parameters in a linear regression model with a mismeasured regressor using additional information
|Article code||Publication year||English article||Word count|
|24203||2006||20-page PDF||8289 words|
Publisher : Elsevier - Science Direct
Journal : Journal of Econometrics, Volume 133, Issue 1, July 2006, Pages 51–70
This paper discusses a linear regression model with a mismeasured regressor in which the measurement error is correlated with both the latent variable and the regression error. We use a linear structure to capture the correlation between the measurement error and the latent variable. This paper shows that the variance of the latent variable is very useful for revealing information on the parameters which otherwise cannot be obtained with such a nonclassical measurement error. The main result is that the finite bounds on the parameters can be found using the variance of the latent variable, regardless of how severely the measurement error and the regression error are correlated, if the mismeasured regressor contains enough information on the latent one. This paper also discusses the special but interesting case of the latent variable being dichotomous. In this case, the mean of the latent variable may even reveal information on the correlation between the measurement error and the regression error. All the bounds developed in the paper are tight.
The measurement error model has increasingly been a topic of interest among researchers who want to estimate economic parameters such as the return to schooling and the union wage differential. When a regressor is mismeasured in a linear regression model, the least-squares estimator is generally not consistent, but at least some information can be inferred about the true parameters from the inconsistent estimators. These types of results are in the form of bounds on the parameters, which hold asymptotically. Under the classical assumption that the measurement error is independent of the latent regressor and the regression error, it is well known that the regressions of x on y and y on x provide asymptotic bounds on the coefficient on x in the one-regressor case (Gini, 1921). However, the problem is more complicated in a multi-regressor context, and the existence of bounds is limited to certain cases. The classical result in the area is due to Koopmans (1937), who shows that such a generalization is possible only under very restrictive conditions. Patefield (1981) and Klepper and Leamer (1984) present a similar result. When further information on the measurement error distribution, such as bounds on the error variance, is available, narrower bounds on the parameters can be found (Bekker et al., 1984). Similar types of bounds are also discussed in Leamer (1982, 1987) and Klepper (1988b). While classical measurement error has been studied intensively, nonclassical measurement error has drawn more and more attention from researchers in recent decades. Bekker et al. (1987) discuss the case in which the errors in the regressors and the regression error are correlated. Iwata (1992) and Krasker and Pratt (1986, 1987) show that bounds on these correlations may help find bounds on parameters of interest.
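The classical one-regressor bounds mentioned above (the regression of y on x and the reverse regression of x on y bracket the true coefficient) can be illustrated with a small simulation. This is only a sketch under assumed conditions: normal draws, a positive slope, and arbitrary illustrative values (slope 2, error standard deviations 1 and 0.5). The function names `ols_slope` and `classical_bounds` are hypothetical and do not come from the paper.

```python
import random


def ols_slope(xs, ys):
    """Slope from a least-squares regression of ys on xs (with intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    s_xx = sum((a - mean_x) ** 2 for a in xs)
    return s_xy / s_xx


def classical_bounds(n=20000, beta=2.0, seed=0):
    """Simulate classical measurement error and return (lower, upper) bounds.

    Assumed data-generating process (illustrative only):
      x* ~ N(0, 1), y = 1 + beta * x* + u, x = x* + v,
      with u ~ N(0, 1) and v ~ N(0, 0.5^2) independent of everything else.
    """
    rng = random.Random(seed)
    x_star = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1.0 + beta * s + rng.gauss(0.0, 1.0) for s in x_star]
    x = [s + rng.gauss(0.0, 0.5) for s in x_star]
    lower = ols_slope(x, y)        # direct regression: attenuated toward zero
    upper = 1.0 / ols_slope(y, x)  # reverse regression: overstates the slope
    return lower, upper


if __name__ == "__main__":
    lo, hi = classical_bounds()
    print(f"{lo:.3f} <= beta <= {hi:.3f}")
```

Under these assumed values the population limits are 2/(1 + 0.25) = 1.6 for the direct regression and (4 + 1)/2 = 2.5 for the reverse regression, so the true slope of 2 lies between them. The bracket relies on the classical independence assumptions; it need not hold once the measurement error is correlated with the latent regressor or the regression error.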
Erickson (1993) provides a neat result when the measurement error is independent of the latent regressor but correlated with the regression error. As for empirical evidence of nonclassical measurement error, Rodgers et al. (1993) suggest that the measurement error may be correlated with the latent variable. Bound et al. (2001) also find that the assumption that the measurement error is independent of the latent variable is strong and often implausible.

This paper discusses a linear measurement error model in which the measurement error is correlated with both the latent variable and the regression error. Let $y$ denote the dependent variable, $x^*$ the latent regressor, and $w$ the row vector of the other regressors (excluding the constant). Let $\alpha$, $\beta$ and $\gamma$ be the intercept and the regression coefficients of $x^*$ and $w$, respectively, where $\gamma$ is a column vector with the same dimension as $w$. Let $u$ stand for the regression error. The linear regression model is as follows:

$$y = \alpha + \beta x^* + w\gamma + u \tag{1}$$

with $E(u \mid x^*, w) = 0$. The researcher observes another variable $x$, together with $y$ and $w$, as a proxy for the latent variable $x^*$. A critical assumption in this paper is that the conditional mean of the measurement error $v = x - x^*$ is linear in the latent regressor $x^*$. Then,

$$x = p + r x^* + \varepsilon \tag{2}$$

where $E(\varepsilon \mid x^*, w) = 0$. Eq. (2) implies that the measurement error $v$ may be correlated with the latent variable $x^*$, and that the observed variable $x$ may also contain a systematic shift $p$. The linear structure in Eq. (2) can be justified as follows: first, if $v$ and $x^*$ are jointly normally distributed and $E(x \mid x^*, w) = E(x \mid x^*)$, the conditional mean of $v$ on $x^*$ is a linear function of $x^*$. Second, when $x$ and $x^*$ are two 0–1 dichotomous variables, $x$ and $x^*$ also satisfy Eq. (2). The linear structure in Eq.
(2) allows for correlation between the measurement error and the latent variable. Such a correlation has received increasing attention in the literature, especially in studies relating to earnings and wages. For example, Angrist and Krueger (1999) compare the self-reported hourly wage in the CPS with the corresponding employers' records, and find that the variance of the log self-reported wage is 0.355 while that of the employer-reported wage is 0.430. The fact that the latter is larger than the former implies that the measurement error $v$ must be correlated with the true value $x^*$ if we assume employers' records are accurate. This is because the variance of the self-reported wage $\sigma_{xx}$ would be larger than that of the employer-reported wage $\sigma_{x^*x^*}$ if the measurement error $v$ were uncorrelated with the true wage $x^*$. Eq. (2) implies that the conditional mean of the measurement error $v$ is linear in $x^*$, i.e., $E(v \mid x^*, w) = p + (r-1)x^*$, and that $\sigma_{xx} \geq r^2 \sigma_{x^*x^*}$. Therefore, $r^2 \leq 0.355/0.430$, so we have $r \leq 0.91$ if we assume $r > 0$. The fact that $r < 1$ means that the measurement error in the self-reported wage is negatively correlated with the true wage. This is also consistent with existing findings, such as those in Rodgers et al. (1993).

The method in Erickson (1993) is not applicable to this framework because the latent regressor in this paper is correlated with its measurement error. It has been shown that no informative bounds on the parameters of interest exist when the measurement error is correlated with both the latent regressor and the regression error (Krasker and Pratt, 1986; Bekker et al., 1987; Erickson, 1989). Therefore, additional information is needed to find bounds on the parameters of interest. Since we may observe the latent variable from other sources, the additional information may be the variance of the latent variable. In other words, the researcher may observe $y$, $x$ and $w$ in one data set and $x^*$ in another data set.
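The wage example above rests on a one-line calculation: under Eq. (2) the variance of x equals r² times the variance of x* plus the variance of ɛ (because ɛ is mean-independent of x*), so r² cannot exceed the ratio of the two variances. A minimal sketch of that arithmetic follows; the function name `upper_bound_on_r` is hypothetical, and the two variances are the figures quoted in the text.

```python
import math


def upper_bound_on_r(var_x, var_x_star):
    """Largest r > 0 compatible with var(x) >= r**2 * var(x*)."""
    if var_x < 0 or var_x_star <= 0:
        raise ValueError("variances must be non-negative, with var(x*) > 0")
    return math.sqrt(var_x / var_x_star)


if __name__ == "__main__":
    # Variances of the log self-reported and employer-reported wages
    # as quoted in the text.
    bound = upper_bound_on_r(0.355, 0.430)
    print(f"r <= {bound:.2f}")  # prints "r <= 0.91"
```

The bound is attained only if the variance of ɛ is zero, so in practice r would lie strictly below it.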
This framework is reasonable for several applications. For example, wages are usually mismeasured in survey data, while administrative data may contain accurately measured wages. Other useful additional information may include bounds on the parameter $r$. It is plausible to assume that $r$ is bounded away from zero if $x$ contains enough information on $x^*$. We may then assume there exists an $m$ such that $r \geq m > 0$. One can show that $r = \rho_{xx^*}\sqrt{\sigma_{xx}/\sigma_{x^*x^*}}$, where $\rho_{xx^*}$ is the correlation coefficient between $x$ and $x^*$. Since $\sigma_{xx}$ and $\sigma_{x^*x^*}$ are identified, a lower bound on $\rho_{xx^*}$ implies a lower bound on $r$. When $x$ and $x^*$ are two 0–1 dichotomous variables, the lower bound on $r$ implies an upper bound on the total misclassification probability. We will show that this information is very useful for finding informative bounds on the parameters of interest.

The paper is organized as follows: Section 2 derives the bounds for a single-regressor linear model. Section 3 provides the main results of the paper. A linear model with a dichotomous latent regressor is discussed in Section 4 as an application. Section 5 concludes the paper. The appendix includes all the proofs.
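The link between r, the correlation coefficient and the misclassification probabilities described in the introduction can be checked numerically. The sketch below assumes a 0–1 latent variable with arbitrary illustrative misclassification rates. In that case Eq. (2) holds with p = P(x=1 | x*=0) and r = 1 − P(x=1 | x*=0) − P(x=0 | x*=1), so a lower bound m on r corresponds to a total misclassification probability of at most 1 − m. The function names and parameter values are hypothetical, not from the paper.

```python
import math
import random


def slope_and_correlation(x, x_star):
    """Return (r, rho, var_x, var_x_star) from paired samples.

    r is the slope of the linear projection of x on x*, i.e. cov / var(x*);
    rho is the correlation coefficient between x and x*.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_s = sum(x_star) / n
    cov = sum((a - mean_x) * (b - mean_s) for a, b in zip(x, x_star)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_s = sum((b - mean_s) ** 2 for b in x_star) / n
    return cov / var_s, cov / math.sqrt(var_x * var_s), var_x, var_s


def simulate_binary(n=50000, p_true=0.3, q01=0.1, q10=0.2, seed=1):
    """Draw a 0-1 latent x* and a misclassified 0-1 proxy x.

    q01 = P(x=1 | x*=0) and q10 = P(x=0 | x*=1) are assumed rates.
    """
    rng = random.Random(seed)
    x, x_star = [], []
    for _ in range(n):
        s = 1 if rng.random() < p_true else 0
        flipped = rng.random() < (q10 if s == 1 else q01)
        x_star.append(s)
        x.append(1 - s if flipped else s)
    return x, x_star


if __name__ == "__main__":
    x, x_star = simulate_binary()
    r, rho, var_x, var_s = slope_and_correlation(x, x_star)
    print("r from projection        :", r)
    print("rho * sqrt(var_x / var_s):", rho * math.sqrt(var_x / var_s))
    print("1 - q01 - q10 (assumed)  :", 1 - 0.1 - 0.2)
```

The first two printed values coincide up to floating-point error because the identity holds for sample moments as well; the third is the population value implied by the assumed misclassification rates, which the sample slope approximates.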
English conclusion
This paper discusses a linear regression model with a mismeasured regressor under the assumption that the variance of the latent regressor is available. The main result is that the parameters of interest can be finitely bounded with additional information, namely the variance of the latent variable and a lower bound on the parameter $r$, regardless of how severely the measurement error is correlated with the regression error. If the regression error and the measurement error are uncorrelated, the variance of the latent regressor helps provide narrower bounds than those in the existing results. We also discuss the model with a latent dichotomous regressor as an application of the general result. In this case, the additional information needed includes the mean of the latent variable and an upper bound on the total misclassification probability. The additional information may lead to bounds not only on the parameters of interest, but also on the correlation coefficient between the measurement error and the regression error. The presented results suggest that the variance of the latent variable is very useful in solving the nonclassical measurement error problem in the linear regression model.