Download English ISI article No. 24596
Article title (translated from Persian)

Hypothesis testing in linear regression when k/n is large

English title
Hypothesis testing in linear regression when k/n is large
Article code: 24596
Publication year: 2011
Length of English article: 12 pages (PDF)
Source

Publisher: Elsevier - Science Direct

Journal : Journal of Econometrics, Volume 165, Issue 2, December 2011, Pages 163–174

Translated keywords
Dimension asymptotics, F-test, Ordinary least squares
English keywords
Dimension asymptotics, F-test, Ordinary least squares
Article preview: Hypothesis testing in linear regression when k/n is large

English abstract

This paper derives the asymptotic distribution of the F-test for the significance of linear regression coefficients as both the number of regressors, k, and the number of observations, n, increase together so that their ratio remains positive in the limit. The conventional critical values for this test statistic are too small, and the standard version of the F-test is invalid under this asymptotic theory. This paper provides a correction to the F statistic that gives correctly sized tests both under this paper’s limit theory and under conventional asymptotic theory that keeps k finite. This paper also presents simulations indicating that the new statistic can perform better in small samples than the conventional test. The statistic is then used to reexamine Olivei and Tenreyro’s results from [Olivei, G., Tenreyro, S., 2007. The timing of monetary policy shocks. The American Economic Review 97, 636–663] and Sala-i-Martin’s results from [Sala-i-Martin, X.X., 1997. I just ran two million regressions. The American Economic Review 87 (2), 178–183].

English introduction

with x_t and ε_t uncorrelated. Under standard assumptions, the OLS estimator, β̂, is consistent and asymptotically normal as n increases to infinity. This asymptotic distribution is the basis for most of the empirical research in economics, but as Huber (1973) has shown, it is unreliable unless k/n is close to zero; k is the number of regressors in the model. Huber proves that the OLS coefficient estimator is consistent and asymptotically normal when k increases with n, but only if k/n → 0. In practice, k/n will always be positive and is sometimes large, so it is unclear whether the classic tests that exploit asymptotic normality are themselves reliable. This paper derives the asymptotic distribution of the F-test for arbitrary linear hypotheses about these coefficients under a more general limit theory that allows k/n to remain uniformly positive. The conventional F-test is asymptotically invalid under this limit theory but, despite this theoretical tendency to over-reject, it will usually have close to its nominal size in practice. Moreover, this paper derives a modification of the F-test that is asymptotically valid and demonstrates that this new test performs better than the unmodified F-test in practice.
This paper is not the first to study the asymptotic distribution of estimators like β̂ as both n and k increase. Previous research has looked at the behavior of M-estimators as k increases, the behavior of Analysis of Variance (ANOVA) as the number of groups increases, and the behavior of instrumental variable estimators as the number of instruments increases. This research has followed two distinct paths. The first looks for the fastest growth rate of k that is compatible with standard consistency and asymptotic normality results; k = o(n) is necessary for these results to hold but is often insufficient.
The second approach looks for alternative asymptotic distributions of the coefficient estimators keeping k/n positive. These increasing-k asymptotics were first introduced in the context of M-estimation; Huber (1973) argues that assuming k is fixed is unrealistic in practice. After proving that k = o(n) is necessary for the OLS estimator to be consistent and asymptotically normal, Huber argues that this condition is likely to be needed by any tractable asymptotic theory and proves normality of the M-estimator of the coefficients of the linear regression model under the stronger condition that k^3/n → 0. This rate was improved by Yohai and Maronna (1979) and Portnoy (1984, 1985) to (k log k)/n → 0 for consistency and (k log k)^1.5/n → 0 for asymptotic normality. Further research has extended these results to other estimating functions (Welsh, 1989), nonlinear models (He and Shao, 2000), and estimation of the distribution of the errors (Chen and Lockhart, 2001; Mammen, 1996; Portnoy, 1986).
In econometrics, interest has focused instead on the properties of IV estimators with a fixed number of coefficients but an increasing number of instruments, l. Bekker (1994), building on earlier results by Anderson (1976), Kunitomo (1980), and Morimune (1983), studies the asymptotic behavior of Two-Stage Least Squares (2SLS) and variations of Limited Information Maximum Likelihood (LIML) in models with normal errors as l/n converges to a positive constant. These authors find that LIML is both consistent and asymptotically normal but that 2SLS is not. These results are extended to non-Gaussian errors by Hansen et al. (2008), Chao et al. (2008), and others. Koenker and Machado (1999) prove the consistency and asymptotic normality of GMM estimators with l^3/n → 0.
Stock and Yogo (2005), Chao and Swanson (2005), and Andrews and Stock (2007), among others, combine the many-instruments and the weak-instrument literatures and argue that the relationship between the concentration parameter and l is more important than that between the number of observations and l. Anderson et al. (2010) establish some optimality properties for LIML in this setting. Han and Phillips (2006) study the limiting distributions of nonlinear GMM estimators with many weak instruments, and their approach allows for the estimators to converge to non-normal distributions.
Previous work on the F-test under increasing-k asymptotics has focused largely on ANOVA. Boos and Brownie (1995) find that the usual F-test is asymptotically invalid unless the design matrix is perfectly balanced (requiring an equal number of observations for each group) and propose a new Gaussian approximation for the statistic that gives an asymptotically valid test. This result is extended to two-way fixed-effects and mixed models (Akritas and Arnold, 2000), to allow for heteroskedasticity (Akritas and Papadatos, 2004; Bathke, 2004; Wang and Akritas, 2006), and to allow for additional covariates (Orme and Yamagata, 2006, 2007). See, for example, Fujikoshi et al. (2010) for many asymptotic results related to this literature. Anatolyev (forthcoming) studies the asymptotic performance of the Likelihood Ratio (LR), LM, and F-tests under these asymptotics, imposing a different condition on the regressor matrix that rules out the unbalanced ANOVA applications just mentioned. Anatolyev shows that these three statistics behave differently; the LM and LR tests require a correction, but the F-test does not. We focus on the F-test alone in this paper, and find, consistent with the ANOVA literature, that it too requires a correction when the regressor matrix does not satisfy Anatolyev’s conditions.
This research suggests that the standard test should behave poorly in finite samples unless the number of predictors is quite small. However, the F-test is known to have extremely good performance as a comparison of means, even when the errors are not normal. Scheffé (1959), for example, presents analytic and computational evidence that supports using the F-test even with asymmetric and fat-tailed errors. Moreover, the simulations presented in some of the ANOVA papers themselves support using the naive F statistic instead of their proposed replacements. Akritas and Papadatos (2004), for example, simulate a 5% test with lognormal errors and find that the conventional F-test has size 0.04, while their proposed statistics have size 0.74 and 0.60, a moderate over-rejection.
These corrections have other undesirable features. The approximations do not hold under conventional, fixed-k asymptotics, forcing applied researchers to choose between two incompatible asymptotic approximations. Since k/n is always positive in practice, it is logical to use increasing-k limit theory by default, but the simulation evidence indicates that it performs poorly. Moreover, existing results only apply under strong restrictions on the matrix of regressors (assuming either an ANOVA structure or other inhibitive conditions) and so are not relevant for applied economic research.
This paper instead proposes a simple correction to the usual F statistic that gives a valid test under either conventional fixed-k or increasing-k asymptotics. When k is fixed, the correction disappears in the limit and our proposed statistic is asymptotically equivalent to the F-test. When k/n remains positive, the correction does not vanish and improves the size of the test statistic. The simulations presented in this paper indicate that this new statistic performs better than the conventional F-test and also outperforms a Gaussian test that is similar to those proposed in the ANOVA literature.
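To make the finite-sample question concrete, the following is a minimal Monte Carlo sketch of the size of the conventional F-test when k/n is not small. It is not the simulation design of this paper: the function name `f_test_size`, the Gaussian design matrix, the dimensions (n = 100, k = 50, 40 restrictions), and the Student t(5) error distribution (excess kurtosis 6) are all assumptions chosen for illustration. The statistic used is the textbook one, F = [(RSS_r − RSS_u)/q] / [RSS_u/(n − k)], compared with the F(q, n − k) critical value.

```python
import numpy as np
from scipy import stats


def f_test_size(X, keep, draw_errors, reps=2000, alpha=0.05, seed=0):
    """Monte Carlo rejection rate of the conventional F-test under the null.

    X           : (n, k) fixed design matrix of the unrestricted model
    keep        : indices of the columns retained in the restricted model
    draw_errors : function (rng, shape) -> array of i.i.d. errors
    """
    rng = np.random.default_rng(seed)
    n, k = X.shape
    Xr = X[:, keep]
    q = k - Xr.shape[1]  # number of exclusion restrictions being tested

    # Residual-maker matrices M = I - X (X'X)^{-1} X' for both models.
    Mu = np.eye(n) - X @ np.linalg.pinv(X)
    Mr = np.eye(n) - Xr @ np.linalg.pinv(Xr)

    # Under the null the excluded coefficients are zero, so both residual
    # sums of squares depend on the errors only: RSS = e' M e.
    E = draw_errors(rng, (reps, n))
    rss_u = np.sum((E @ Mu) * E, axis=1)
    rss_r = np.sum((E @ Mr) * E, axis=1)

    F = ((rss_r - rss_u) / q) / (rss_u / (n - k))
    crit = stats.f.ppf(1 - alpha, q, n - k)
    return float(np.mean(F > crit))


def normal_errors(rng, shape):
    return rng.standard_normal(shape)


def heavy_tailed_errors(rng, shape):
    # Student t with 5 degrees of freedom: finite fourth moment, excess kurtosis 6.
    return rng.standard_t(5, size=shape)


# Hypothetical design: intercept plus 49 Gaussian regressors, k/n = 0.5.
design_rng = np.random.default_rng(1)
n, k = 100, 50
X = np.column_stack([np.ones(n), design_rng.standard_normal((n, k - 1))])
keep = np.arange(10)  # restricted model keeps the first 10 columns

size_normal = f_test_size(X, keep, normal_errors)
size_heavy = f_test_size(X, keep, heavy_tailed_errors)
print(f"normal errors:       {size_normal:.3f}")
print(f"heavy-tailed errors: {size_heavy:.3f}")
```

With normal errors the F-test is exact, so the first rate should sit near the nominal 5% up to simulation noise. How far the second rate departs from 5% depends on the error distribution and on the design matrix, so the sketch is best read as a template for trying other designs (for example, unbalanced group dummies) rather than as evidence for any particular magnitude.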
Since this statistic nests both the standard and nonstandard asymptotics, careful study of the correction can explain the F-test’s strong performance in simulations. The magnitude of the correction depends on the excess kurtosis of the regression errors, ε_t, and on a particular feature of the design matrix of regressors. When the excess kurtosis is zero, no correction is necessary and the F-test is valid. If the excess kurtosis is not zero, the magnitude of the correction depends on the diagonal elements of the projection matrices for the unrestricted and restricted models (the restricted model is the model estimated under the null hypothesis). In practice, it is likely that the correction will be quite small and the naive F-test will perform reassuringly well, even if it is invalid. When the F statistic returns a value near the critical value for a specific test size, though, the correction can affect whether the test rejects or fails to reject the null hypothesis.
Finally, the use of this statistic is demonstrated through two applications, one for time series macroeconomic data and one for cross-sectional data. The first reexamines Olivei and Tenreyro’s (2007) study, “The Timing of Monetary Policy Shocks,” and finds further support for their conclusion that the effect of monetary policy on output has seasonal variation. The second reexamines Sala-i-Martin’s (1997) cross-country economic growth analysis and finds supporting evidence that additional variables beyond primary school education, GDP per capita, and life expectancy are correlated with a country’s economic growth. These variables were singled out by Levine and Renelt (1992) and Sala-i-Martin (1997) as widely supported determinants of economic growth. The first example uses 144 observations to test 51 restrictions; the setup is a VAR with four equations and there are 51 restrictions on each of these equations. The second example uses 88 observations and tests 64 restrictions.
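The two ingredients that the introduction says drive the correction, the excess kurtosis of the errors and the diagonal elements of the projection matrices of the unrestricted and restricted models, are easy to inspect directly. The sketch below computes them on simulated data; it does not reproduce the paper's corrected statistic, whose formula is not given in this preview. The helper names `leverage` and `excess_kurtosis` are hypothetical, and the dimensions (88 observations, 68 regressors, 64 restrictions) merely echo the second application; the data are random draws, not the growth data set.

```python
import numpy as np


def leverage(X):
    """Diagonal of the projection matrix P = X (X'X)^{-1} X'.

    Uses the thin QR factorisation X = QR, for which P = Q Q'.
    Assumes X has full column rank.
    """
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)


def excess_kurtosis(resid):
    """Sample excess kurtosis: fourth moment over squared variance, minus 3."""
    z = resid - np.mean(resid)
    return float(np.mean(z**4) / np.mean(z**2) ** 2 - 3.0)


# Simulated stand-in data (assumption): intercept plus 67 Gaussian regressors.
rng = np.random.default_rng(0)
n, k = 88, 68
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
Xr = X[:, :4]  # restricted model: 4 retained columns, 64 restrictions

# Generate data under the null with heavy-tailed errors, then fit by OLS.
y = Xr @ np.ones(4) + rng.standard_t(5, size=n)
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat

h_u = leverage(X)   # diagonal of the unrestricted projection matrix
h_r = leverage(Xr)  # diagonal of the restricted projection matrix

print(f"unrestricted leverage: min {h_u.min():.3f}, max {h_u.max():.3f}")
print(f"restricted leverage:   min {h_r.min():.3f}, max {h_r.max():.3f}")
print(f"residual excess kurtosis: {excess_kurtosis(resid):.3f}")
```

Two sanity checks follow from the algebra of projections: each set of diagonals sums to the number of columns of its design matrix, and every unrestricted diagonal is at least as large as the corresponding restricted one because the restricted column space is contained in the unrestricted one. Note that residuals from a model with k/n this large are a noisy guide to the kurtosis of the underlying errors, so the printed value is only indicative.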
To reiterate, this paper derives a new statistic that can replace the F statistic in tests and works well for regression models with many regressors. The paper also explains the original F-test’s strong performance in simulations and illustrates where it is likely to do poorly in applications. Section 2 discusses the new test statistic and studies its asymptotic distributions under the null and alternative hypotheses. Section 3 presents Monte Carlo evidence in favor of the statistic. Section 4 presents the empirical exercises. Section 5 concludes. The proofs are presented in the Appendix.

English conclusion

Often researchers are concerned that using too large a model will bias their results: that they will find spurious and nonexistent patterns in a dataset simply because the model has many unknown parameters. This paper shows that the naive F-test has a tendency to over-reject for models with many parameters. However, this tendency can be understood and modeled, and this paper derives a new statistic that controls for model size and yields a valid test for regression models with many coefficients. Our theory suggests that this correction is especially important when the number of restrictions being tested is large, when the regressors are fat-tailed, and when the regression errors have high excess kurtosis; when those conditions are not met, both the original F-test and our corrected version are reliable. This paper’s Monte Carlo evidence suggests that the F-test can over-reject in finite samples, and our empirical exercises demonstrate that the F-test and our new statistic may give different answers in practice when the original F statistic is near the test’s critical values. This paper also shows that the Wald test can be unreliable when the regression model is large and should be avoided when possible. The asymptotic theory underlying this new statistic builds on and extends similar results for the F-test in the ANOVA literature. The statistic that we present has several advantages over the ANOVA test statistics, the most important of which is its proximity to the F-test in situations where the F-test performs well. In that light, we also suggest that the statistic Ĝ be used for homoskedastic ANOVA when the number of groups is large and the number of observations per group is small.