Testing a linear regression model with AR(1) errors against a first-order dynamic linear regression model with white noise errors: A point optimal testing approach
|Article code||Publication year||English article||Word count|
|24647||2013||11-page PDF||10640 words|
Publisher: Elsevier - Science Direct
Journal : Economic Modelling, Volume 33, July 2013, Pages 126–136
We know very little about the performance of point optimal (PO) and approximate point optimal (APO) tests in the presence of unavoidable nuisance parameters. Because marginal likelihood-based tests are said to perform well in the presence of unavoidable nuisance parameters, this paper compares the performance of marginal likelihood-based APO tests and classical tests using a testing problem which has been largely overlooked by econometric practitioners, namely testing for a static linear regression model with AR(1) errors against a dynamic linear regression model with white noise errors. Because the classical tests are well known to be designed specifically for nested testing, they are applied to test for the significance of the dynamic coefficient of a dynamic linear regression model with AR(1) errors. A testing procedure is proposed in which the size and power comparisons are based on near-exact non-similar critical values of the tests obtained using the simulated annealing (SA) algorithm, as the near-exact non-similar critical values control the sizes of the tests well overall. Among the marginal likelihood-based classical tests, the likelihood ratio (LR) test and the Lagrange multiplier (LM) test seem to perform well under the alternative hypothesis, particularly when the dynamic parameter is large and the sample size is reasonably big. The Wald (W) test is the worst performer overall. This concurs with previous observations that the W test performs poorly in small samples. Compared to the classical approach, APO tests appear to have good power properties, particularly in the neighborhood of the chosen parameter point under the alternative hypothesis. This finding may advance the use of PO and APO tests.
In the context of a linear regression model, researchers have often tried to correct for autocorrelation by including a first-order autoregressive (AR(1)) or first-order moving average (MA(1)) error term when the Durbin and Watson (1950, 1951) (DW) statistic for autocorrelation is significant. They have then estimated the model with either generalized least squares (GLS) or the maximum likelihood (ML) method. However, a number of researchers have warned that if the true model is dynamic, a substantial bias in estimation can occur, and vice-versa (De Boef and Keele, 2008; Keele and Kelly, 2006; McGuirk and Spanos, 2002). As McGuirk and Spanos (2002) and Spanos (1998, 1999) point out, rejecting the null hypothesis in a misspecification test such as the DW test does not indicate that the alternative model is true, only that the original model may be misspecified. That is, in a linear regression model context, a significant DW test indicates that the data possess some kind of dependence. The type of dependence present in the data can be captured with several alternative statistical models which allow for such dependence. One such alternative is the dynamic linear regression model. An advantage of this model is that it (almost) nests the autocorrelation corrected linear model (McGuirk and Spanos, 2002). King and Rankin (1993) find that when the DW test in a linear model is significant, and the true model is the dynamic linear model with a large autoregressive parameter, a substantial loss in accuracy of prediction can occur if one proceeds to correct for AR(1) disturbances without checking for the possibility of a dynamic linear model with white noise errors. In particular, they show that when the true model is the dynamic linear model, correcting for serial correlation (or vice-versa) can result in substantially biased estimates. Similar findings are reported by Keele and Kelly (2006) and McGuirk and Spanos (2002).
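To make the "(almost) nests" remark concrete, the two competing specifications can be written out as follows. This is only a sketch: the symbols y_t, x_t, β, ρ and ε_t are notation assumed here and are not taken from the paper, μ is taken to be the dynamic coefficient, and the dynamic model is assumed to contain no lagged regressors.

```latex
\begin{align}
  % Static linear regression with AR(1) errors (assumed notation)
  y_t &= x_t'\beta + u_t, \qquad u_t = \rho\,u_{t-1} + \varepsilon_t, \qquad |\rho| < 1, \\
  % First-order dynamic linear regression with white noise errors (assumed notation)
  y_t &= \mu\,y_{t-1} + x_t'\beta + \varepsilon_t, \\
  % Substituting u_{t-1} = y_{t-1} - x_{t-1}'\beta into the first model
  y_t &= \rho\,y_{t-1} + x_t'\beta - \rho\,x_{t-1}'\beta + \varepsilon_t .
\end{align}
```

Under these assumptions the third line shows that the AR(1) error model is itself a dynamic regression, one that additionally carries the lagged regressors with the restricted coefficient -ρβ. The two models therefore coincide only when ρ = μ = 0, which is one way to see why neither strictly nests the other.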
The AR(1) model and the first-order dynamic linear model are both widely used in political science (Beck and Katz, 2009; Kono, 2006). The dynamic model is often used in this discipline to remove autocorrelation (Blaydes, 2011) and is often considered theoretically more appropriate because past behavior/performance is believed to influence current political decision making. Arbitrarily using the dynamic linear model when the true model is AR(1) (or vice-versa) may produce biased estimates, particularly in finite samples. Although these models are not used as extensively in econometrics, a number of researchers have indicated that there are different ways to model the data when significant autocorrelation is detected (Heaney and Sriananthakumar, 2012; Jones, 1993a, 1993b; Sriananthakumar and King, 2006). Jones (1993a, 1993b) uses US fuel consumption and passenger vehicle use data to conduct a detailed study involving several possible models to illustrate this point. His models include those considered here. Jones (1993a) stresses the importance of selecting a model which is consistent with the data and recommends the general-to-simple modeling approach of Hendry (1979) to achieve this. He also warns that parameter estimates from data-rejectable models may give very misleading indications of the dynamic nature of the behavioral relationships being modeled. Jones observes that researchers often tend to use some familiar specification they have been trained to use, or find easy to implement, and some might appeal to other researchers' success with a popular specification. Sadly, this trend continues even now. Jones (1993a) insists that untested acceptance of a popular model specification may lead to erroneous inferences. For example, in his study, a popular dynamic specification results in reasonable and significant estimates.
However, close scrutiny reveals that the model is not data-acceptable because it either over-estimated or under-estimated important economic parameters, such as the long run price elasticity and the speed of the adjustment process. Jones (1993b) double-checked Greene's (1992) selection of model for the US passenger vehicle use data using different modeling strategies. This motivates one to question an outcome (even if it is suggested by a well-known researcher) rather than accepting it at face value. However, finding a data-acceptable model is not that simple. It is worth noting that if the true model which generated the given sample is linear with AR(1) errors, and one fits the dynamic linear model with white noise errors to the sample by the ordinary least squares (OLS) method, then it is likely to explain the data rather well (Giles, 1975; Griliches, 1967). Therefore the usual significance tests and goodness-of-fit measures are unlikely to be helpful in this context. Because correct model specification is important for forecasting, and also for the purpose of further inference, it is desirable to have a powerful testing procedure to distinguish between plausible alternative models. For this type of problem nuisance parameters cannot be avoided. This may make the test non-similar in the sense that the test's size varies with the value of these nuisance parameters. Different approaches to testing in the presence of nuisance parameters are suggested in the literature. The classical approach to non-similar tests is to find exact non-similar critical values, for which sizes are never greater than the nominal significance level for all possible values of the nuisance parameters. Such critical values are typically obtained using the Monte Carlo method (Inder, 1986; King and McAleer, 1987; Silvapulle, 1991). Other popular approaches include using bounds type tests and confidence intervals, as suggested in Dufour (1990) and Pesaran et al.
(2001), and replacing unknown nuisance parameters with consistent estimates and then relying on asymptotic theory (Moreira, 2009). However, exact bounds tests can be less powerful than non-similar tests. Forchini (2005) shows that any test with size bounded from above by a known constant has potentially very low power and a large type II error. Kiviet and Dufour (2003) and Dufour (2006) suggest a maximized Monte Carlo (MMC) test that maximizes a simulated p value over the nuisance parameter space using SA. Dufour (2006) also suggests (and proves) asymptotically valid MMC (AMMC) tests that use consistent set estimators of nuisance parameters. The maximum p value of an AMMC test can be obtained by maximizing the p value over a subset of the nuisance parameter space (for example a confidence set of the nuisance parameters) instead of the entire nuisance parameter space. Thus, compared to the MMC approach, the AMMC approach will be less time-consuming (Phipps and Byron, 2007). Although the MMC approach is gaining popularity (Beaulieu et al., in press; Dufour and Tarek, 2006; Dufour and Valéry, 2009; Frederic and Olivier, 2006; Thomas et al., 2007), it is criticized for the following reasons: (1) it can be computationally demanding, (2) the actual rejection frequency of an MMC test may be far below the level of the test, so the test may severely lack power, and (3) there is a possibility of getting a much larger p value for nuisance parameter values far away from the ones that actually generated the data (MacKinnon, 2009). This paper adopts the classical approach. In particular, near-exact non-similar critical value-based size and power properties are analyzed, because obtaining exact non-similar critical values involves a large amount of computation (Inder, 1986).
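The MMC idea reviewed above, maximizing a simulated p value over the nuisance parameter space with SA, can be sketched in a few lines. This is a toy illustration and not the procedure of any of the cited papers, and it is not the classical approach the paper itself adopts. The statistic `toy_stat`, its nuisance parameter `sigma`, the observed value 2.5, the search interval and all tuning constants are hypothetical choices made here, and the annealing loop is a hand-rolled minimal version.

```python
import math
import random


def mc_pvalue(s_obs, simulate_stat, theta, n_rep, seed):
    """Monte Carlo p value at nuisance value theta: (1 + #{S_i >= s_obs}) / (n_rep + 1)."""
    rng = random.Random(seed)  # same seed for every theta (common random numbers)
    exceed = sum(simulate_stat(theta, rng) >= s_obs for _ in range(n_rep))
    return (exceed + 1) / (n_rep + 1)


def mmc_pvalue_sa(s_obs, simulate_stat, lo, hi, n_rep=199, n_iter=150,
                  temp=0.1, cooling=0.97, seed=0):
    """Maximize the simulated p value over theta in [lo, hi] by simulated annealing."""
    rng = random.Random(seed + 1)  # drives proposals and acceptance only
    theta = 0.5 * (lo + hi)
    p = mc_pvalue(s_obs, simulate_stat, theta, n_rep, seed)
    best_theta, best_p = theta, p
    for _ in range(n_iter):
        # Random-walk proposal, clipped to the nuisance parameter interval.
        cand = min(hi, max(lo, theta + rng.gauss(0.0, 0.1 * (hi - lo))))
        p_cand = mc_pvalue(s_obs, simulate_stat, cand, n_rep, seed)
        # Always accept improvements; accept a worse point with Metropolis probability.
        if p_cand >= p or rng.random() < math.exp((p_cand - p) / temp):
            theta, p = cand, p_cand
            if p > best_p:
                best_theta, best_p = theta, p
        temp *= cooling  # geometric cooling schedule
    return best_theta, best_p


def toy_stat(sigma, rng, n=20):
    """Hypothetical statistic sqrt(n)*|mean| of n draws from N(0, sigma^2)."""
    return math.sqrt(n) * abs(sum(rng.gauss(0.0, sigma) for _ in range(n)) / n)


best_sigma, max_p = mmc_pvalue_sa(s_obs=2.5, simulate_stat=toy_stat, lo=0.5, hi=2.0)
print(f"maximized p value {max_p:.3f} at sigma = {best_sigma:.2f}")
```

An MMC test would reject at level α only if the maximized p value is at most α. Because every evaluation of the p value needs a full set of simulations, the sketch also hints at why the approach is described as computationally demanding.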
Studies such as those of Tunnicliffe Wilson (1989), Rahman and King (1997) and Francke and de Vos (2007) indicate that in the presence of nuisance parameters marginal likelihood-based tests perform better than conventional likelihood-based tests in finite samples. On the other hand, if the null hypothesis and alternative hypothesis spaces are tightly restricted, King's (1987) PO tests can be most powerful at a chosen parameter point under the alternative hypothesis. However, PO tests cannot always be constructed when testing a composite null hypothesis. Sriananthakumar and King (2006) propose an APO test, called the g test, for this instance. The performance of PO and APO tests in the presence of unavoidable nuisance parameters is largely unknown. Theoretically, PO tests can be expected to be most powerful in the absence of nuisance parameters. If PO tests are found to work well in the presence of unavoidable nuisance parameters, this will advance their use in practice. The main contributions of this paper are the study of the finite sample performance of the APO test and marginal likelihood-based classical tests in the presence of unavoidable nuisance parameters; the construction of tests of an interesting, but largely overlooked, issue: testing for a static linear regression model with AR(1) errors against a dynamic linear regression model with white noise errors; the comparison of different information measures in the context of marginal likelihood estimation in order to decide which is more accurate; and the application of near-exact non-similar critical values of the tests obtained using SA. The models and assumptions are discussed next; then descriptions of the APO test and marginal likelihood-based classical tests are provided. Section 4 briefly introduces SA and discusses how exact and near-exact non-similar critical values can be obtained. Section 5 presents the details of the Monte Carlo experiment and summarizes the main findings.
An illustrative application of the g test to a specific data set is presented in Section 6. Finally, concluding remarks reflect on the use of the APO test and marginal likelihood-based classical tests in the presence of unavoidable nuisance parameters.
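For readers unfamiliar with point optimal testing, the basic construction in the simplest case (a simple null hypothesis and no nuisance parameters) can be sketched as follows. The notation f, θ_0, θ_1 and c_α is assumed here for illustration. The g test for a composite null discussed above involves further steps that are not reproduced.

```latex
\begin{align}
  % Test H_0: \theta = \theta_0 against H_a: \theta \in \Theta_a,
  % with a point \theta_1 \in \Theta_a chosen in advance (assumed notation).
  r(y) &= \frac{f(y \mid \theta_1)}{f(y \mid \theta_0)}, \\
  % Reject H_0 for large values of the likelihood ratio at the chosen point.
  \text{reject } H_0 &\iff r(y) > c_\alpha,
  \qquad \Pr\{\, r(y) > c_\alpha \mid \theta_0 \,\} = \alpha .
\end{align}
```

By the Neyman-Pearson lemma, this test is most powerful at θ_1 among tests of size α, which is the sense in which a PO test is optimal at the chosen parameter point.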
Conclusion
This paper has investigated the rather difficult problem of testing for a linear regression model with AR(1) errors against a dynamic linear regression model with white noise errors, using marginal likelihood-based classical tests and APO tests. Researchers generally overlook this area, believing these two models to be observationally equivalent and hence that choosing either of them will have no significant consequences. However, a number of researchers have pointed out model differences and their impact on forecasting. This study clearly shows that large sample critical value-based classical tests, such as the LR and W tests, cannot be trusted under the null, even in a moderately large sample. The LM(L) and LM(E) tests are better in this regard, but not ideal. In general the near-exact non-similar critical value-based tests have reasonable size properties. Based on power results, APO tests are recommended for certain regions, such as the g(0.3) test for μ < 0.5 and the g(0.5) test for 0.5 ≤ μ ≤ 0.7. These APO tests are able to distinguish the null and alternative hypotheses of interest, even for situations where this is an extremely difficult task. Theoretically the PO test, or APO test, cannot be expected to perform well in the presence of unavoidable nuisance parameters. However, this study shows that the APO tests perform well in such circumstances, illustrating the resilience of such tests even in difficult situations. Hence this study may advance the use of PO or APO tests in the future. The asymptotic tests, particularly the LR test, seem suitable for μ > 0.7. The LM(L) test appears better than the LM(E) test in small samples. As expected, both become similar as sample size increases. Generally, neither the W(E) nor the W(L) test is suitable for small to moderate sized samples. Rahman and King (1994) preferred some marginal likelihood-based asymptotic tests over APO tests of a simple null hypothesis in the linear regression context.
However, this study considered a more complicated testing situation, where APO tests of a composite null were preferred over marginal likelihood-based tests, particularly in small to moderate sized samples. It has been shown here that, for testing conflicting economic or finance theories, for instance, PO and APO tests both deserve more attention than they currently receive. While these tests, particularly the APO test, may look complicated, in reality they are not difficult to compute with the assistance of SA.
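The belief that the two models are close to observationally equivalent, and the related point that ordinary significance tests cannot separate them, can be illustrated with a small simulation. The sketch below is not the paper's Monte Carlo design. It assumes a single regressor drawn as white noise, no intercept, and hypothetical values β = 1, ρ = 0.8 and n = 500. Data are generated from the static model with AR(1) errors, and the dynamic model is then fitted by OLS.

```python
import math
import random


def simulate_static_ar1(n, beta, rho, seed):
    """Draw (x, y) from y_t = beta*x_t + u_t with u_t = rho*u_{t-1} + e_t."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y, u = [], 0.0
    for t in range(n):
        u = rho * u + rng.gauss(0.0, 1.0)  # AR(1) error, started at zero
        y.append(beta * x[t] + u)
    return x, y


def fit_dynamic_ols(x, y):
    """OLS of y_t on (y_{t-1}, x_t), no intercept.

    Returns (mu_hat, beta_hat, t_mu, r_squared); r_squared is uncentered
    because the regression has no intercept.
    """
    ylag, xt, yt = y[:-1], x[1:], y[1:]
    # Entries of X'X and X'y for the two-regressor normal equations.
    s11 = sum(a * a for a in ylag)
    s12 = sum(a * b for a, b in zip(ylag, xt))
    s22 = sum(b * b for b in xt)
    r1 = sum(a * c for a, c in zip(ylag, yt))
    r2 = sum(b * c for b, c in zip(xt, yt))
    det = s11 * s22 - s12 * s12
    mu_hat = (r1 * s22 - r2 * s12) / det
    beta_hat = (s11 * r2 - s12 * r1) / det
    rss = sum((c - mu_hat * a - beta_hat * b) ** 2 for a, b, c in zip(ylag, xt, yt))
    sigma2 = rss / (len(yt) - 2)
    t_mu = mu_hat / math.sqrt(sigma2 * s22 / det)  # nominal OLS t-ratio
    r_squared = 1.0 - rss / sum(c * c for c in yt)
    return mu_hat, beta_hat, t_mu, r_squared


x, y = simulate_static_ar1(n=500, beta=1.0, rho=0.8, seed=1)
mu_hat, beta_hat, t_mu, r_squared = fit_dynamic_ols(x, y)
print(f"mu_hat={mu_hat:.3f} beta_hat={beta_hat:.3f} t(mu)={t_mu:.1f} R2={r_squared:.3f}")
```

In a run like this the coefficient on the lagged dependent variable would typically be well away from zero with a large nominal t-ratio, even though the data-generating process contains no lagged dependent variable. That is the situation in which a dedicated test between the two specifications becomes useful.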