Identification-robust simulation-based inference in joint discrete/continuous models for energy markets
Article code | Year | Pages (English article) |
---|---|---|
15316 | 2008 | 14 (PDF) |
Publisher: Elsevier - ScienceDirect
Journal: Computational Statistics & Data Analysis, Volume 52, Issue 6, 20 February 2008, Pages 3148–3161
Abstract
In the analysis of energy use models, a common problem consists in correcting for endogenous discrete-choice variables. Indeed, energy demand equations often include endogenous dummies which reflect the underlying discrete choice of, e.g., energy equipment. These lead to discrete/continuous (D/C) statistical models where the discrete and continuous components are statistically dependent, so weak-identification problems may occur which stem from the “quality” of the first-stage instrumental model. These problems are studied in the context of energy demand analysis. A wide mixed-logit-based class of models is considered, allowing for dependent choices, heteroskedasticity and multi-dimensionality. The severity of weak-identification problems and their relevance for empirical practice are documented, even with very large data sets. Tractable and reliable (in the sense of type I error control) solutions are proposed which combine generalized Anderson–Rubin (GAR) procedures and maximum simulated likelihood (MSL) methods for models commonly used in practice. Results are illustrated via Monte-Carlo examples and an empirical study on electricity demand.
Introduction
Commonly used econometric models for energy markets often lead to situations where parameters are difficult to identify from observable data. These include, in particular, energy demand equations that account for endogenous choice (King, 1980, Dubin and McFadden, 1984, Hanemann, 1984, Train, 1993 and Train, 2003). Such equations are designed to model the consumers’ choice of energy equipment jointly with energy use, so they often include endogenous dummies which represent the underlying discrete choice. This generally leads to so-called discrete/continuous (D/C) models, in which the discrete and continuous components are dependent. Furthermore, realistic situations often require a complex non-linear and multi-dimensional discrete-choice component, which calls for simulation-based estimation. Weak-identification problems may stem from either model component; our main focus in this paper is on those related to the quality of the discrete-choice component. Confidence sets and hypothesis tests arising from typical D/C estimation strategies are almost always validated via standard asymptotic arguments. Such arguments can easily be unreliable when regularity conditions hold only weakly, which occurs, in particular, under identification constraints. Inferential methods that fail to correct for the possibility of weak identification, including simulation-based methods, are fundamentally flawed and lead to serious type I error problems. Indeed, there is considerable evidence that conventional inference methods break down in such contexts (see e.g. the surveys in Stock et al., 2002; Dufour, 2003).
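For concreteness, a stylized binary version of such a D/C model can be written as follows. This is an illustrative assumption, not the paper's mixed-logit specification: $y_i$ is energy use, $d_i$ the equipment-choice dummy, $x_i$ exogenous regressors and $w_i$ variables that enter only the choice equation.

$$
y_i = x_i'\beta + \gamma\, d_i + u_i, \qquad
d_i = \mathbf{1}\{ w_i'\pi + v_i > 0 \}, \qquad
\operatorname{corr}(u_i, v_i) = \rho \neq 0 .
$$

Because $\rho \neq 0$, $d_i$ is endogenous in the demand equation, so an IV-type strategy for $\gamma$ must rely on the variation in $w_i$ that shifts the choice probability $\Pr(d_i = 1 \mid w_i)$. When $\pi$ is close to zero this variation is small, and $\gamma$ is weakly identified.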
The main facts demonstrated in this literature are: (i) standard asymptotics provide poor approximations to the sampling distributions of estimators and test statistics, (ii) this occurs even if identification constraints are maintained, (iii) serious problems are documented even with fairly large samples and sometimes in all sample sizes (i.e. it is not a “small” sample issue), (iv) standard correction techniques, e.g. the bootstrap, are also bound to fail, and (v) the solution requires distributional results (exact or asymptotic) which take the possibility of near-unidentification into consideration. In other words, a valid inference procedure should be immune to the identification status, particularly to the quality of the first-stage instrumental model. While a large literature has tackled this problem in usual IV-based contexts (see e.g. Angrist and Krueger, 1994, Dufour, 1997 and Dufour, 2003; Staiger and Stock, 1997; Stock and Wright, 2000; Dufour and Jasiak, 2001; Stock et al., 2002; Kleibergen, 2002 and Kleibergen, 2005; Moreira, 2003, Dufour and Taamouti, 2005, Dufour and Taamouti, 2006, Dufour and Taamouti, 2007, Joseph and Kiviet, 2005, Kiviet and Niemczyk, 2007 and Andrews et al., 2006), the case of dummy endogenous variables has not been directly addressed. The finite-sample performance of standard asymptotics (versus the bootstrap, for instance) has been studied in discrete-choice models; see, for example, Davidson and MacKinnon, 1999 and Davidson and MacKinnon, 2007, who discuss the tobit and probit models, and Kim et al. (1996), who consider the generalized logit model. These results are relevant, although the problem at hand here is more complicated, since the choice variable enters as a right-hand-side endogenous regressor (rather than a non-continuous left-hand-side dependent variable). Moreover, to the best of our knowledge, empirical D/C models for energy demand are still largely analyzed via standard asymptotics.
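As a stylized illustration of facts (i) and (iii), the Python sketch below (numpy only) simulates a binary endogenous dummy with a weak first stage and compares the empirical size of the usual 2SLS Wald test with that of the Anderson–Rubin (AR) test. It is not the paper's Monte Carlo design: the function name `rejection_rates` and all design constants (K = 10 raw instruments, pi = 0.02, rho = 0.95, n = 1000) are hypothetical, the first stage is linear 2SLS rather than mixed-logit MSL, and both tests use asymptotic chi-square cutoffs.

```python
import numpy as np

K = 10              # number of excluded instruments (fixed so the cutoffs below apply)
CHI2_1_95 = 3.841   # 95% quantile of chi-square(1): cutoff for the Wald statistic
CHI2_K_95 = 18.307  # 95% quantile of chi-square(10): cutoff for K * F in the AR test


def rejection_rates(n=1000, pi=0.02, rho=0.95, beta=1.0, reps=1000, seed=0):
    """Empirical rejection rates of nominal 5% tests of the TRUE null H0: beta = beta."""
    rng = np.random.default_rng(seed)
    ones = np.ones(n)
    wald_rej = ar_rej = 0
    for _ in range(reps):
        z = rng.standard_normal((n, K))                     # excluded instruments
        v = rng.standard_normal(n)                          # choice-equation error
        u = rho * v + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)  # corr(u, v) = rho
        d = (z @ np.full(K, pi) + v > 0).astype(float)      # endogenous dummy, weak first stage
        y = beta * d + u                                    # continuous equation (intercept 0)

        X = np.column_stack([ones, d])
        W = np.column_stack([ones, z])

        # 2SLS Wald test: project X on the instruments, then use the usual IV variance.
        X_hat = W @ np.linalg.lstsq(W, X, rcond=None)[0]
        coef = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
        resid = y - X @ coef
        sigma2 = resid @ resid / (n - 2)
        var = sigma2 * np.linalg.inv(X_hat.T @ X_hat)
        wald_rej += (coef[1] - beta) ** 2 / var[1, 1] > CHI2_1_95

        # AR test: regress y - beta0 * d on [1, z]; under H0 all z slopes are zero.
        y0 = y - beta * d
        g = np.linalg.lstsq(W, y0, rcond=None)[0]
        ssr_u = np.sum((y0 - W @ g) ** 2)
        ssr_r = np.sum((y0 - y0.mean()) ** 2)
        ar_rej += (n - K - 1) * (ssr_r - ssr_u) / ssr_u > CHI2_K_95  # K * F statistic

    return wald_rej / reps, ar_rej / reps


if __name__ == "__main__":
    wald_size, ar_size = rejection_rates()
    print(f"Wald size: {wald_size:.3f}   AR size: {ar_size:.3f}")
```

Both rates estimate the size of a nominal 5% test. In this design the Wald rejection rate is far above 5% because weak instruments pull 2SLS toward the biased OLS estimate. The AR statistic depends on the data only through the regression of y - beta0 * d on the instruments, so its null distribution does not depend on pi.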
Technical successes with simulation-based inference and the availability of large data sets in such contexts have, to some extent, delayed awareness of these very serious problems. However, it is now solidly established that weak instruments are not a small-sample problem: because of asymptotic irregularities, spurious rejections may occur even with large data sets such as those available for demand analysis. In this paper, we study D/C models in the context of energy demand analysis. Given our focus on realistic situations, we consider a wide mixed-logit-based class of discrete-choice model components, which allow for dependent choices, heteroskedasticity and multi-dimensionality. We first document the relevance and severity of weak-identification problems for empirical practice, even with very large data sets. Our simulations reveal severe size distortions for the commonly used Wald test based on estimated (instrumented) choice probabilities. These distortions are equally severe whether the first-stage choice model is estimated by OLS [i.e. is treated as a linear probability model (LPM)] or by mixed-logit maximum simulated likelihood (MSL). The definition we adopt for a weak model is empirically relevant (it does not constrain the nuisance parameters to boundaries). Indeed, it is often tempting (from an empirical perspective) to dismiss weak-identification considerations, assuming they occur only in textbook cases or in theoretical constructs. Here we document problems of empirical relevance for econometric practice. Secondly, we propose tractable and reliable (in the sense of type I error control) solutions which combine generalized Anderson–Rubin (GAR) procedures (see Dufour, 1997, Dufour, 2003 and Staiger and Stock, 1997; Stock and Wright, 2000; Dufour and Jasiak, 2001; Stock et al., 2002; Dufour and Taamouti, 2005, Dufour and Taamouti, 2006 and Dufour and Taamouti, 2007; Andrews et al., 2006) and MSL methods (see e.g. McFadden, 1989 and Train, 2003) for models commonly used in practice. GAR tests are computationally attractive: they require only an F-type test of the significance of the instrumental variables within a properly defined augmented regression. A point-optimal instrument set may be obtained via a two-stage version of the F-test, run as follows. The first stage estimates the instruments (for which we use MSL), and the second stage tests for their exclusion. The first- and second-stage statistics are obtained from two independent sub-samples, which justifies the use of the F-test in a generated-regressors context. We show that such tests achieve perfect type I error control, regardless of the identification status. Our results are illustrated through a Monte-Carlo study and an empirical application based on the electricity demand system estimated in Bernard et al. (1996). The paper is organized as follows. Section 2 focuses on the joint D/C model setting and briefly summarizes the closely related literature. In Section 2.3, we describe our proposed statistical inference methods. In Section 3, we present the results of our Monte-Carlo experiments. The empirical application is discussed in Section 4, and Section 5 concludes.
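The two-stage, split-sample GAR idea can be sketched in Python under strong simplifying assumptions. The choice is binary instead of multinomial, a plain logit fitted by Newton-Raphson stands in for mixed-logit MSL, a constant is the only included exogenous regressor, and the F(1, m) p-value is approximated by its chi-square(1) limit (an exact value could come from `scipy.stats.f.sf`). The names `fit_logit` and `split_sample_ar_test` are hypothetical. The key point is that the choice model is estimated on one half of the sample only. On the other half, the predicted probability is then independent of the errors, so the F-type exclusion test remains valid with a generated instrument.

```python
import math

import numpy as np


def fit_logit(W, d, iters=25):
    """Binary logit by Newton-Raphson; a simple stand-in for a mixed-logit MSL first stage."""
    coef = np.zeros(W.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-W @ coef))
        grad = W.T @ (d - p)                          # score of the logit log-likelihood
        hess = (W * (p * (1.0 - p))[:, None]).T @ W   # negative Hessian (positive definite)
        step = np.linalg.solve(hess, grad)
        coef += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return coef


def split_sample_ar_test(y, d, z, beta0, seed=0):
    """Split-sample AR-type test of H0: beta = beta0 with one generated instrument.

    Half A estimates the choice model; half B tests whether the resulting predicted
    probability can be excluded from the regression of y - beta0 * d on a constant.
    """
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    half_a, half_b = idx[: n // 2], idx[n // 2:]
    W = np.column_stack([np.ones(n), z])

    gamma = fit_logit(W[half_a], d[half_a])           # first stage uses half A only
    p_hat = 1.0 / (1.0 + np.exp(-W[half_b] @ gamma))  # generated instrument on half B

    y0 = y[half_b] - beta0 * d[half_b]                # a constant plus the error under H0
    R = np.column_stack([np.ones(len(half_b)), p_hat])
    coef = np.linalg.lstsq(R, y0, rcond=None)[0]
    ssr_u = np.sum((y0 - R @ coef) ** 2)
    ssr_r = np.sum((y0 - y0.mean()) ** 2)
    df2 = len(half_b) - 2
    f_stat = max((ssr_r - ssr_u) / (ssr_u / df2), 0.0)  # F(1, df2) under H0
    p_value = math.erfc(math.sqrt(f_stat / 2.0))         # chi-square(1) approximation
    return f_stat, p_value
```

Estimating the first stage on the full sample would make p_hat depend on d, and hence on the errors of the same observations. That is the generated-regressor problem the sample split avoids, at the cost of using only half of the data in each stage.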
Conclusion
The new trend in discrete-choice modeling is to enhance the behavioral representation of the choice process. Consequently, econometric representations involve latent variables among the set of explanatory variables. These latent covariates raise further estimation challenges that can effectively be approached through simulation-based procedures. Nevertheless, they also lead to situations where parameters are more difficult to identify from observable data. From the perspective of confidence set estimation and hypothesis testing, this leads to the failure of standard asymptotics. Spurious statistical decisions would occur frequently even with simulation-based methods, which defeats the purpose motivating reliance on such models and methods in the first place. In this paper, we document and address this problem in a realistic situation, specifically in the context of joint D/C models for energy demand. Widely available simulation methods have made such specifications easier to estimate in practice, especially with two-step estimation; testing, however, is still conducted via standard methods. We first show that identification difficulties have serious implications for empirical practice: standard and even simulation-based tests can be arbitrarily unreliable, even with large data sets and empirically relevant settings. We then propose identification-robust procedures for basic inference problems in such contexts, and illustrate their usefulness via Monte-Carlo experiments and an illustrative empirical example. In view of the current general trend in this literature, our results open avenues for developments in general discrete-choice contexts.