ISI article (English) No. 19607
Title

Evaluating the forecasting performance of econometric models of air passenger traffic flows using multiple error measures
Article code: 19607
Year of publication: 2014
Length of English article: 21 pages (PDF)
Source

Publisher: Elsevier - Science Direct

Journal: International Journal of Forecasting, Volume 27, Issue 3, July–September 2011, Pages 902–922

Keywords
Airline traffic, Comparative forecasting accuracy, Econometric model building, Time-varying parameter, Pooled cross-section time series, Replication

Abstract

Airline traffic forecasting is important to airlines and regulatory authorities. This paper examines a number of approaches to forecasting short- to medium-term air traffic flows. It contributes as a rare replication, testing a variety of alternative modelling approaches. The econometric models employed include autoregressive distributed lag (ADL) models, time-varying parameter (TVP) models and an automatic method for econometric model specification. A vector autoregressive (VAR) model and various univariate alternatives are also included to deliver unconditional forecast comparisons. Various approaches for taking into account interactions between contemporaneous air traffic flows are examined, including pooled ADL models and the enhanced models with the addition of a “world trade” variable. Based on the analysis of a number of forecasting error measures, it is concluded that pooled ADL models that include the “world trade” variable outperform the alternatives, and in particular univariate methods; and, second, that automatic modelling procedures are enhanced through judgmental intervention. In contrast to earlier results, the TVP models do not improve accuracy. Depending on the preferred error measure, the difference in accuracy may be substantial.

Introduction

The Air Transport industry is an increasingly important component of national and global economies. Its development involves enormous capital investments on the part of all of the agents concerned. Because of the perishable nature of the product (once the aircraft takes off, the empty seats are considered as an opportunity cost to the airline), forecasting air traffic flows in both the short and medium terms is an important factor in increasing the profitability of airlines, and a critical part of transport planning by air transport authorities and government bodies. A number of papers have attempted to develop models suitable for forecasting air traffic flows, including Abed, Ba-Fail, and Jasimuddin (2001), Anderson and Kraus (1980), Fridström and Thune-Larsen (1989), Grubb and Mason (2001), Ippolito (1981), Jorge-Calderon (1997), Kaemmerle (1991), Matsumoto (2004) and Young (1972). More recently, Blunk, Clark, and McGibany (2006) and Lai and Lu (2005) have examined the effects of the September 11, 2001, terrorist incident using demand models. Because of the importance of infrastructure planning, various agencies with planning and regulatory responsibilities for air traffic infrastructure, as well as private companies such as aircraft manufacturers and airlines, have also developed their own models. These models are largely based on the approach proposed by Quandt and Baumol (1966), who showed that socio-economic variables are important in modelling point-to-point air traffic flows between countries. However, these earlier studies have provided only limited evidence on forecasting accuracy and the comparative performance of alternative model specifications. In particular, many of these forecasting studies have contented themselves with univariate methods, and many earlier evaluations have been limited to only forecast horizons which are too short to be valuable for planning purposes. 
This problem is exacerbated by looking at the comparative performance, which may be dependent on the forecast horizon. In addition, recent years have seen the development of a number of alternative econometric modelling approaches that offer the prospect of enhanced forecasting accuracy, but have rarely been compared (Allen & Fildes, 2001). These approaches include Hendry’s general-to-specific approach to model building with a full lag structure (Hendry, 1986), a model that includes a full lag structure and pools cross-sectional evidence, a time-varying parameter (TVP) model (Garcia-Ferrer, Highfield, Palm, & Zellner, 1987), and a PcGive automatic selection model based on Hendry’s methodology (Hendry & Krolzig, 2001). Replicating and extending the limited studies published so far is important for generalising about the conditions under which the results hold (Hubbard & Vetter, 1996). In light of the importance of air traffic forecasting, the paucity of evidence on the relative benefits of these alternative approaches, and the lack of insight into what leads to differences in relative performance, this paper compares the accuracy of the above four approaches using price, income and trade as potential explanatory variables. An enhanced version of these models incorporating a “world trade” variable to capture the overall growth in demand across all countries is also included. These are contrasted with ‘naïve’ univariate alternatives, as well as with a vector autoregressive (VAR) model. Our aim is to offer further evidence on the conditions under which econometric methods outperform time series alternatives. Recent aggregate evidence from Athanasopoulos, Hyndman, Song, and Wu (2008) has shown no overall improvement from using econometric methods with tourism data series, apparently similar to those considered here. 
The choice of econometric models to include in the comparisons has been based on their strong performances in earlier empirical studies (for example, Garcia-Ferrer et al., 1987). In general, for non-financial series, earlier research has shown that such econometric models outperform the autoregressive and naïve benchmarks (Allen & Fildes, 2001), but the evidence is not overwhelming, and our principal aim is to provide further reliable evidence. We also wish to investigate a number of subsidiary hypotheses. (i) Model specification is usually overly subjective (Pagan, 1999), but does the subjectivity of the expert modeller compared to the automatic modelling approach embodied in PcGive lead to improvements in accuracy? Whilst there is substantial evidence that econometric model-based forecasts using the same information set outperform judgement (Dawes, Faust, & Meehl, 1989), to the best of our knowledge, there is no study that directly addresses this question of objective versus subjective specifications, with the closest studies being those concerned with the specification of ARIMA models. However, a priori, the same limitations of judgemental forecasts would apply to judgemental model building, and the automatic approach should lead to improvements. (ii) The consideration of interactions between contemporaneous air traffic flows, through both the inclusion of a “world trade” variable and estimation using a Seemingly Unrelated Regression (SUR) approach, offers the prospect of improved accuracy, but, as Du Preez and Witt (2003) show, this is far from inevitable. Following on from Zellner, Hong, and Min’s (1991) conclusion as to the benefits of including a generic variable that acts as a proxy for the many unobserved explanatory variables, we hypothesize that both of these approaches should lead to accuracy improvements. 
(iii) Although a relatively neglected approach to econometric model specification, the inclusion of time-varying parameters has typically led to improved forecasting accuracy (Allen & Fildes, 2001; Garcia-Ferrer et al., 1987). A recent example in tourism forecasting (Li & Song et al., 2006; Li & Wong et al., 2006) again demonstrated improved performance, including at longer forecast horizons. In the airline industry, because of its changing structure over the 40-year history we study, a priori we would expect the same conclusions to hold, and this will also be examined. (iv) Previous research has emphasized the potential importance of the choice of error measure (Fildes & Ord, 2002). We will also provide further evidence on this issue.
The plan of the paper is as follows. Section 2 describes the variables and data. Section 3 presents the models, methodology and error measures. The results based on ex-post forecasting up to three years ahead are discussed in Section 4. Finally, Section 5 draws conclusions on the relative forecasting accuracy of the particular methods.

Conclusions

In this research, several econometric models have been specified and estimated for air passenger traffic demand between the UK and five selected countries. They included three model types that have proved effective in earlier studies, and one goal of this research was to see whether these results could be reproduced in a different context. The model types were autoregressive distributed lag models that included a common ‘world’ variable, a system model that provided pooled estimates, and time-varying parameter models. The demand models developed were then evaluated for their (conditional) out-of-sample forecasting accuracy. The results proved robust over the three-year lead time. The one-, two- and three-year-ahead ex-post forecasts were generated and compared with the benchmark models, namely naïve model 1 (NM1) and naïve model 2 (NM2), as well as two established univariate alternatives, an AR(3) model and an exponential smoothing model. Various error criteria were used in evaluating model performances across longer lead times than is typical. Different error measures have tended to produce different results in forecast evaluations, and therefore any discussion of comparative forecasting performances must be supported by evidence obtained from several error measures. Interestingly, an analysis of the error in the cumulative growth forecast for 3 years ahead showed an increased dominance of the ‘ADL model with world trade’. This adds support to the principle that econometric models perform comparatively better when there are large changes in the explanatory variables. Regarding the main hypothesis that econometric models will outperform univariate benchmarks, the ADL models with the “world trade” variable consistently outperformed both the naïve models and the two time series methods. 
However, only for the UK–USA and UK–Canada routes did the ‘ADL model with world trade’ outperform the AR(3) model when all of the error measures and lead times were considered, despite their aggregate strong performance, as shown in Table 6. The UK–Canada model has the highest explanatory power with no autoregressive component. The remaining comparisons offer further examples of a pure time series model appearing to be more accurate than a causal model. A tentative explanation for this is that the proposed causal model could be better specified with the inclusion of other important drivers, and in particular measures of structural change in the market, such as increased price competition. An alternative rationalisation lies in the possible effects of model complexity on estimation reliability in relatively small samples, which can lead to autoregressive models outperforming even well-specified structural models, as was the case here (Favero & Marcellino, 2005). A subsidiary hypothesis concerned the relative performance of the subjectively specified ADL model compared to that of the automatic specification delivered by PcGive. The ADL models (with or without the “world trade” variable) proved better performers than PcGive automatic, contradicting our first hypothesis as to the benefits of removing subjectivity in model building. The difference arises from the relative parsimony (in the larger samples) of PcGive automatic models in attempting to avoid including spurious relationships. With the smaller sample sizes there was evidence of multicollinearity in the models, as specified by the PcGive automatic based models. This result emphasises the need for further research examining the differences between automatically specified models and their subjectively specified alternatives.
Simultaneously pooling the data and estimating the models improved the forecasting performance, although the differences were slight. This confirms the conclusions of Garcia-Ferrer et al. (1987) and Zellner et al. (1991), who found that forecasting models recognizing contemporaneous co-variation and using the seemingly unrelated regression (SUR) approach showed improvements over models estimated individually. The time-varying parameters (TVP) model failed to show the expected improvements over fixed parameters models, in conflict with the principle laid down by Allen and Fildes (2001), that time-varying parameter models are most valuable when the appropriate model structure is not well-understood, or unobserved variables are affecting its structure. This result is also inconsistent with findings in tourism demand forecasting studies (Li & Song et al., 2006). An examination of the parameter variation in the TVP models in the forecast period shows little variation for most countries, but the patterns look much the same as those of Li and Wong et al. (2006). Only Italy showed a predictable parameter drift, a condition that would suggest that TVP models might outperform their fixed parameter equivalents, although there is no evidence of this in the results. This apparent contradiction suggests a need for further detailed research. The results depend on the chosen error measure, and although the rankings are positively correlated, there remains a clear difference between the relative measures (MASE and GRelAE) and the RMSE and GRMSE, which are not standardised. An investigation into the error distributions (absolute and relative) revealed a small number of outliers which have affected the country-level results, changing the overall rankings. While arguments such as the robustness to outliers (Fildes, 1992) and data characteristics (Hyndman & Koehler, 2006) are important and argue for the use of relative measures, the choice in practice rests with the users’ preferences.
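To make the contrast between the standardised and non-standardised measures concrete, the following is a minimal sketch of the four measures named above, using their common textbook definitions. It is not taken from the paper: the function names, the choice of a no-change forecast as the benchmark for GRelAE, and the sample numbers are all illustrative assumptions, and the paper's own definitions in its Section 3 may differ in detail.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error (expressed in the units of the series)."""
    sq = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(sq) / len(sq))

def grmse(actual, forecast):
    """Geometric RMSE: (product of squared errors) ** (1 / (2n)).
    Undefined when any single error is exactly zero."""
    logs = [math.log((a - f) ** 2) for a, f in zip(actual, forecast)]
    return math.exp(sum(logs) / (2 * len(logs)))

def mase(actual, forecast, history):
    """Mean absolute scaled error: out-of-sample MAE divided by the
    in-sample MAE of the one-step no-change forecast."""
    diffs = [abs(history[i] - history[i - 1]) for i in range(1, len(history))]
    scale = sum(diffs) / len(diffs)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale

def grelae(actual, forecast, benchmark):
    """Geometric mean of relative absolute errors against a benchmark
    forecast. Undefined when any error (model or benchmark) is zero."""
    logs = [math.log(abs(a - f) / abs(a - b))
            for a, f, b in zip(actual, forecast, benchmark)]
    return math.exp(sum(logs) / len(logs))

# Made-up numbers, for illustration only (not data from the study).
history = [100, 110, 120, 130]        # estimation sample
actual = [140, 150, 160]              # realised values, 1 to 3 years ahead
model = [138, 153, 156]               # hypothetical model forecasts
no_change = [history[-1]] * 3         # assumed benchmark: last observed value

print("RMSE  ", rmse(actual, model))
print("GRMSE ", grmse(actual, model))
print("MASE  ", mase(actual, model, history))
print("GRelAE", grelae(actual, model, no_change))
```

RMSE and GRMSE change with the scale of the route being forecast, whereas MASE and GRelAE are ratios. Values below 1 for either ratio indicate that the model beats the reference forecast, which is why rankings can differ between the two groups of measures.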
The findings from this research show that when reporting empirical results, a range of measures are needed which incorporate natural metrics that fit the (often implicit) decision problem and associated loss function, as well as one of these relative metrics. September 11, 2001, naturally had a major negative impact on the airline industry. However, 2002 data showed that the recovery process was already underway, although it was not back to its long term growth path. Empirical findings for the US from Lai and Lu (2005), as well as those for the UK, Germany and Australia from Njegovan (2006), also indicate that shocks to air passenger traffic are largely transitory, and do not, in general, merit the revision of forecasts over a long horizon. To check the models’ sensitivity, we re-estimated ADL models with the world trade variable by including a dummy variable which equals 1 for the last two data points, in order to exclude the effect of September 11. The results show that the dummy variable is insignificant for all 5 models, and the coefficients on other variables appear to be stable. If we include the dummy variable from the beginning when simplifying the ADL model, only the model for the UK–Canada route is changed, with more variables now included in the model and the dummy variable appearing to be significant at the 10% level. In part, this study has attempted to replicate and extend aspects of the Garcia-Ferrer et al. (1987) study of GDP growth. The research has confirmed the difficulties of such replications and the choices made by earlier researchers. We have therefore attempted to be explicit in Section 3 about our model building and the software we have used. For example, the choice of priors in the TVP modelling proved important in Garcia-Ferrer et al.’s study. Our use of standard priors, as discussed in Section 3, led to a poorer performance than our preliminary models had suggested. 
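The sensitivity check with a September 11 dummy described above can be written compactly. The equation below is a generic sketch only: the lag order (one lag of each variable), the single route-specific regressor, and all symbols are illustrative assumptions, since the paper's actual specification is not given in this excerpt.

```latex
\begin{align}
y_t &= \alpha + \phi\, y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1}
       + \gamma\, w_t + \delta\, D_t + \varepsilon_t, \\
D_t &= \begin{cases}
         1, & t \in \{T-1,\, T\} \\
         0, & \text{otherwise}
       \end{cases}
\end{align}
```

Here $y_t$ stands for traffic on a route, $x_t$ for a route-specific driver such as price, income or trade, $w_t$ for the world trade variable, and $T$ for the last observation in the sample. In this notation, the reported finding is that $\delta$ is not significantly different from zero in any of the five route models, while the remaining coefficients stay stable.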
One area of potential importance in such replications is the choice of data and error measures. Here we have examined the effects of omitting 2001/2 because of the September 11 effect, and have also considered a variety of error measures. However, different countries and different measures of price (in particular) might well lead to different conclusions. The importance of these replications, as Hubbard and Vetter (1996) remarked, is that they cumulatively lead to a greater understanding of the effectiveness of alternative [forecasting] models. One-off studies, however expertly carried out, can do no such thing. In summary, the results of this research show that employing appropriately specified structural econometric methods leads to an improved forecasting performance for air traffic, compared to non-structural alternatives, although our comparison is conditional, and therefore favours the structural methods. This adds further evidence to the weak conclusions drawn by Allen and Fildes (2001) as to the benefits of econometric models compared to their univariate alternatives. The absolute differences between the two are small, however, and arguments for simplicity suggest that a simple univariate model would be adequate for most users. The use of a VAR model here adds little. Our key subsidiary hypotheses, laid out in the introduction, have also found support in this replication. However, the adoption of a TVP approach did not help, despite the apparent structural changes in the market. Nor did the adoption of an automatic model-building approach for specifying the ADL model. These results, together with the structural changes seen in the UK travel market since 2002, and in particular the rise of the low cost carriers, may well require a different approach to modelling, based on forecasting the two competing market segments separately.