Dangers of data mining: The case of calendar effects in stock returns
Article code | Publication year | English article page count |
---|---|---|
22027 | 2001 | 38-page PDF |
Publisher : Elsevier - Science Direct
Journal : Journal of Econometrics, Volume 105, Issue 1, November 2001, Pages 249–286
English abstract
Economics is primarily a non-experimental science. Typically, we cannot generate new data sets on which to test hypotheses independently of the data that may have led to a particular theory. The common practice of using the same data set to formulate and test hypotheses introduces data-mining biases that, if not accounted for, invalidate the assumptions underlying classical statistical inference. A striking example of a data-driven discovery is the presence of calendar effects in stock returns. There appears to be very substantial evidence of systematic abnormal stock returns related to the day of the week, the week of the month, the month of the year, the turn of the month, holidays, and so forth. However, this evidence has largely been considered without accounting for the intensive search preceding it. In this paper we use 100 years of daily data and a new bootstrap procedure that allows us to explicitly measure the distortions in statistical inference induced by data mining. We find that although nominal p-values for individual calendar rules are extremely significant, once evaluated in the context of the full universe from which such rules were drawn, calendar effects no longer remain significant.
English introduction
Economic theory is often vague about the relationship between economic variables. As a result, many economic relations have been initially established from apparent empirical regularities and had not been predicted ex ante by theory. Like many of the social sciences, economics predominantly studies non-experimental data and thus does not have the advantage of being able to test hypotheses independently of the data that gave rise to them in the first instance. If not accounted for, this practice, referred to as data mining, can generate serious biases in statistical inference. In the limited sample sizes typically encountered in economic studies, systematic patterns and apparently significant relations are bound to occur if the data are analyzed with sufficient intensity. One of the most striking examples of a data-driven finding that was not anticipated by theory is the apparently very strong evidence of seasonal regularities in stock returns. Calendar effects were the first to be analyzed in the “Anomalies” section of the inaugural issue of Journal of Economic Perspectives (Thaler, 1987a, 1987b). Indeed, theoretical considerations would suggest that researchers should not even be looking for such patterns in the first instance. According to standard economic theory, stock prices should follow a martingale process and returns should not exhibit systematic patterns, thus ruling out seasonal components unless these can be related to systematic variations in risk premiums, cf. Samuelson (1965), Leroy (1973), and Lucas (1978). As reflected in the initial Mark Twain quote, investors have nevertheless long been fascinated by the possibility of finding systematic patterns in stock prices that, once detected, promise easy profits when exploited by simple trading rules. 
Moreover, Merton (1987) points out that “economists place a premium on the discovery of puzzles, which in the context at hand amounts to finding apparent rejections of a widely accepted theory of stock market behavior” (p. 104). Consequently, there is a long tradition among investors and academics of searching through stock market data; published academic studies on calendar effects go back to at least the early 1930s, e.g., Fields (1931, 1934). As a result, common stock market indexes such as the Dow Jones Industrial Average and the Standard & Poor's (S&P) 500 Index are among the most heavily investigated data sets in the social sciences. Lo and MacKinlay (1988) point out that the degree of data-mining bias in a given field can be expected to increase with the number of studies published on the topic. Since so many academics and investors have looked at common US stock price indexes in an attempt to detect regularities, the performance of the best calendar rules cannot be viewed in isolation. Data with important outliers, such as those observed in stock market returns, are particularly prone to data-mining biases. If enough economic models are studied, by pure chance some of them are likely to outperform a given benchmark by any economic or statistical criterion. For example, models that implied investors should have been short in the stock market on October 19, 1987 are likely to outperform the market index in a longer sample simply because of the paramount significance of this single observation. As a result of these endeavors, there is now a very large literature reporting apparent “anomalies” in stock returns, see, e.g., Dimson (1988). 
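The claim that some rule will beat a benchmark by pure chance if enough rules are examined can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes independent rules whose excess returns are i.i.d. standard normal noise (so no rule has a true edge), and the function names, sample sizes, and the one-sided 5% critical value of 1.645 are illustrative choices.

```python
import math
import random


def best_rule_t_stat(n_rules, n_days, rng):
    """Largest t-statistic of the mean excess return among n_rules
    simulated rules that, by construction, have no true edge."""
    best = -math.inf
    for _ in range(n_rules):
        # Assumption: excess returns are pure i.i.d. N(0, 1) noise.
        returns = [rng.gauss(0.0, 1.0) for _ in range(n_days)]
        mean = sum(returns) / n_days
        var = sum((x - mean) ** 2 for x in returns) / (n_days - 1)
        t_stat = mean / math.sqrt(var / n_days)
        best = max(best, t_stat)
    return best


def false_discovery_rate(n_rules, n_days=100, n_trials=200,
                         critical=1.645, seed=0):
    """Share of trials in which the best of n_rules noise rules looks
    'significant' when its t-statistic is judged in isolation."""
    rng = random.Random(seed)
    hits = sum(
        best_rule_t_stat(n_rules, n_days, rng) > critical
        for _ in range(n_trials)
    )
    return hits / n_trials


if __name__ == "__main__":
    for k in (1, 10, 50):
        print(k, false_discovery_rate(k))
```

Under these assumptions a single rule is flagged in roughly 5% of trials, whereas with 50 independent rules the chance that at least one is flagged is about 1 - 0.95^50, or roughly 92%. Real calendar rules are correlated rather than independent, so this simple calculation does not carry over directly to them.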
Grouped by calendar frequency, researchers have reported evidence of abnormal returns related to day of the week effects (Ball and Bowers, 1988; Cross, 1973; Fields, 1931; French, 1980; Gibbons and Hess, 1981; Jaffe and Westerfield, 1985; Keim and Stambaugh, 1984; Lakonishok and Levi, 1982; Rogalski, 1984), week of the month effects (Ariel, 1987; Lakonishok and Smidt, 1988), month of the year effects (Haugen and Lakonishok, 1988; Keim, 1983; Roll, 1983; Rozeff and Kinney, 1976), turn of the month effects (Ariel, 1987; Hensel and Ziemba, 1996; Lakonishok and Smidt, 1988), turn of the year effects (Haugen and Lakonishok, 1988; Jones et al., 1987; Lakonishok and Smidt, 1984; Ritter and Chopra, 1989; Roll, 1983) and holiday effects (Fields, 1934; Haugen and Lakonishok, 1988; Jones et al., 1987; Lakonishok and Smidt, 1988). Interestingly, none of these calendar effects were preceded by a theoretical model predicting their existence. This is an important consideration; surveying the philosophy of science literature, Campbell and Vinci (1983) write that “Philosophers of science generally agree that when observational evidence supports a theory the confirmation is much stronger when the evidence is novel”. Similarly, Kahn et al. (1996) present a Bayesian model that formalizes the idea that empirical evidence giving rise to a new theory does not support the resulting theory as strongly as when the evidence had been predicted ex ante by the theory. Thus, the findings of systematic seasonal patterns in stock returns leave us with a conundrum: do the apparent regularities in stock returns really imply a rejection of simple notions of market efficiency, or are they just a result of a large, collective data-mining exercise? Many researchers express awareness of this problem. 
Lakonishok and Smidt (1988), for example, comment on the seasonal regularities this way: “However, it is at least possible that these new facts are really chimeras, the product of sampling error and data mining”. In this paper we conduct an analysis that addresses these concerns. Based on a survey of the types of calendar rules that have been studied by researchers, we construct a universe of calendar trading rules using permutational arguments that do not bias us in favor of, or against, particular calendar effects. The universe contains nearly 9500 different calendar effects and the best calendar rule is evaluated in the context of this set. We do not imagine that this large set of calendar rules was inspected by any one individual investor or researcher. Rather, the search for calendar rules has operated sequentially across the investment community as a whole with the results of individual investors being reported gradually through the survival of the “fittest” calendar rules. Viewed this way, the number of rules we inspect does not appear to be unrealistically large. We find that although many different calendar rules produce abnormal returns that are highly statistically significant when considered in isolation, once the dependencies operating across different calendar rules are accounted for, then the best calendar rule no longer achieves a p-value that is significant at conventional critical levels. This conclusion is robust to whether a mean return criterion or a risk-adjusted Sharpe ratio criterion is used in the assessment. Consistent with this finding, the performance of the calendar rule that was best in-sample actually generates inferior performance in an out-of-sample experiment using either cash or futures market prices. 
To alleviate the concern that even genuinely significant calendar rules may not appear to be significant if assessed jointly with a sufficiently large set of “irrelevant” rules, we also evaluate the best-known calendar effects in a much smaller universe of 244 calendar rules. Again we find that the apparent statistical significance of the best calendar effects is not robust to data-mining effects. The paper proceeds as follows. Section 2 discusses why standard asset pricing theory precludes calendar effects and Section 3 explains the bootstrap procedure that we use in our analysis to account for the effects of data mining. Section 4 describes the empirical evidence supporting the existence of calendar effects, and Section 5 explains the design of our experiment. Sections 5 and 6 report the empirical results of our analysis, while Section 7 concludes.
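The idea of judging the best rule against the full universe, rather than in isolation, can be sketched as a bootstrap test on the maximum performance statistic. The code below is a simplified illustration of that idea, not the authors' implementation. It assumes the input is a matrix holding each rule's daily return in excess of the benchmark, resamples days with a stationary bootstrap (blocks of random, geometrically distributed length), and uses hypothetical names and defaults (`excess`, `n_boot`, `q`).

```python
import math
import random


def stationary_bootstrap_indices(n, q, rng):
    """Resampled day indices built from blocks whose lengths are
    geometric with mean 1/q (assumed smoothing parameter)."""
    idx = [rng.randrange(n)]
    while len(idx) < n:
        if rng.random() < q:
            idx.append(rng.randrange(n))    # start a new block
        else:
            idx.append((idx[-1] + 1) % n)   # extend the block, wrapping
    return idx


def max_statistic_p_value(excess, n_boot=500, q=0.1, seed=0):
    """Bootstrap p-value for the best rule in a universe of rules.

    excess[k][t] is the return of rule k minus the benchmark return
    on day t (hypothetical input layout).
    """
    rng = random.Random(seed)
    n = len(excess[0])
    means = [sum(row) / n for row in excess]
    observed = math.sqrt(n) * max(means)

    exceed = 0
    for _ in range(n_boot):
        # The same resampled days are used for every rule.
        idx = stationary_bootstrap_indices(n, q, rng)
        boot_max = max(
            math.sqrt(n) * (sum(row[i] for i in idx) / n - m)
            for row, m in zip(excess, means)
        )
        if boot_max > observed:
            exceed += 1
    return exceed / n_boot
```

Because the same resampled days are applied to every rule, the dependence across rules is preserved in each bootstrap draw. With a fixed seed, the p-value of the best rule within a universe can never be smaller than the p-value it receives when passed in alone, which is the sense in which the test adjusts for the search over rules.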
English conclusion
In their systematic study of calendar effects in the DJIA index, Lakonishok and Smidt (1988) conclude “In summary, DJIA returns are persistently anomalous over a 90-year period around the turn of the week, around the turn of the month, around the turn of the year, and around holidays”. They explicitly acknowledge the potential dangers of data-mining effects and state that “The possibility that these particular anomalies could have occurred by chance cannot be excluded, but this is very unlikely”. In this paper we have shown that, in fact, when assessed in the context of either a large universe or a restricted universe of calendar rules that could plausibly have been considered by investors and academics with access to our data set, the strength of the evidence on calendar anomalies looks much weaker. Using reality check p-values that adjust for the effects of data mining, no calendar rule appears to be capable of outperforming the benchmark market index. This is true in all of the individual sample periods, in the out-of-sample experiment with the DJIA and S&P 500 Futures data, and in the full sample using a century of daily data. We find it suggestive that the single most significant calendar rule, namely the Monday effect, has indeed been identified in the empirical literature. This is probably not by chance and it indicates that very substantial search for calendar regularities has been carried out by the financial community. It is particularly noteworthy that when the Monday effect is examined in the context of as few as 20 day-of-the-week trading rules, and during the sample period originally used to find the Monday effect, its statistical significance becomes questionable. Subsequent to its appearance, various theories have attempted to explain the Monday effect without much success. Thaler (1987b) lists a number of institutional and behavioral reasons for calendar effects. 
Our study suggests that the solution to the puzzling abnormal Monday effect actually lies outside the specificity of Mondays and rather has to do with the very large number of rules considered besides the Monday rule. Blame for data mining cannot and must not be laid on individual researchers. Data exploration is an inevitable part of the scientific discovery process in the social sciences and is indeed capable of revealing unsuspected regularities when these exist. The danger lies in confusing apparent with real effects. Many researchers go to great lengths in attempts to avoid this pitfall. Ultimately, however, it has been extremely difficult for a researcher to account for the effects the cumulated “collective knowledge” of the investment community may have had on a particular study. The methods used here provide a principled way to conduct research as a sequential process in which new studies build on evidence from earlier papers. In evaluating a body of research it is important to assess the results not by treating the individual studies as independent observations but by explicitly accounting for their cross-dependencies. In doing this, one should not be overwhelmed by the sheer amount of empirical evidence. This is sometimes difficult because the dependencies between results in different studies are unknown. For example, Michael Jensen, in his introduction to the 1978 volume of the Journal of Financial Economics on market anomalies writes “Taken individually many scattered pieces of evidence … don't amount to much. Yet viewed as a whole, these pieces of evidence begin to stack up in a manner which make a much stronger case for the necessity to carefully review both our acceptance of the efficient market theory and our methodological procedures” (p. 95). Our results show that even supposedly strongly supported empirical phenomena may not stand up to closer scrutiny. 
There may well be many other such surprises waiting for researchers trying to establish our degree of knowledge about economic phenomena.