Learning, Information, and Sorting in Market Entry Games: Theory and Evidence
Article code | Publication year | Page count (English article) |
---|---|---|
7126 | 2005 | 32 pages (PDF) |
Publisher : Elsevier - Science Direct
Journal : Games and Economic Behavior, Volume 51, Issue 1, April 2005, Pages 31–62
English abstract
Previous data from experiments on market entry games, N-player games where each player faces a choice between entering a market and staying out, appear inconsistent with either mixed or pure Nash equilibria. Here we show that, in this class of game, learning theory predicts sorting, that is, in the long run, agents play a pure strategy equilibrium with some agents permanently in the market, and some permanently out. We conduct experiments with a larger number of repetitions than in previous work in order to test this prediction. We find that when subjects are given minimal information, only after close to 100 periods do subjects begin to approach equilibrium. In contrast, with full information, subjects learn to play a pure strategy equilibrium relatively quickly. However, the information which permits rapid convergence, revelation of the individual play of all opponents, is not predicted to have any effect by existing models of learning.
English introduction
Theories of learning in games are increasingly being subjected to tests using data from controlled laboratory experiments with paid human subjects. The success or failure of various learning models has been assessed on the basis of how well these models predict or track the behavior of subjects in these experimental sessions. Given the usual short time horizon in experiments, researchers interested in testing models of learning have tended to concentrate on assessing their short-run fit. Long-run predictions have largely been ignored. One might reasonably be uncertain whether asymptotic results are likely to be relevant in experiments with finite length, or simply be interested in how subjects respond to novel situations. However, the long-run behavior of different learning models is often the same, giving clear hypotheses to test.
This paper is a first attempt to see whether the long-run predictions of learning models do indeed help to explain behavior in the market entry game. This much studied game is a stylized representation of a very common economic problem: a number of agents have to choose independently whether or not to undertake some activity, such as enter a market, go to a bar, drive on a road, or surf the web, the utility from which is decreasing in the number of participants. Those choosing not to undertake the activity can be thought of as staying at home, staying out of the market, or simply not participating. Market entry games typically admit a large number of Nash equilibria. Pure equilibria involve considerable coordination on asymmetric outcomes where some agents enter and some stay out. The only symmetric outcome is mixed, requiring randomization over the entry decision. There also exist asymmetric mixed equilibria, where some agents play pure while others randomize.
Given this multiplicity of equilibrium outcomes, an obvious question is: which type of equilibrium are agents likely to coordinate upon?
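
As a concrete illustration of the pure equilibria described above, the following Python sketch enumerates them by brute force for a small game. The linear payoff function and the parameter values (`N`, `V`, `R`, `C`) are assumptions chosen for illustration; the text above does not specify them.

```python
from itertools import product

# Hypothetical parameters (not given in the text above): number of players N,
# payoff from staying out V, congestion slope R, and market capacity C.
N, V, R, C = 6, 8.0, 2.0, 2.1


def payoff(enters: bool, entrants: int) -> float:
    """Assumed linear payoff: staying out earns V; entering earns
    V + R * (C - entrants), which falls as more players enter."""
    return V + R * (C - entrants) if enters else V


def is_pure_nash(profile) -> bool:
    """True if no player gains by unilaterally switching her action."""
    m = sum(profile)  # number of entrants in this profile
    for enters in profile:
        current = payoff(enters, m)
        # A unilateral switch changes the number of entrants by one.
        deviation = payoff(not enters, m - 1 if enters else m + 1)
        if deviation > current:
            return False
    return True


# Check every one of the 2**N pure strategy profiles (True = enter).
pure_equilibria = [p for p in product([False, True], repeat=N) if is_pure_nash(p)]
entrant_counts = {sum(p) for p in pure_equilibria}
```

Under these assumed parameters, every pure equilibrium has exactly two entrants, so the 15 equilibria differ only in which two players enter. This is the coordination problem: the players must somehow sort themselves into entrants and non-entrants.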
Many previous experiments have been conducted in an effort to address this and related questions. See, for example, Rapoport et al. (1998, 2000, 2002), Seale and Rapoport (2000), Camerer and Lovallo (1999), Sundali et al. (1995), and Erev and Rapoport (1998). However, up to now, none of these studies has yielded evidence to suggest that repeated play leads to coordination on any type of Nash equilibrium, although in many experiments the average frequencies of entry in market entry games look remarkably like those generated by Nash equilibrium play. That is, market entry games seem to represent a case where Nash equilibrium fails as a predictor of human behavior, at least at the individual level.
Here we investigate the alternative hypothesis that, given sufficient repeated play and adequate feedback, individuals in experimental market entry games should learn equilibrium behavior. This assertion leads naturally to further questions: what in practice is “sufficient” and what is “adequate”? How long should we expect to wait before agents coordinate on an equilibrium? What information is necessary? How do these factors interact, for example, does better information lead to faster convergence?
In this paper, we attempt to answer these questions in two ways. First, we provide formal results on long-run behavior in market entry games under two different models of learning that differ in terms of sophistication and use of information. Second, we report the results of a new series of experiments designed to test these predictions. We show that two different models of learning predict not only that play should converge to a Nash equilibrium, but also that it should only converge to a subset of the total number of Nash equilibria. These predictions are in clear contrast with all previous experimental evidence on market entry games, which as noted above, has not been consistent with any Nash equilibrium.
There are two models of learning which have attracted particular interest in explaining behavior in laboratory experiments, reinforcement learning and stochastic fictitious play. They differ considerably in terms of sophistication. However, we show that under both, play must converge to an asymmetric pure equilibrium that involves what could be called “sorting,” where some players always enter and the remaining players always stay out. However, these are asymptotic results. Thus, even if one of these learning models accurately describes human behavior, there is no guarantee that we would see the predicted outcome in the time available for laboratory experiments. What we seek to examine is whether such results are relevant in the timeframe of experiments, and by implication whether they are relevant outside the laboratory.
Previous experimental investigations of market entry games have concentrated on testing whether the symmetric mixed equilibrium or an asymmetric pure Nash equilibrium characterize the behavior of experimental subjects. In fact, the data seem to suggest a much more heterogeneous outcome, with some subjects apparently mixing between the two choices and some playing pure. However, the average number of entries per period is in rough accordance with equilibrium. Erev and Rapoport (1998) report two things of interest. First, distance from the symmetric mixed equilibrium is decreasing over time. Second, speed of convergence of the average number of entries toward Nash equilibrium levels is faster when more information is provided. Learning models provide a potential explanation for the first of these experimental findings. For example, we show that under both reinforcement learning and stochastic fictitious play the mixed equilibrium is a saddlepoint, and hence movement toward this equilibrium in the short run is not inconsistent with convergence to a pure strategy equilibrium in the long run.
In addition, Erev and Rapoport report a decrease in “alternation” over time, that is, the frequency with which an agent plays the strategy which she did not play the previous period, which suggests individuals are getting closer to pure strategies. As to the second finding, the speed of convergence is more difficult to pin down theoretically and, in particular, the hypothesis that stochastic fictitious play, which uses information about forgone payoffs, is faster than simple reinforcement learning models, which do not, has been difficult to formalize. Indeed, the results of our experiments are at variance with theoretical predictions about the impact of information on learning.
Existing experiments on market entry games have not provided ideal data sets to test the predictions of learning theory. For example, Rapoport et al. (1998) had sessions lasting 100 periods, but within that time, the parameter which determined the number of entrants in equilibrium was constantly altered. As this capacity parameter c changes, the profile of strategies necessary to support a Nash equilibrium also changes, making coordination on a Nash equilibrium extremely challenging. Erev and Rapoport (1998) kept the parameters constant in each session, but each session lasted 20 periods, which is probably not sufficient for long-term behavior to emerge. There have been other experimental investigations of learning behavior employing large numbers of repetitions, for example, in Erev and Roth (1998), or in single-person decision problems, Erev and Barron (2002). But the interest in these studies was to fit simulated learning models rather than to test theoretical results on convergence.
The new experiments on which we report here have several new features. First, each session involved 100 periods of play of an unchanging market entry game to give some chance for long-run behavior to be observed. Second, three different information treatments were employed.
In the first “limited information” treatment, subjects were given no initial information about the game being played and after each round were told only the payoff they had earned. In the second, “aggregate information” treatment, subjects were told the payoff function, and then were told after each round the number of subjects who had entered, the number who had stayed out, and the payoff each had earned. In the final “full information” treatment, subjects were given the same information as in the aggregate information treatment, but in addition after each round the choice and payoff of each individual subject was revealed.
Our results are somewhat surprising. In the limited information treatment, there is some tendency for groups of subjects to converge upon a pure equilibrium, but only toward the very end of the 100 period session. The aggregate information treatment, despite the additional information provided, produced results very similar to those in the limited information treatment. In the full information treatment, the tendency toward sorting was much greater than in the other two treatments. This is despite the fact that all of the learning models considered predict no effect from the additional information provided in the full information treatment.
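
To make the idea of learning from one's own payoffs concrete, here is a minimal Python sketch of a cumulative-payoff reinforcement rule, loosely in the spirit of the reinforcement models cited above. It is not the specification analyzed in the paper: the linear payoff function, the parameter values, the initial propensities, and the function name `simulate` are all assumptions made for illustration. Each player sees only her own payoff, as in the limited information treatment.

```python
import random

# Hypothetical parameters (not given in the text above).
N, V, R, C = 6, 8.0, 2.0, 2.1


def payoff(enters: bool, entrants: int) -> float:
    """Assumed linear payoff; positive for every outcome with these values."""
    return V + R * (C - entrants) if enters else V


def simulate(periods: int = 2000, seed: int = 0, initial: float = 10.0):
    """Run a basic reinforcement process and return the per-period
    number of entrants and each player's final entry probability."""
    rng = random.Random(seed)
    # One pair of propensities per player: index 0 = stay out, 1 = enter.
    propensities = [[initial, initial] for _ in range(N)]
    history = []
    for _ in range(periods):
        # Each player enters with probability proportional to her propensity.
        actions = [rng.random() < q[1] / (q[0] + q[1]) for q in propensities]
        m = sum(actions)
        # Reinforce only the action actually taken, by the payoff it earned.
        for q, enters in zip(propensities, actions):
            q[int(enters)] += payoff(enters, m)
        history.append(m)
    entry_probs = [q[1] / (q[0] + q[1]) for q in propensities]
    return history, entry_probs


if __name__ == "__main__":
    history, entry_probs = simulate()
    print("mean entrants, last 100 periods:", sum(history[-100:]) / 100)
    print("final entry probabilities:", [round(p, 2) for p in entry_probs])
```

Because the sorting prediction is asymptotic, a finite run of this sketch may or may not end near a sorted profile. Comparing `entry_probs` across several seeds and horizons is one way to explore how slowly the individual probabilities separate toward 0 or 1.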
English conclusion
We have derived new results on learning behavior in market entry games and have carried out an experiment to test our predictions. The theoretical predictions appear to have some support. In most sessions, toward the end of 100 rounds, play was at or close to the pure equilibrium outcome predicted by the reinforcement and fictitious play learning models. These findings suggest that it may take a substantial number of repetitions before the play of experimental subjects in market entry games (and possibly other games as well) approaches the asymptotic predictions of learning models. Consequently, caution appears to be called for in using asymptotic results for learning models to predict or characterize behavior in economic decision-making experiments, which are typically conducted for relatively short lengths of time.
Our experimental design also enabled us to investigate subjects’ use of information. Our main conclusion here is that individuals are adaptable in ways that are not captured by current learning models. When individuals possess the minimal amount of information assumed by reinforcement learning models, as in our limited information treatment, such that they do not even know that they are playing a game, they are still capable of learning equilibrium behavior. However, reinforcement learning does not capture the change in behavior that occurs when more information is provided. Similarly, belief based learning models, such as fictitious play, do not capture the qualitative difference in play between our aggregate and full information treatments. One possible explanation for the differences we observe is that individuals are using repeated game (dynamic) strategies that are not captured by the learning models considered. The most common class of repeated game strategies are collusive strategies that permit players to gain greater payoffs than they would in a static equilibrium. There is no evidence for that type of behavior here.
We are left to speculate what other objectives the subjects might have had, and what dynamic strategies, out of an infinite class, might have been employed. Identification of these different alternatives is not easy. A second possibility, in line with the work of Camerer et al. (2002), is that certain “sophisticated” players are using the repeated nature of the game and the information about individual actions that is available in the full information treatment to teach other, less sophisticated agents how to play (e.g. to stay out). We found only weak evidence in support of this teaching hypothesis, but perhaps that is because we do not examine strategic behavior across a variety of different repeated games as Camerer et al. (2002) do.
In any case, no single learning model appears to capture the behavior observed across our three experimental treatments. We hope that our analysis has shed some light on the shortcomings of existing learning models, and that it will spur other researchers to provide further improvements.