Title

Population rule learning in symmetric normal-form games: theory and evidence

Article code	Year of publication	Number of pages (English article)
7104	2001	17 pages (PDF)

Source

Publisher: Elsevier - ScienceDirect

Journal: Journal of Economic Behavior & Organization, Volume 45, Issue 1, May 2001, Pages 19–35

Keywords

Rules, Learning, Games, Experimental, Testing

Article preview

Abstract

A model of population rule learning is formulated and estimated using experimental data. When predicting the population distribution of choices and accounting for the number of parameters, the population rule learning model is much better than aggregation of individually estimated rule learning models. Further, rule learning is a statistically significant and important phenomenon even when focusing on population statistics, and is much better than one-rule learning dynamics.

Introduction

Recent learning research in one-shot games can be divided into two domains: (i) population learning or evolutionary dynamics, as typified by replicator dynamics, and (ii) individual learning. The first domain focuses on how the population distribution of play changes over time, while the second focuses on how an individual’s behavior changes over time. Individualistic models are needed for investigating the nature and characteristics of individual learning patterns and for assessing the amount of diversity in the population. Further, if one wants to construct a cognitive theory of individual behavior in games, then individualistic models are essential because individual details could be masked in population statistics.

From a decision-theoretic framework, however, for one-shot games it is necessary and sufficient for a player to have a belief about the other players’ actions, and when the other players are randomly drawn from a population of potential players, such a belief is equivalent to a forecast of the population distribution of other players’ actions. It is neither necessary nor sufficient to know anything about a single individual’s learning dynamics, since one’s actual opponents are random draws from a population. For example, to know which side of the road to drive on in the US, I do not need to know any specific history about the driver approaching me on the highway; I only need to know that in the US all sober drivers stay on the right side of the road.

Ideally, as in general equilibrium economics, one would like a theory of individual learning that aggregates up to a theory of population learning. However, we will encounter similar difficulties in finding aggregation theorems with reasonable assumptions. Of course, we can estimate individualistic models and then aggregate. But there is only so much information in any given dataset.
If that information is used to estimate a multitude of parameters of individualistic models, it does not follow that the prediction following aggregation is better than a prediction from a population (or representative agent) model with far fewer parameters. We will address this pertinent empirical question.

We focus on the class of rule learning models of Stahl (1996, 1997, 1999, 2000) (hereafter S96, S97a,b, and S99). This is a rich class of learning models that encompasses action reinforcement (Roth and Erev, 1995; Erev and Roth, 1998), fictitious play (Brown, 1951), and belief updating (Mookherjee and Sopher, 1994; Camerer and Ho, 1999a, 1999b). Briefly, a “rule” is a mapping from the game and history of play to a mixed strategy. For example, a noisy best response to the recent past is a Cournot-like rule that describes much of the behavior observed in experiments. Iterating once more, we have a “level-2” rule that is a noisy best response to the best response to the recent past.

Complicating the econometric estimation of rule learning models is the fact that the rule used by an individual is not directly observable (only the action taken is observable), and in any model with properly specified error structures all rules will have full support on the available undominated actions. In an individualistic model of rule learning (S97b), the posterior probability of the rule conditional on the history was computed, but the computational complexity necessitated the use of precarious approximations. This problem can potentially be avoided by a population learning model because the experience of many individuals using and evaluating different rules gets merged into the population experience, so in essence it is as if the population evaluates all the rules.

In Section 2 we review the individual rule learning model of S97a, spell out aggregation of that model, and develop a population version of rule learning.
Section 3 describes the experimental design and data, and Section 4 describes the econometric specification and computational issues. Section 5 presents the results, and Section 6 discusses our findings.
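
To make the notion of a "rule" as a mapping from the game and the history of play to a mixed strategy more concrete, here is a minimal Python sketch. It is not the paper's specification: the logit form of the noisy best response, the 3×3 payoff matrix U, the recent population mix, and the precision value are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def expected_payoffs(U, q):
    """Expected payoff of each own action against the population mix q."""
    return [sum(u_jk * q_k for u_jk, q_k in zip(row, q)) for row in U]

def logit(payoffs, precision):
    """Noisy best response (assumed logit form): higher precision, less noise."""
    top = max(payoffs)  # subtract the max for numerical stability
    weights = [math.exp(precision * (u - top)) for u in payoffs]
    total = sum(weights)
    return [w / total for w in weights]

def best_response(U, q):
    """Exact (noise-free) best response to q, as a degenerate mixed strategy."""
    u = expected_payoffs(U, q)
    best = max(range(len(u)), key=lambda j: u[j])
    return [1.0 if j == best else 0.0 for j in range(len(u))]

def level1_rule(U, recent, precision):
    """Cournot-like rule: noisy best response to the recent past."""
    return logit(expected_payoffs(U, recent), precision)

def level2_rule(U, recent, precision):
    """Level-2 rule: noisy best response to the best response to the recent past."""
    return logit(expected_payoffs(U, best_response(U, recent)), precision)

# Hypothetical symmetric 3x3 game (row player's payoffs) and recent population mix.
U = [[1, 0, 4],
     [3, 2, 0],
     [0, 5, 2]]
recent = [0.6, 0.3, 0.1]

p1 = level1_rule(U, recent, precision=2.0)
p2 = level2_rule(U, recent, precision=2.0)
print("level-1 mixed strategy:", p1)
print("level-2 mixed strategy:", p2)
```

In this example both rules put positive probability on every action, which illustrates why the rule in use cannot be read off from a single observed choice.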

Conclusion

5. Results

We estimated a homogeneous model (one nine-parameter β vector for all four experimental sessions), and we also estimated separate models for each session. The maximized log-likelihood value for the homogeneous model is −3097.01, while the sum of the four maximized log-likelihood values for the session models is −3062.45. Twice the difference is distributed chi-square with 27 (3×9) degrees of freedom and has a P-value of 10^-5. Therefore, we reject the hypothesis that the four sessions come from the same distribution. Henceforth, we will report only disaggregated session-by-session results.

5.1. Population model versus aggregation of individual models

The session-by-session maximized log-likelihood values, LA(β), of the population rule learning model are given in the third column of Table 1 next to the computed values of AL([formula not available]) based on S97a. It is immediately apparent that there is little difference in the log-likelihood values, both by session and aggregated over all sessions. Therefore, the slight overall improvement in the aggregated log-likelihood provided by aggregation does not appear to be worth the tremendous increase in the number of parameters: (9×91)−(9×4)=783. One method of comparing log-likelihood values of non-nested models with different numbers of parameters is the Akaike information criterion (AIC), which is twice the log-likelihood difference less twice the difference in the number of parameters. This measure is given in the fourth column of Table 1. Clearly, according to the AIC, the population model is far superior. Modifications of the AIC that have been suggested in the literature (Bozdogan, 1987) only serve to enhance the effect of the difference in the number of parameters. Therefore, we have the following result.

Result 1.
Taking account of the difference in the number of parameters, the population rule learning model is superior to aggregation of the individual rule learning model.

The coefficient estimates of the population rule learning model are given in Table 2. It is noteworthy that the initial weights on level-2 evidence and Nash evidence are not significantly different from zero, whereas the initial weight on level-1 evidence is highly significant.

5.2. Rule learning hypotheses

Two parameters are critical to rule learning: β0 and β1. If β0=1, then by virtue of the law of motion, Eq. (3′), the population distribution of rule propensities would be constant for all periods. The test reported in Table 2 for β0 is with respect to β0=1, and this null hypothesis is rejected for three out of the four experimental sessions. Aggregating all sessions, the null hypothesis has a P-value less than 10^-7. Therefore, we have the following result.

Result 2. We strongly reject the hypothesis of constant population rule propensities.

While rule propensities apparently change over time, they respond to the performance evaluation function if and only if β1 is significantly positive. From Table 2, it can be seen that the null hypothesis of β1=0 is rejected for two of the four sessions. Aggregating all sessions, the null hypothesis has a P-value less than 10^-7. Therefore, we have the following result.

Result 3. We reject the hypothesis of no rule learning.

If we had only two dimensions or only a small number of rules, then we could easily present a potentially revealing plot of rule propensities ϕ over time. However, with five dimensions, it is a challenge to present a picture of how the probability distribution over rules (ϕ) changes over time. For each period we identified the “dominant mode” of ϕ as follows.
We found all the rules (ν,θ) for which ϕ(ν,θ,t) was within 50% of the maximum ϕ value for that period, and computed the average, [formula not available], over this neighborhood of the dominant mode; call this (Vt0, Vt1, Vt2, Vt3, Θt) for period t. Figs. 2 and 3 display these mean evidence weights for the dominant mode as a function of time for the two sessions for which β1 was statistically significant. For S627, the transference parameter (τ) is essentially zero, so the Vk values revert to the initial values at the beginning of the second run; for S810, τ≈1, so the Vk values are constant between runs. Note that an increase of one unit on the log scale means a four-fold increase in the weight on the corresponding evidence. Hence, these figures reveal substantial changes in ϕ due to rule learning, especially for the level-2 and Nash rules. In Fig. 2, the weight on level-1 evidence increases throughout the first run and during the first half of the second run, when the weight on level-2 and Nash evidence increases dramatically. In Fig. 3, little happens during the first 10 periods, but then the weights on level-2 and Nash evidence steadily increase.

5.3. Nash and Cournot hypotheses

As a benchmark for the population rule learning model, we can consider the Nash equilibrium model. Of course, in its pure form, it is incompatible with the data because participants often make non-Nash choices. It is more interesting to consider the Nash model extended to include errors. Observe that by setting νk=0 for all k≠3, we have a Nash-based probabilistic choice function, with the interpretation of ν3 as the precision of the population’s expected utility calculation or as the inverse of the variance of the population’s idiosyncratic considerations. The hypothesis that the population makes its choices according to this error-prone Nash model is nested within our full population model as a seven-parameter restriction.
For each session, we found the [formula not available] values that maximized the log-likelihood of the population choices. The sum over all sessions of these maximized L values was −4087.78. Compared with the totally random prediction (−4393.75), this is a significant improvement (P<10^-126). However, the full population model (−3062.44) is a very significant improvement over this Nash model (P<10^-417). In other words, even after adjusting for the large number of parameters, the population rule learning model is astronomically more likely to have generated the data than the Nash-based model. (For an enhanced Nash model with learning, which is also rejected, see Appendix A.)

Result 4. We strongly reject the implicit restrictions of the Nash model.

So-called Cournot dynamics have been popular because of their simplicity and explanatory power (e.g. Van Huyck et al., 1994; Cheung and Friedman, 1997; Friedman et al., 1995). In our context, Cournot dynamics is equivalent to zero weight on all evidence except [formula not available], and no rule learning. Thus, the reduced model would have only two parameters ([formula not available]). Maximizing the log-likelihood function with respect to these two parameters for each session and summing over all four sessions, the aggregated log-likelihood decreases to −3321.66. Compared to the no-rule-learning model, twice the difference is distributed chi-square with 12 degrees of freedom and has a P-value less than 10^-91. Compared to the full population model, twice the difference is distributed chi-square with 28 degrees of freedom and also has a P-value less than 10^-91. Thus, we can strongly reject the Cournot model in favor of both the no-rule-learning model (but other rules present) and the full population model. (For an enhanced Cournot model with learning, which is also rejected, see Appendix A.)

Result 5. We strongly reject “Cournot dynamics”.
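
The likelihood-ratio and AIC arithmetic used in this section can be checked with a short Python sketch. It is offered as an illustration under stated assumptions, not as the authors' procedure: the helper names (chi2_sf, lr_test, aic_difference) are hypothetical, and the chi-square tail probability is computed with a standard recurrence for the incomplete gamma function using only the standard library. The log-likelihood values and parameter counts are the ones quoted in Section 5 for the homogeneity test and for the comparison in Section 5.1.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) with y = x / 2,
    starting from Q(1/2, y) = erfc(sqrt(y)) for odd df and Q(1, y) = exp(-y)
    for even df.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q, steps = 1.0, math.exp(-y), df // 2 - 1
    else:
        a, q, steps = 0.5, math.erfc(math.sqrt(y)), (df - 1) // 2
    for _ in range(steps):
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

def lr_test(ll_restricted, ll_unrestricted, df):
    """Likelihood-ratio statistic (twice the log-likelihood gain) and its P-value."""
    stat = 2.0 * (ll_unrestricted - ll_restricted)
    return stat, chi2_sf(stat, df)

def aic_difference(ll_gain, extra_params):
    """Twice the log-likelihood gain less twice the number of extra parameters."""
    return 2.0 * ll_gain - 2.0 * extra_params

# Homogeneous model versus four separate session models (values from the text).
stat, p_value = lr_test(-3097.01, -3062.45, df=3 * 9)
print("LR statistic:", stat, "P-value:", p_value)

# Extra parameters when aggregating individual models instead of session models.
extra_params = 9 * 91 - 9 * 4
print("extra parameters:", extra_params)
```

Under the AIC as described in Section 5.1, aic_difference is positive only when the log-likelihood gain exceeds the number of extra parameters, so aggregation of individual models would need a gain of more than 783 log-likelihood points to be preferred.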