Adaptive learning models of consumer behavior
|Article code||Publication year||English article||Persian translation||Word count|
|1791||2007||21-page PDF||Available to order||1 word|
Publisher: Elsevier - ScienceDirect
Journal : Journal of Economic Behavior & Organization, Volume 64, Issues 3–4, November–December 2007, Pages 348–368
In a model of dynamic duopoly, optimal price policies are characterized assuming consumers learn adaptively about the relative quality of the two products. A contrast is made between belief-based and reinforcement learning. Under reinforcement learning, consumers can become locked into the habit of purchasing inferior goods. Such lock-in permits the existence of multiple history-dependent asymmetric steady states in which one firm dominates. In contrast, belief-based learning rules must lead asymptotically to correct beliefs about the relative quality of the two brands and so in this case there is a unique steady state.
Adaptive learning models attempt to describe the behavior of agents faced with repeated decision problems by assuming they use simple learning rules. These models are used in a number of apparently disparate environments. Economic theorists have analyzed them in abstract settings. They have been fitted to actual choice data both in economic experiments and in the quite different context of the empirical analysis of consumer behavior. Despite differences in aims and terminology, some models of dynamic choice found in empirical marketing analysis are essentially the same as those used in economic theory. This research in marketing supports the experimental evidence that even simple adaptive learning models can help to explain human behavior. In the context of econometric work on experimental data, there has been an active debate as to whether the more sophisticated belief-based models or very simple reinforcement learning models offer the better fit. Up to now, this has been of interest because it throws light upon human reasoning processes. However, if the same question is considered in the context of consumer choice, there may be significant practical implications to consider as well. This paper investigates the hypothesis that whether consumer behavior is best described by belief-based or reinforcement learning may have a significant impact on market organization. In particular, we examine a model of dynamic duopoly where consumers learn about the relative quality of the two different brands. The product is an experience good and so information is partial: consumers only learn the payoff to the good they actually consume. First, we investigate a reinforcement type learning model where more familiar products have a greater probability of being selected. Consequently, consumers can become locked into inferior choices. Such lock-in permits the existence of multiple history-dependent steady states.
When multiple steady states exist, even if the two firms are identical in terms of costs and product quality, the symmetric outcome is unstable: one firm must dominate. This outcome under reinforcement learning is then contrasted with the outcome under belief-based learning. This form of learning leads to correct beliefs about relative quality even under partial information. Firms can influence consumer opinion only in the short run: if consumers’ initial estimate of a firm’s quality is high (low), it has an incentive to charge above (below) the myopic price in order to slow (speed up) learning. Given the convergence of beliefs to the unique correct outcome, the firms must converge to a unique steady state where prices are the same as under complete information. This paper, therefore, shows that small differences in the learning rules, between belief-based and reinforcement learning, can have dramatic effects on market outcomes. The situation to be modelled can be thought of as a consumer going on a regular basis to a supermarket to buy a grocery item and choosing between two competing brands. This type of decision has several aspects which I would like to emphasize. First, the prices for the competing brands are usually clearly marked on the shelves. Thus, the learning the consumer has to undertake is not about prices or their distribution. However, the goods in question are typically experience goods. One has to take them home and consume them before their quality is known. Second, quality in this context is very often subjective and imprecise, for example, whether a food product tastes good. Third, because each successive purchase decision is relatively unimportant to an individual consumer, a model of boundedly rational behavior may explain actual choices well. Such boundedly rational agents may have an impression of quality that is ambiguous and difficult to measure against past experience.
As a consequence, it may be very difficult to be confident about relative quality. For example, I think I like the brand I bought today, but is it clearly better than the one I bought last month? Indeed, in this paper it is assumed that the consumption experience is noisy and memory is imperfect. The formal model of price competition analyzed here is derived from that of Chintagunta and Rao (1996), who similarly consider a dynamic duopoly with adaptive consumers. Their work is quite distinct from most of the literature in economics on learning. First, there is the mixture of rational behavior by sellers and reinforcement learning by boundedly rational buyers. Second, while the recent literature on adaptive learning has largely focused on abstract exogenous environments, Chintagunta and Rao’s work is also empirical. The model is fitted to data on actual prices, sales and consumer purchases. They find, for example, that a dynamic specification, taking into account consumers’ past purchases, outperforms a static logit model. This result, as I argue in Section 7 of this paper, provides some support for the hypothesis that learning is in fact suboptimal. Nonetheless, the difference between this current paper and the work of Chintagunta and Rao is large, and reflects the difference between economics and marketing science. First, the principal question here is one of welfare: do consumers learn to make correct choices, and what implication does that have for the competitiveness of the resulting market structure? In contrast, Chintagunta and Rao’s main objective, as with much marketing analysis, is to predict consumer choice. Second, Chintagunta and Rao do not investigate whether the reinforcement learning rule they specify would lead a consumer to choose the brand which she would prefer in the case of perfect information. We show that frequently this will not be the case.
Third, Chintagunta and Rao, in characterizing the dynamic pricing equilibrium, did not identify, as is done here, that there may be multiple steady states. Finally, in their paper only one learning rule is considered. Here, the results under reinforcement learning are contrasted with those resulting under belief-based learning, thereby demonstrating that it is familiarity-based learning that is responsible for the pathological outcome, and not the situation of experience goods in itself. This latter point is also what differentiates the current work from an earlier literature, Schmalensee (1978) and Smallwood and Conlisk (1979), that concentrates exclusively on simple forms of reinforcement learning. In these models also, consumers do not necessarily learn which is the highest quality brand. The contribution here is to clarify the conditions on the form of consumer learning under which this is possible. Furthermore, in the last few years, the analysis of experimental data has shown the effectiveness of adaptive learning models in predicting subject behavior. This, combined with the empirical marketing work of Chintagunta and Rao (1996) and Ho and Chong (2003), offers the intriguing prospect of estimating consumer learning models with actual consumer choices. Given the theoretical differences between reinforcement and belief-based learning, in Section 7, I offer two empirical tests that potentially could distinguish between the two models. The focus on bounded rationality differentiates this paper from most previous literature on dynamic pricing of experience goods that has assumed fully rational consumers. One strand of the existing literature is based on the quality of the good being private information to the seller. The consumer then learns about product quality by making highly sophisticated inferences from the resulting strategic behavior of firms.
Depending on the model and/or the parameters of a single model, a seller can signal that the quality of her good is higher than the alternative by charging a price that is either higher or lower than the price that would be myopically optimal (Milgrom and Roberts, 1986; Bagwell and Riordan, 1991). Here, consumers can only learn if a brand is of high quality through repeated consumption experience. Bergemann and Välimäki (1996) is much closer in that it examines the effect of strategic pricing on the rate of information acquisition by a buyer. However, it is quite different in that the buyer’s behavior is given by the solution to a stochastic dynamic optimization problem, allowing for an optimal level of experimentation. Strategic and adaptive models may well be complementary, with different models doing better in different circumstances. For example, Bergemann and Välimäki (1996), to motivate their model of optimal learning, give the example of a factory manager choosing which production technology to buy. Indeed, such professional decision-makers faced with sharp incentives may be well described by optimal learning models. Adaptive models, on the other hand, may do well in those consumer markets where a single purchase represents very small stakes. Indeed, this paper is not the first to apply learning models to consumer behavior. Weisbuch et al. (2000) and Kirman and Vriend (2001), in work that is very close to the present approach, analyze adaptive learning models of consumer behavior and similarly examine conditions under which consumers become loyal to one seller. The principal difference is that here firms are forward looking and price dynamically. Erev and Haruvy (2001) also consider the implications of adaptive learning by consumers, but again firms have fixed pricing policies. Ellison and Fudenberg (1995) look at social learning, where agents learn from the experience of others as well as from their own.
Certainly researchers in consumer behavior have found it plausible that consumers may be prone to a number of cognitive biases; see for example Erdem et al. (1999). One which seems particularly relevant in this context is “confirmatory bias”. As Rabin and Schrag (1999) discuss, there is substantial psychological evidence that once individuals form a hypothesis, they pay greater attention to subsequent evidence that supports that hypothesis than to evidence that is non-supportive. In the current context, this would suggest that consumers may be relatively unwilling to switch away from a favored brand. This paper highlights a source of bias that is even more basic, but which has significant implications for market organization. Imagine a consumer who initially has greater goodwill towards brand X than the rival brand Y. So, all other factors being equal, she will mostly purchase X and only rarely sample Y. Now, suppose her choice decision is based on beliefs: estimates of the relative quality of the two brands. Then, the low frequency of purchase of Y will not matter in the long run, as eventually she will accumulate a sufficient number of observations to gain a clear picture of the average quality of Y, and if this is higher than that of X, she will switch allegiance. In contrast, suppose a consumer chooses on the basis of familiarity, a stock of goodwill. Then, in the intervening time between purchases of Y, when he is consuming X, that stock of goodwill towards Y diminishes; he forgets about it. The probability of buying Y falls. He may then never accumulate sufficient positive experience to realize that in fact Y is just as good, or even superior. That is, quite subtle differences in mental attitude toward choices not made, products not consumed, can have quite profound effects on long run outcomes. The result is that, when consumers are reinforcement learners, possession of a high initial market share is self-reinforcing.
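The contrast between the two kinds of consumer can be illustrated with a small deterministic sketch. It is not the model analyzed in this paper: the logit choice rule, the update equations, the function names and all parameter values are assumptions chosen purely for illustration, and prices are held equal so that only learning matters. Brand Y is given slightly higher quality than brand X, and each function tracks the expected behavior of a consumer rather than a single random purchase history.

```python
import math


def logit_share(advantage, beta=1.0):
    """Probability of buying X given X's advantage over Y (assumed logit rule)."""
    return 1.0 / (1.0 + math.exp(-beta * advantage))


def reinforcement_share(g_x, g_y, q_x, q_y, decay=0.1, periods=500):
    """Familiarity-based learner (hypothetical specification).

    Goodwill toward each brand decays every period and is topped up only in
    proportion to how often that brand is bought, so a rarely bought brand
    is gradually forgotten.
    """
    for _ in range(periods):
        p_x = logit_share(g_x - g_y)
        g_x = (1.0 - decay) * g_x + p_x * q_x
        g_y = (1.0 - decay) * g_y + (1.0 - p_x) * q_y
    return logit_share(g_x - g_y)


def belief_share(e_x, e_y, q_x, q_y, step=0.1, periods=5000):
    """Belief-based learner (hypothetical specification).

    The quality estimate of a brand moves toward its true quality only when
    the brand is bought, and it does not decay between purchases.
    """
    for _ in range(periods):
        p_x = logit_share(e_x - e_y)
        e_x += p_x * step * (q_x - e_x)
        e_y += (1.0 - p_x) * step * (q_y - e_y)
    return logit_share(e_x - e_y)


# Assumed qualities: Y is slightly better than X.
Q_X, Q_Y = 1.0, 1.1

# Reinforcement learner: the long-run share of X depends on initial goodwill.
print("reinforcement, starts familiar with X:", reinforcement_share(5.0, 1.0, Q_X, Q_Y))
print("reinforcement, starts familiar with Y:", reinforcement_share(1.0, 5.0, Q_X, Q_Y))

# Belief-based learner: the long-run share of X does not depend on initial beliefs.
print("belief-based, starts optimistic about X:", belief_share(2.0, 0.0, Q_X, Q_Y))
print("belief-based, starts optimistic about Y:", belief_share(0.0, 2.0, Q_X, Q_Y))
```

Under these assumed parameters, the reinforcement learner's long-run probability of buying X is close to 1 when the consumer starts out familiar with X and close to 0 when the consumer starts out familiar with Y, even though Y is the better brand. The belief-based learner ends up buying X with probability of about 0.475 from either starting point, which is the share implied by the true quality difference.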
There is an extensive literature in industrial organization concerned with the origins of market dominance. Recent theoretical explanations for sustained dominance include network effects, increasing returns to scale and learning by doing. Here none of these factors are present, but there is still lock-in. This, however, is broadly consistent with the empirical findings of Sutton (1991) on industries in the food sector, where some outcomes seem history dependent in industries without network externalities, but where consumer tastes, loyalty and perceptions of quality are important. In examining dynamic oligopoly, there is the question of which equilibrium concept to use. Open loop equilibrium, as used by Chintagunta and Rao, earlier by Schmalensee (1978) and more recently, for example, by Cellini and Lambertini (1998), has the advantage of analytic simplicity. It is true that many researchers in industrial organization prefer Markov perfect/closed loop equilibrium, despite the fact that, except in simple linear-quadratic models, its complexity precludes analysis except by numerical methods. In contrast, the open loop equilibrium can be analyzed qualitatively, revealing much information such as the number of steady states and their stability. Furthermore, the known disadvantage of open loop equilibria, that they in effect allow commitment to a complete strategy path, is limited here as firms compete on price, not quantity. That is, the fact that the open loop equilibrium in the model analyzed here has asymmetric steady states cannot be attributed to a Stackelberg phenomenon, where one firm obtains dominance simply by committing to a high level of output. Finally, Markov perfect equilibrium is useful for the analysis of how firms respond to stochastic realizations of demand or other variables.
But in the current model, while the evolution of individual consumer behavior is stochastic, there are no aggregate shocks, so it is possible that a deterministic approximation, averaging over a large number of consumers, can capture the essentials of consumer behavior. In turn, open loop equilibrium may be a reasonable approach. This is discussed further in Section 6.
Conclusion
This paper explores the consequences of recent advances in adaptive learning theory for the analysis of consumer behavior. The case of experience goods corresponds to partial information in the learning literature. Two different models of learning are compared in this setting. The first, a model of reinforcement learning, may be biased, with consumers becoming locked into inferior choices. This leads to the possibility of multiple steady states. When there are multiple steady states, the stable ones are those which involve dominance by one firm. Under a model of belief-based learning due to Sarin and Vahid, consumers will learn accurately in the long run, and so there is only one long run equilibrium. However, in the short run a seller has an incentive to charge a price different from the myopic maximum to affect the speed at which consumers learn. Whether these different models can be separated empirically is an interesting question. The availability of consumer scanner data now permits investigation by the examination of individual consumer behavior. In Section 7 of this paper, two simple tests for the identification of different types of learning behavior were suggested. Some of the existing empirical evidence, both from the laboratory and field consumer data, seems to give greater support for the reinforcement learning model that predicts suboptimal behavior even in the long run. The market outcomes under reinforcement learning are probably best interpreted as an important first mover advantage. Familiarity with an existing brand will make the establishment of an alternative difficult, even if it is higher quality, at least under price competition. It is an open question whether there would be a different conclusion if other forms of competition were included. For example, it has long been asserted that certain forms of advertising convey no information, but only serve to aid familiarity.
Thus, the investigation of the effect of advertising when consumers are reinforcement learners seems a natural complement to the current research. The assumptions and methodology employed in this paper are quite different from those of the strategic approach to dynamic pricing. It would be interesting to analyze the robustness of the two types of model. In particular, both the assumption that all consumers can act as though they understand the intuitive criterion and the present alternative, that all consumers are incapable of any strategic inference, seem extreme. Some heterogeneity amongst consumers would seem more reasonable. For example, how would the current results change if a proportion of consumers were sophisticated rather than adaptive? Or, for example, can one successfully signal high quality when such a signal is simply not understood by a proportion of its intended audience? As a final remark, the existing experimental evidence, for example, Cooper et al. (1997), as well as supporting heterogeneity, suggests that adaptive learning does better than equilibrium refinements at explaining actual human behavior.