Title

Agent learning in supplier selection models

Article code	Publication year	Length of English article
19115	2005	22 pages (PDF)

Source

Publisher : Elsevier - Science Direct

Journal : Decision Support Systems, Volume 39, Issue 2, April 2005, Pages 219–240

Keywords

Supplier management, Reinforcement learning, Agent learning, Moral hazard, Incomplete contracts

Abstract

We use agent-based modeling to study the performance of a supplier selection model, originally proposed by Croson and Jacobides [Small Numbers Outsourcing: Efficient Procurement Mechanisms in a Repeated Agency Model, Working Paper #99-05-04, Department of Operations and Information Management, The Wharton School of the University of Pennsylvania (1999)], which displays a complicated reward and punishment profile under incomplete information. We document the dynamics and convergence to equilibrium of the interactions of a single buyer with a heterogeneous group of sellers. These interactions result both in the separation of sellers capable of producing high-quality goods from those incapable of doing so, and in continuing incentives for high-quality-capable sellers to produce at the maximum quality possible. We model two methods of determining exploration reference points: an “auction-style” model focusing on probability of success and a “newsvendor-style” model focusing on profitability. Our simulation shows that (1) the tournament structure suffices to reach convergence at high-quality levels whenever the number of suppliers exceeds three, (2) punishment length and number of suppliers are substitutes, and (3) shorter punishments improve the speed of convergence. Moreover, we show that it is strictly better for the buyer to transact with relatively few suppliers. This conclusion is generated endogenously inside the model as a tradeoff between exploration and exploitation, rather than through assumptions that explicitly penalize supplier proliferation.

Introduction

The twin challenges of how to choose capable suppliers and of how to motivate capable suppliers once they have been chosen are central problems in the management of the modern firm. In this paper, we study a game-theoretic supplier selection model originally proposed by Croson and Jacobides [12] where neither the suppliers nor the buyer possesses full information. We use agents to model suppliers who learn to produce at their optimal quality levels through a prespecified system of rewards and punishments administered by the buyer. We also demonstrate how the process of repeated optimality-seeking updates to the actions of the heterogeneous suppliers results in the sellers capable of producing high-quality goods distinguishing themselves from those sellers incapable of doing so (addressing adverse selection), while producing at the highest possible level of quality (addressing moral hazard). Furthermore, using multiple related simulation environments, we can address the question of “how many suppliers is it optimal to employ?” Consistent with both previous theoretical literature [6], [7], [10], [28] and evidence from the Japanese automotive market [1], [2], [3], we show that it is optimal for the buyer to transact with relatively few suppliers, a conclusion generated endogenously inside the model rather than forced through assumptions that penalize supplier proliferation per se. We also evaluate the results from the perspective of benefit to society and the costs incurred by the sellers.

In Section 2, we discuss the intersection of the IS and supplier-management literatures. In Section 3, we present our motivation for applying an agent-based approach to a solved game-theoretic problem. Section 4 provides an overview of the specific outsourcing model on which we focus. In Section 5, we describe the precise agent-based technique we employ. Our experiments and results are summarized in Section 6. Section 7 concludes and provides suggestions for future work.

Conclusion

In this paper, we use reinforcement learning to study the SNO (small-numbers outsourcing) model proposed by Croson and Jacobides [12] for supplier selection. We study the dynamics of high-quality and low-quality seller interactions and test whether sellers capable of producing high-quality goods can be distinguished from sellers capable of producing only low-quality goods and be forced to produce at high-quality levels. Our results show that capable sellers, when there are three or more of them, do distinguish themselves from incapable sellers and produce at high levels of quality, with convergence being faster for smaller numbers of sellers (and weaker punishments) in our example. Buyer surplus increased with the number of sellers up to a point (four to nine capable sellers for our sample parameters); it reached a global optimum at eight suppliers with three-period punishments. These results corroborate previous theoretical and empirical literature on outsourcing and are consistent with casually observed evidence in the market (Ref. [7] documents that few buyers use more than four to five suppliers for the same item). We thus showed through simulation that it is optimal for the buyer to outsource to only a relatively small number of suppliers, as any more or fewer would result in a decrease in the buyer's total surplus. From a social perspective, it is similarly optimal for the buyer to outsource to only a few suppliers, as the deadweight loss from wasteful exploration increases exponentially with an increase in the number of high-quality suppliers.

In future work, it would be interesting to explicitly incorporate the switching costs incurred by the buyer when she changes suppliers. Japanese automakers do not experience very high switching costs in practice, since the supplier base per se does not change: only the grade (or tier) to which the supplier belongs and the amount of margin left to the supplier change, not the supplier's identity. US automakers (particularly in the 1980s), by contrast, experienced high and frequent switching costs as they rotated suppliers to get lower prices [16]. We expect that an SNO model that incorporates a cost of bringing suppliers back from punishment periods would both deter the buyer from punishing for small transgressions (to avoid this re-entry cost) and make longer punishments credible when punishment occurred (to economize on the frequency with which this re-entry cost would be incurred for a given supplier).

In this model, we have allowed only relative quality evaluation (i.e., Supplier 1's quality is higher than Supplier 2's) but supposed that this relative ranking was 100% accurate. We would like to extend this work to evaluate both imperfect relative quality inspection (i.e., Supplier 1's quality is probably higher than Supplier 2's) and imperfect absolute quality inspection (i.e., Supplier 1's quality is probably high enough to meet standards, but Supplier 2's probably isn't). A closed-form theoretical model of such imperfect monitoring quickly becomes analytically intractable. Refs. [11] and [15] show conditions under which frequent or high-density sampling may reduce the agents' incentives to produce at high quality; consequently, the optimal punishment strategy for the buyer may be quite complex under imperfect observation. Similarly, as real-world buyer–supplier relationships always include some uncertainty, creating an agent learning model to optimize the reward–punishment structure under imperfect information could contribute new insights into the benefits and costs of using small-numbers outsourcing.

Finally, in uncertain environments, it is not very practical for an individual (or a player) to expect good results from repeated static optimization, since the environment may evolve in a direction that depends on the sequence of actions the individual takes. The environment then behaves in a non-stationary fashion, and the individual has to perform dynamic optimization without an underlying model of the effects of his actions, a rather difficult task. Such an individual could still probe the environment (and its responses to his actions) by exploring different actions and trying to learn the best (near-optimal) combinations of actions, balancing exploration and exploitation. One promising approach in these situations is to use machine learning techniques and perform agent-based simulations of such environments to attempt to distill simple decision heuristics which are robust to environmental change.
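
The exploration and exploitation tradeoff described above can be illustrated with a small, generic sketch. This is not the learning procedure used in the paper, whose details are not given in this excerpt. It is an epsilon-greedy learner with a constant step size, a standard reinforcement learning technique for tracking non-stationary rewards. The function names (`run_epsilon_greedy`, `drifting_reward`), the three-action environment, and all parameter values are assumptions chosen only for illustration.

```python
import random


def run_epsilon_greedy(reward_fn, n_actions, n_steps,
                       epsilon=0.1, step_size=0.1, seed=0):
    """Epsilon-greedy learner with a constant step size.

    A constant step size weights recent rewards more heavily, so the
    estimates can follow an environment whose payoffs change over time.
    """
    rng = random.Random(seed)
    estimates = [0.0] * n_actions
    for t in range(n_steps):
        if rng.random() < epsilon:
            # Explore: try a uniformly random action.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action with the highest current estimate.
            action = max(range(n_actions), key=lambda a: estimates[a])
        reward = reward_fn(action, t, rng)
        # Move the estimate a fixed fraction toward the observed reward.
        estimates[action] += step_size * (reward - estimates[action])
    return estimates


def drifting_reward(action, t, rng):
    """Hypothetical non-stationary environment (an assumption, not from the paper).

    Action 0 pays best for the first 1000 steps, then action 2 does.
    """
    best = 0 if t < 1000 else 2
    mean = 1.0 if action == best else 0.0
    return mean + rng.gauss(0.0, 0.1)


estimates = run_epsilon_greedy(drifting_reward, n_actions=3, n_steps=2000)
best_action = max(range(3), key=lambda a: estimates[a])
print("estimated values:", [round(v, 2) for v in estimates])
print("preferred action after the shift:", best_action)
```

In this toy setting, a learner that only exploited its early estimates would keep choosing the action that was best before the shift. The small exploration rate is what lets it discover the new best action, at the price of some wasted trials, which is the same tension the conclusion describes.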