Case-based reinforcement learning for dynamic inventory control in a multi-agent supply-chain system
|Article code||Year of publication||English article||Persian translation||Word count|
|5387||2009||7-page PDF||Place an order||4420 words|
Publisher : Elsevier - Science Direct
Journal : Expert Systems with Applications, Volume 36, Issue 3, Part 2, April 2009, Pages 6520–6526
Reinforcement learning (RL) has appealed to many researchers in recent years because of its generality. It is an approach to machine intelligence that learns to achieve a given goal through trial-and-error interaction with its environment. This paper proposes a case-based reinforcement learning algorithm (CRL) for dynamic inventory control in a multi-agent supply-chain system. Traditional time-triggered and event-triggered ordering policies remain popular because they are easy to implement. In a dynamic environment, however, their results may become inaccurate, causing excessive inventory (cost) or shortage. Under nonstationary customer demand, the S value of the (T, S) and (Q, S) inventory review methods is learnt with the proposed algorithm so as to satisfy a target service level. A multi-agent simulation of a simplified two-echelon supply chain, in which the proposed algorithm is implemented, is run several times. The results show the effectiveness of CRL in both review methods. We also consider a framework for a general learning method based on the proposed one, which may be helpful in all aspects of supply-chain management (SCM). Hence, it is suggested that well-designed "connections" need to be built between CRL, multi-agent systems (MAS) and SCM.
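The idea of learning the S value of a (T, S) policy against a target service level can be illustrated with a small sketch. This is not the paper's CRL algorithm, whose details are not given here. It is a plain trial-and-error adjustment under assumptions of our own: zero lead time, lost sales, fill rate as the service-level measure, normally distributed demand whose mean switches between a few levels, and a dictionary keyed by demand level standing in for a case base. The names `simulate_fill_rate` and `learn_order_up_to` and all parameters are hypothetical.

```python
import random


def simulate_fill_rate(S, demands, T=1):
    """Fill rate of a periodic-review (T, S) policy.

    Assumptions (not from the paper): zero lead time, lost sales,
    and fill rate (units served / units demanded) as the service level.
    """
    on_hand = S
    served = 0
    demanded = 0
    for t, d in enumerate(demands):
        if t % T == 0:
            on_hand = S  # review epoch: order up to S, delivered at once
        filled = min(on_hand, d)
        on_hand -= filled
        served += filled
        demanded += d
    return served / demanded if demanded else 1.0


def learn_order_up_to(target, demand_levels, episodes=200, periods=50,
                      step=5, seed=0):
    """Trial-and-error adjustment of S, stored per demand level.

    The dict `case_base` is a stand-in for a case base: each demand level
    (a hypothetical case key) remembers the last S tried for it.
    """
    rng = random.Random(seed)
    case_base = {}
    for _ in range(episodes):
        # Nonstationary demand: the mean level changes between episodes.
        level = rng.choice(demand_levels)
        # Retrieve the stored S for this case, or start from the mean demand.
        S = case_base.get(level, level)
        demands = [max(0, round(rng.gauss(level, 0.2 * level)))
                   for _ in range(periods)]
        rate = simulate_fill_rate(S, demands)
        if rate < target:
            S += step   # service level missed: hold more stock
        elif S > step:
            S -= step   # target met: try trimming excess inventory
        case_base[level] = S  # retain the adjusted case
    return case_base


if __name__ == "__main__":
    print(learn_order_up_to(target=0.95, demand_levels=[50, 200]))
```

Because S is lowered whenever the target is met, the stored value oscillates around the smallest S that reaches the target for each demand level; a practical version would need a stopping rule or a shrinking step.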
Supply-chain management (SCM) has been providing competitive advantages for enterprises in the market. Within it, inventory control plays an important role and has attracted attention from many researchers in recent years. Several known inventory control policies have been studied and improved in various respects, such as reduced cost and greater flexibility. Chen, Li, Marc Kilgour, and Hipel (2006) introduce a case-based multi-criteria ABC analysis that improves on ABC analysis by accounting for additional criteria, such as lead time and criticality of SKUs. This procedure provides more flexibility to account for more factors in classifying SKUs. Lee and Wu (2006) propose a replenishment method based on statistical process control (SPC), in which inventory rules and demand rules are developed to determine the order replenishment quantity and so address the order batching problem. This control system performs well at reducing backorders and the bullwhip effect. Yazgı Tütüncü, Aköz, Apaydın, and Petrovic (2007) present new models for continuous review inventory control in the presence of uncertainty. The optimal order quantity and the optimal reorder point are found so as to minimize the fuzzy cost. On the other hand, different inventory management systems can be designed for a specific industry or environment. Aronis, Magou, Dekker, and Tagaras (2004) apply a Bayesian approach to forecasting the demand for spare parts of electronic equipment, providing a more accurate determination of the stock level needed to satisfy a negotiated customer service level. Ashayeri, Heuts, Lansdaal, and Strijbosch (2006) develop cyclic production-inventory optimization models for the process manufacturing industry. ElHafsi (2007) shows that the optimal inventory allocation policy in an assemble-to-order system is a multi-level state-dependent rationing policy.
Díez, Erik Ydstie, Fjeld, and Lie (2008) design model-based controllers based on discretized population balance (PB) models for particulate processes, which are encountered in almost every branch of the process industries. Kopach, Balcıoğlu, and Carter (2008) revisit a queuing model and determine an optimal inventory control policy for the blood industry using level crossing techniques. Meanwhile, identifying the factors that affect inventory management performance, such as cost and demand, also assists in designing the controllers. Andersson and Marklund (2000) introduce a modified cost structure at the warehouse, so that the multi-level inventory control problem can be decomposed into single-level problems. By applying a simple coordination procedure to them, a near-optimal solution is obtained. Zhang (2007) studies an inventory control problem under temporal demand heteroscedasticity, which is found to have a significant influence on a firm's inventory costs. Chiang (2007) uses dynamic programming to determine the optimal control policy for a standing order system. Yazgı Tütüncü et al. (2007) use fuzzy set concepts to treat imprecision in the costs and probability theory to treat uncertainty in customer demand. Additionally, Maity and Maiti (2007) devise optimal production and advertising policies for an inventory control system, considering inflation and discounting in a fuzzy environment. In the recent studies mentioned above, mathematical or analytical models are preferred, such as the Bayesian approach (Aronis et al., 2004), the Utility Function Method (Maity & Maiti, 2007), fuzzy set concepts (Yazgı Tütüncü et al., 2007) and Autoregressive Integrated Moving Average and Generalized Autoregressive Conditional Heteroscedasticity models (Zhang, 2007). This kind of method provides strict deduction, which usually involves complicated notation and equations that hold only under stated assumptions.
However, on the one hand, the problem may be time-varying in a dynamic environment, especially in an evolving system such as a supply chain, where a solution that works at one time may not be suitable at another. On the other hand, those models are too difficult for managers to implement in real enterprises because of the complicated calculations involved. What is required is a learning ability that continuously enriches one's experience in order to make reasonable decisions. Reinforcement learning (RL) is an approach to machine intelligence that combines the fields of dynamic programming and supervised learning to yield powerful machine-learning systems (Harmon & Harmon, 1996). Chi, Ersoy, Moskowitz, and Ward (2007) demonstrate and validate the applicability and desirability of using machine learning techniques to model, understand, and optimize complex supply chains. To make the best use of learning methods, intelligent entities are needed as carriers. Multi-agent systems (MAS) seem to be a good choice, since agents are characterized by intelligence, autonomy, interactivity and reactivity. Liang and Huang (2006) develop a multi-agent system to simulate a supply chain, in which agents are coordinated to control inventory and minimize the total cost of the supply chain. Govindu and Chinnam (2007) propose a generic process-centered methodological framework for the analysis and design of multi-agent supply chain systems. Therefore, this paper proposes a reinforcement learning algorithm combined with case-based reasoning (CRL) in a multi-agent supply-chain system. Similar research is carried out by Kwon, Kim, Jun, and Lee (2007), who suggest a case-based myopic reinforcement learning algorithm for satisfying a target service level using a vendor managed inventory model. In this paper, we try to provide a simpler learning method with similar or better performance, which could be used more widely and implemented more easily by managers.
Furthermore, we strongly recommend building "connections" between CRL, MAS and SCM, and therefore also suggest a generic reinforcement learning method. The remainder of this paper is organized as follows. Section 2 explains the multi-agent supply-chain model, including the inventory control problem. Section 3 presents the CRL algorithm in more detail. Section 4 explains the simulation environment used to measure the performance of CRL and presents the results. Section 5 considers a generic RL method based on the proposed one. Finally, Section 6 provides the conclusion and directions for future research.
Conclusion
In this paper, the problem of dynamic inventory control for satisfying a target service level in a supply chain with nonstationary customer demand is studied. Case-based reinforcement learning is applied and shown experimentally to be effective. Furthermore, a general CRL is considered so that CRL can be applied more widely in the supply chain. Hence, the thinking behind this paper is to link CRL to supply-chain management, for which a multi-agent system (MAS) is necessary. Future research may proceed in two directions. One is to extend CRL to a multi-stage multi-agent supply chain, so that the bullwhip effect can be observed and reduced. The other is to apply CRL to other supply-chain issues, such as trading competition.