Modeling of a route planning system based on Q value-based dynamic programming with multi-agent reinforcement learning algorithms
|Article code||Publication year||English article||Persian translation||Word count|
|26130||2014||15-page PDF||available on request||12,200 words|
Publisher : Elsevier - Science Direct
Journal : Engineering Applications of Artificial Intelligence, Volume 29, March 2014, Pages 163–177
In this paper, a new model for a route planning system based on multi-agent reinforcement learning (MARL) algorithms is proposed. Q-value based dynamic programming (QVDP) combined with a Boltzmann distribution was used to solve vehicle delay problems by studying the weights of various components in road network environments, such as weather, traffic, road safety, and fuel capacity, in order to create a priority route plan for vehicles. An important part of the study was the use of a multi-agent system (MAS) with learning abilities in order to make decisions about routing vehicles between Malaysia's cities. The evaluation was done using a number of case studies that focused on road networks in Malaysia. The results of these experiments indicated that the travel durations predicted by existing approaches were between 0.00% and 12.33% off from the actual travel times obtained by the proposed method. The results illustrate that the proposed approach is a unique contribution to the field of computational intelligence in route planning systems. Graphical abstract: comparison of the proposed and the existing method results based on Q value-based dynamic programming with multi-agent reinforcement learning algorithms.
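The Boltzmann-distribution action selection over Q-values mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name and the temperature parameter are assumptions:

```python
import math
import random

def boltzmann_select(q_values, temperature=1.0):
    """Pick an action index with probability proportional to exp(Q / T).

    A high temperature makes the choice nearly uniform (exploration);
    a low temperature concentrates probability on the best action.
    """
    # Subtract the max Q-value before exponentiating for numerical stability.
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Sample an action index according to the Boltzmann probabilities.
    r = random.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r <= cum:
            return i
    return len(probs) - 1
```

With a very low temperature this behaves like greedy selection over the Q-values, while higher temperatures spread probability across the alternative routes.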
Route planning systems (RPS) are one of several types of traffic information systems that offer routes to solve traffic problems. An RPS provides optimum route solutions and traffic information (Ji et al., 2012) prior to a trip in order to help drivers arrive at their destination as quickly as possible. Route planning problems can be solved by determining the shortest paths using a model of the transportation network (Kosicek et al., 2012 and Geisberger, 2011). Drivers choosing among the available route solutions need quick updates when road network conditions change. Unfortunately, the task of efficiently routing vehicles in route planning (Suzuki et al., 1995, Pellazar, 1998 and Stephan and Yunhui, 2008) has not been emphasized enough in recent studies. Multi-agent systems (MAS) are groups of autonomous, interacting entities sharing a common environment, which they perceive with sensors and upon which they act with actuators (Shoham and Leyton-Brown, 2008 and Vlassis, 2007). MASs are applied in a variety of areas, including robotics, distributed control, resource management, collaborative decision support systems (DSS), and data mining (Bakker et al., 2005 and Riedmiller et al., 2000). They can be used as a natural way of operating on a system, or they may provide an alternative perspective on centralized systems. For instance, in robotic teams, controlling authority is naturally distributed between the robots. Reinforcement learning (RL) (Tesauro et al., 2006, Lucian et al., 2010 and Bakker and Kester, 2006) provides a framework for learning how to plan, how to map situations (Jie and Meng-yin, 2003) to actions, and how to maximize a numerical reward signal. In RL, the learner is not told which actions to take, as is common in most forms of machine learning. Instead, the learner must discover, through trial and error, which actions yield the most rewards.
In the most interesting and challenging cases, actions affect not only the immediate reward but also the next situation and, through it, all subsequent rewards. Trial-and-error search and delayed reward are two important distinguishing features of RL, which is defined not by characterizing learning methods, but by characterizing a learning problem: any method that is suitable for solving that problem is considered an RL method. An agent must be able to sense the state of the environment and take actions that affect it, and it must have goals related to the state of the environment. In other words, an RL agent learns by interacting with its dynamic environment. The agent perceives the state of the environment and takes actions that cause the environment to transition into a new state. A scalar reward signal evaluates the quality of each transition, and the agent must maximize the cumulative reward over the course of the interaction. RL feedback reveals whether an activity was beneficial and whether it meets the objectives of the learning system by maximizing the expected reward over a period of time (Shoham and Leyton-Brown, 2008 and Busoniu et al., 2005). The reward (RL feedback) is less informative than feedback in supervised learning, where the agent is given the correct actions to perform (Busoniu et al., 2010). Unfortunately, information regarding correct actions is not always available. RL feedback is, however, more informative than unsupervised learning feedback, where no explicit comments are made regarding performance. Well-understood, provably convergent algorithms are available for solving single-agent RL tasks. MARL faces challenges different from those of single-agent RL, such as convergence, high dimensionality, and multiple equilibria.
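The perceive-act-reward interaction loop described above is the basis of tabular Q-learning, the standard value-based single-agent RL method. A minimal sketch of one value update follows; the learning rate `alpha` and discount factor `gamma` are illustrative defaults, not the paper's settings:

```python
def q_learning_update(Q, state, action, reward, next_state,
                      alpha=0.1, gamma=0.9):
    """One tabular Q-learning step.

    Moves Q(s, a) toward the TD target r + gamma * max_a' Q(s', a'),
    so delayed rewards propagate back through earlier state-action pairs.
    Q is a dict of dicts: Q[state][action] -> value.
    """
    # Value of the best action available in the next state (0 if none known).
    best_next = max(Q[next_state].values()) if Q[next_state] else 0.0
    td_target = reward + gamma * best_next
    # Nudge the current estimate toward the target by the learning rate.
    Q[state][action] += alpha * (td_target - Q[state][action])
    return Q[state][action]
```

Repeated over many state transitions, this update lets an agent maximize cumulative reward without ever being told the correct action, exactly the trial-and-error property the paragraph describes.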
In route planning processes (Schultes, 2008 and Wedde et al., 2007), route planning should take traveler responses into account and even use these responses as its guiding strategy, recognizing that a traveler's route choice behavior will be affected by the guidance control decision-making of route planning. On the other hand, the results of travelers' route choices will determine network conditions and feed back into the guidance control decision-making of route planning (Dong et al., 2007 and Yu et al., 2006). Reduced vehicle delays can be achieved by examining the several conditions that affect the transportation network and studying the weights of transport environmental conditions (Tu, 2008 and Zegeye, 2010). These conditions include: weather, traffic information, road safety (Camiel, 2011), accidents, seasonal and cyclical effects (such as time-of-day, day-of-the-week, and month), cultural factors, population characteristics, traffic management (Tampere et al., 2008, Isabel et al., 2009, Almejalli et al., 2009, Balaji et al., 2007 and Chang-Qing and Zhao-Sheng, 2007), and traffic mix. These variables can be used to provide a priority trip plan to drivers. Increasingly, agent-based route planning (Gehrke and Wojtusiak, 2008) and transportation system (Chowdhury and Sadek, 2003) applications are being developed. The goal of this study is to enable multiple agents to learn suitable behaviors in a dynamic environment using RL, creating cooperative (Lauer and Riedmiller, 2000 and Wilson et al., 2010) behaviors between agents that have no prior knowledge. The physical accessibility of traffic networks allows selective negotiation and communication between agents and simplifies the transmission of information. MAS is well suited to solving problems in complex systems because it can quickly generate routes to meet the real-time requirements of those systems (Shi et al., 2008).
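The idea of weighting environmental conditions into a single route cost can be sketched as follows. The condition names, score scale, and weight values here are purely hypothetical illustrations, not taken from the paper:

```python
def edge_cost(base_time, conditions, weights):
    """Weighted travel cost for one road segment.

    base_time:  nominal travel time of the segment.
    conditions: condition scores in [0, 1], higher meaning worse
                (e.g. {"weather": 0.5, "traffic": 1.0}).
    weights:    relative importance of each condition.

    Each weighted condition score adds a proportional penalty
    on top of the nominal travel time.
    """
    penalty = sum(weights[k] * conditions.get(k, 0.0) for k in weights)
    return base_time * (1.0 + penalty)
```

A route planner could then rank alternative routes by summing `edge_cost` over each route's segments, giving the kind of priority trip plan the paragraph describes.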
Our survey indicated that MAS methods and techniques have been applied in RPSs (Flinsenberg, 2004 and Delling, 2009) for dynamic routing, modeling, simulation, congestion control and management, traffic control, decision support, communication, and collaboration. The main challenge faced by an RPS is directing vehicles to their destinations in a dynamic, real-time road traffic network (RTN) while reducing travel time and enabling efficient use of the available RTN capacity (Khanjary and Hashemi, 2012). To avoid congestion and to facilitate travel, drivers need traffic direction services. These services lead to more efficient traffic flows on all transport networks, resulting in reduced pollution and lower fuel consumption. Generally, these problems are solved using a fast path planning method, which appears to be an effective approach to improving RPS. For example, RPS is used in the following areas: traffic control, traffic engineering, air traffic control (Volf et al., 2011), trip planning, RTN, traffic congestion, traffic light control, traffic simulation, traffic management, urban traffic, traffic information (Ji et al., 2012 and Khanjary and Hashemi, 2012) and traffic coordination (Arel et al., 2010). The main thrust of using RPS in these situations is to compare shortest path algorithms with Dijkstra's algorithm. Currently, there are various successful algorithms for shortest path problems that can be used to find the optimal and shortest path in an RPS. The research will contribute to:
• Modeling a new route planning system based on QVDP with the MARL algorithm.
• Using the ability of learning models to propose several route alternatives.
• Predicting the road and environmental conditions for vehicle trip planning.
• Providing a high quality travel planning service to vehicle drivers.
This paper consists of seven sections. Section 2 describes the meaning of RPS based on MARL and related works. This is followed by a definition of the RPS based on the MARL problem (Section 3). Section 4 presents the MARL proposed for RPS. Section 5 discusses the experimental method used in this study. Section 6 presents the results, comparisons, and evaluations. Finally, the paper is concluded in the last section.
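Dijkstra's algorithm, named in the introduction as the baseline for shortest-path comparison, can be sketched as a minimal implementation over a weighted road network. The adjacency-list graph representation is an assumption for illustration:

```python
import heapq

def dijkstra(graph, source):
    """Shortest travel times from source over a weighted road network.

    graph: {node: [(neighbor, edge_weight), ...]} with non-negative weights.
    Returns {node: shortest distance from source} for reachable nodes.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]  # priority queue of (distance, node)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale entry superseded by a shorter path
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

Dynamic route planners like the one proposed here differ from this static baseline mainly in that the edge weights change over time with traffic and environmental conditions.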
Conclusion (English)
In this study, the circumstances for applying agent-oriented techniques focusing on the use of RL methods for the vehicle routing problem were presented for traffic networks. For this purpose, we presented a conceptual framework for route planning systems that route vehicles based on MARL. This framework identified the various components of the problem by calculating traffic routes using a number of agents in a static network situation, and extended all of this to a real dynamic network in Malaysia. The important achievement of the study was resolving the RPS problems using simulation methods and MASs with learning abilities, in order to make decisions about routing vehicles between Malaysia's cities. This study presented a new paradigm that included a new RPSA and Q-values based on MARL for finding the optimal path in order to reduce traffic congestion and guide vehicles to their destinations in the RTN. It also introduced a conceptual model of RPS using MARL in the RTN, and showed that agent learning technology can optimize RPS for the RTN by reviewing agent applications for RTN optimization. Illustrating how MARL can optimize performance, and demonstrating that MARL is a coupled network of software learning agents that interact to solve RTN problems beyond the knowledge of each individual problem-solving component, were two further achievements of this study. This research has also demonstrated that agent technology is suitable for solving communication concerns in a distributed RTN environment. The novelty of this study is the use of MARL for RPS, which can be employed by the RTN in Malaysia to offer access to RTN data resources. MARL attempted to solve RTN problems through collaboration between agents, yielding answers to complex RTN problems. In this study, each agent performed a special function of the RTN and shared its knowledge with other agents. Given the above results, our contributions are as follows:
1. The research modeled a new route planning system based on QVDP with the MARL algorithm in order to reduce vehicle trip times and costs by giving a priority trip plan to vehicles.
2. The research uses the ability of learning models to propose several route alternatives to reduce time and minimize travel costs during driving.
3. The paper is important for vehicle trip planning by deploying the MAS to predict road and environmental conditions.
4. The study provides a high quality travel planning service to vehicle drivers.
5. The paper reported results of sufficient size and dimension covering three important issues (RPS, MARL, and RTN) in computer science.