Adaptive traffic signal control using approximate dynamic programming
Publisher: Elsevier - Science Direct
Journal: Transportation Research Part C: Emerging Technologies, Volume 17, Issue 5, October 2009, Pages 456–474
This paper presents a study of an adaptive traffic signal controller for real-time operation. The controller pursues three operational objectives: dynamic allocation of green time, automatic adjustment of control parameters, and fast revision of signal plans. The control algorithm is built on approximate dynamic programming (ADP). This approach substantially reduces the computational burden by using an approximation to the value function of dynamic programming, and reinforcement learning to update that approximation. We investigate temporal-difference learning and perturbation learning as specific learning techniques for the ADP approach. In computer simulation, the ADP controllers achieve substantial reductions in vehicle delay compared with optimised fixed-time plans. Our results show that substantial benefits can be gained by increasing the frequency at which signal plans are revised, which the ADP approach accommodates conveniently.
Operating traffic signals in urban areas requires proper timings so that varying demands can be managed effectively. Conventional algorithms, which are optimised off-line, usually generate a library of signal timing plans, each with fixed stage durations and sequence. Plans are retrieved from the library for implementation according to the time of day and the day of the week. Such plans require manual maintenance and updating; otherwise performance declines at a rate of about 3% per year (Bell and Bretherton, 1986). Most operating signal systems today are traffic responsive (or vehicle actuated): the allocation of green time is adjusted according to real-time traffic information, which is usually detected by inductive loops. While bringing substantial benefits in reducing vehicle delays and stops, responsive systems are usually constrained by preset control parameters such as cycle length and stage sequence. Ideally, for optimisation over time, the controller should operate signals free of cycle and stage constraints. This requires a dynamic controller that acts according to detected traffic and updates its control parameters online without human intervention, and is thus adaptive.

Dynamic programming (DP), developed by Bellman (1957), is so far the only exact method for optimisation over time. It decomposes a control problem into a series of sub-problems, which we denote as steps; each step corresponds to a discrete segment of time in the real-time control problem. Associated with each step is a set of state variables that describe the controller and the traffic environment at that time. DP recursively evaluates Bellman’s equation backwards, step by step, to find the optimal action that transfers the system from the current state to a new state. In doing so, DP generates backwards in time a sequence of optimal actions that guarantees global optimality.
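The backward DP recursion described above can be sketched for a toy two-approach intersection. All of the numbers here (horizon, arrival pattern, discharge rate, queue cap) are illustrative assumptions, not the paper's model; the point is only the backward evaluation of Bellman's equation over discrete steps.

```python
# Minimal sketch of backward dynamic programming for signal control.
# State: queue lengths on two approaches; action: which approach gets green.
# Horizon, arrivals and discharge rate are assumed toy values.

HORIZON = 4                      # number of discrete time steps
MAX_Q = 5                        # queues are capped for this toy example
ARRIVALS = [(1, 1)] * HORIZON    # assumed per-step arrivals on approaches A, B
DISCHARGE = 2                    # vehicles served per step by the green approach

def step(state, action, arrivals):
    """Advance queues one step: the green approach discharges, both accumulate."""
    qa, qb = state
    aa, ab = arrivals
    if action == 0:
        qa = max(qa - DISCHARGE, 0)
    else:
        qb = max(qb - DISCHARGE, 0)
    return (min(qa + aa, MAX_Q), min(qb + ab, MAX_Q))

def backward_dp():
    """Compute V[t][s] = minimal total delay from step t onwards (Bellman)."""
    states = [(a, b) for a in range(MAX_Q + 1) for b in range(MAX_Q + 1)]
    V = {HORIZON: {s: 0.0 for s in states}}
    policy = {}
    for t in reversed(range(HORIZON)):
        V[t], policy[t] = {}, {}
        for s in states:
            # Stage cost: every queued vehicle waits one step.
            best = min((sum(s) + V[t + 1][step(s, a, ARRIVALS[t])], a)
                       for a in (0, 1))
            V[t][s], policy[t][s] = best
    return V, policy

V, policy = backward_dp()
```

With queued vehicles only on approach A, the recovered policy gives A the green, as one would expect.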
DP solutions to traffic signal control are studied in Robertson and Bretherton (1974) and Gartner (1983). The results show that DP can reduce vehicle delays by about 56% relative to the best fixed-time plans. Nevertheless, DP’s applicability to real-time traffic signal control is limited. The computational demand of the recursive calculation of Bellman’s equation grows exponentially with the size of the state space, the information space and the action space, a scenario often described as the ‘Three Curses of Dimensionality’ (Powell, 2007). Furthermore, DP requires complete information on the time period over which the controller seeks optimisation. In real-time operation, however, traffic detectors may supply only 5–10 s of data on future arriving vehicles.

To overcome the difficulties in applying DP while preserving the fundamental features of dynamic control, a favourable option is approximation. An approximation to DP usually aims to reduce the state space by replacing a look-up table of state values with aggregations or a continuous approximation function. Such an approach is commonly denoted approximate dynamic programming (ADP). In this paper, we limit the study to continuous approximation functions only. Since we may not know the appropriate values of the functional parameters a priori, it is preferable that the controller acquire adaptive features that update the parameters according to both changes in the prevailing traffic and observation of the controller’s interaction with the traffic environment. An approach that applies the fundamentals of dynamic programming to learn from interactions with the environment is reinforcement learning. This approach uses the dynamic programming formula to map the system state to an action. The action changes the environment, and this change is communicated back to the controller through a scalar reinforcement signal.
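The central ADP idea above, replacing the DP look-up table with a continuous approximation, can be sketched with a linear value function. The feature choice and the toy transition below are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch of ADP with a linear value approximation V_hat(s) = theta . phi(s)
# in place of the DP look-up table. Features and dynamics are assumed.
import numpy as np

def features(state):
    """Illustrative features: a bias term plus the queue on each approach."""
    return np.concatenate(([1.0], np.asarray(state, dtype=float)))

def approx_value(theta, state):
    """Continuous approximation replacing the DP value table."""
    return float(theta @ features(state))

def greedy_action(theta, state, transition, actions):
    """One-step lookahead: minimise stage cost plus approximated future value."""
    return min(actions,
               key=lambda a: sum(state) + approx_value(theta, transition(state, a)))

# Toy two-approach transition: the approach given green discharges 2 vehicles.
def transition(state, action):
    qa, qb = state
    if action == 0:
        qa = max(qa - 2, 0)
    else:
        qb = max(qb - 2, 0)
    return (qa, qb)

# With positive weights on the queues, the controller serves the longer queue.
theta = np.array([0.0, 1.0, 1.0])
action = greedy_action(theta, (4, 1), transition, (0, 1))
```

Unlike the DP table, which stores one value per state, `theta` here has only three parameters regardless of the size of the state space, which is what makes the approach computationally tractable online.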
The functional parameters are updated by specific learning techniques upon receipt of the reinforcement signal. In this study, we investigate two such techniques: temporal-difference learning and perturbation learning. We show that an adaptive traffic signal controller using ADP and reinforcement learning can reduce vehicle delays substantially relative to the best fixed-time control, while remaining computationally efficient. The numerical experiments presented here are limited to an isolated intersection with multiple signal stages, but the implications extend to distributed control in traffic networks.

This paper is organised as follows. In Section 2, we review existing traffic signal control systems, from which we identify the scope for development and set objectives for the ADP controller. In Section 3, we introduce the fundamentals of ADP, reinforcement learning and the specific learning techniques, based on which the control algorithms for traffic signals are formulated. Section 4 contains the numerical experiments and results. Section 5 contains the conclusions of this study and the scope for future research.
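The temporal-difference update of the functional parameters can be sketched as follows for a linear approximation. The step size and discount factor are assumed values for illustration, not those tuned in the paper.

```python
# Hedged sketch of a TD(0) parameter update for V_hat(s) = theta . phi(s).
# Step size alpha and discount gamma are assumed illustrative values.
import numpy as np

def td_update(theta, phi_s, phi_next, cost, alpha=0.05, gamma=0.9):
    """One TD(0) step: move theta along the TD error times the features."""
    td_error = cost + gamma * float(theta @ phi_next) - float(theta @ phi_s)
    return theta + alpha * td_error * phi_s

# Toy check: a single recurring state with per-step cost 1 has true
# discounted value 1 / (1 - gamma) = 10, and TD should approach it.
phi = np.array([1.0])
theta = np.array([0.0])
for _ in range(2000):
    theta = td_update(theta, phi, phi, cost=1.0)
```

The update needs only the observed one-step cost and the current approximation, which is what allows the controller to learn online without a model of future traffic beyond the short detector horizon.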
Conclusion
This study investigates the application of approximate dynamic programming (ADP) to traffic signal control, aiming to develop a self-sufficient adaptive controller for online operation. Through the review of existing traffic signal control systems, we identified the objectives of this study as providing dynamic control for real-time operation, adapting to changing traffic by using online learning techniques, and revising signal timing plans frequently. The numerical experiments show that the ADP controllers meet all of these objectives. The key feature of the ADP approach is to replace the true value function of dynamic programming (DP) with a linear approximate function. The initial approximation is updated progressively by specific learning techniques that adjust the function parameters in real time. The numerical examples show that, operating at a resolution of 5 s per time increment, the ADP controller reduces delay from 13.95 vehicle-seconds per second (v.s/s), the result of the TRANSYT plans, to 8.64 v.s/s. At a resolution of 0.5 s per time increment, the ADP controller reduces vehicle delay to 4.62 v.s/s. To provide an absolute lower bound for comparison, we report results from DP obtained after a costly computation process; the average is 4.27 v.s/s under the same traffic conditions as applied to the ADP controllers. The result from the ADP controllers at the finer resolution is thus only about 0.4 v.s/s above that of DP on average, whereas the time the ADP controller takes to complete an hour’s simulation is only about 0.08% of the time the DP approach takes to complete one tenth of an hour’s simulation. The ADP controllers use only 10 s of information on future arriving traffic when evaluating candidate decisions. These results suggest that the ADP controller can achieve a large proportion of the benefits of DP while being adaptive to changing traffic and computationally efficient.
The ADP approach is therefore a practical candidate for real-time signal control at isolated intersections. Two learning techniques, temporal-difference (TD) reinforcement learning and perturbation learning, are investigated in this study. The TD method constantly tracks the difference between the current estimate and the actual observation of state values, and propagates this difference back to the functional parameters so as to update the approximation. Perturbation learning directly estimates the gradients of the approximate function by applying a perturbing signal to the system state. Despite the difference in methods, the learning effects are broadly similar, and no statistically significant difference was found in the numerical experiments. With either learning technique, the ADP controller performs best when future delay is discounted at about 12% per time increment at the finer resolution, suggesting a limited influence of the approximation function in evaluating control decisions. Taken together with the equivalence of the two learning techniques, this suggests that a simple linear approximation is sufficient for the ADP controller to operate in real time; exploring more complex approximations may not prove cost effective.

The general formulations of the ADP controller presented in this study can be readily extended to more complicated intersections without substantial difficulty. However, the norm of contemporary traffic signal operation in urban areas is a coordinated network system, and achieving system-wide optimality is a common objective of such systems. This study has so far focused on isolated intersections only. Given the advantages of the ADP controller shown in this study and its suitability for real-time operation, we identify the distributed network operation presented in OPAC and PRODYN as a good starting point for introducing ADP to traffic networks.
A challenging issue here is how to incorporate the traffic state and controller state of adjacent intersections into the local objective function, and how the local controller can exploit the power of ADP to learn from network operation.
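The perturbation learning described in the conclusion, estimating gradients of the approximate function by perturbing the system state, can be sketched with a central finite-difference estimator. The quadratic value function below is an assumed stand-in for the controller's approximation, and the finite-difference form is one concrete reading of the perturbation idea, not necessarily the paper's exact scheme.

```python
# Sketch of perturbation-based gradient estimation: apply a small perturbing
# signal to each state component and read the slope of the value function
# from the observed change. The value function here is an assumed example.
import numpy as np

def perturbation_gradient(value_fn, state, eps=1e-3):
    """Central-difference estimate of dV/ds from paired perturbations."""
    s = np.asarray(state, dtype=float)
    grad = np.zeros_like(s)
    for i in range(s.size):
        e = np.zeros_like(s)
        e[i] = eps                      # perturb one component at a time
        grad[i] = (value_fn(s + e) - value_fn(s - e)) / (2.0 * eps)
    return grad

value_fn = lambda s: float(np.sum(s ** 2))   # illustrative V(s) = sum of s_i^2
g = perturbation_gradient(value_fn, [1.0, 2.0])
```

For this quadratic example the true gradient at (1, 2) is (2, 4), and the paired perturbations recover it without any analytic knowledge of the function, which is the appeal of the approach for an online controller.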