برنامه ریزی پویا تکرار تصادفی: یک روش مونت کارلو برای کنترل دوگانه
|کد مقاله||سال انتشار||مقاله انگلیسی||ترجمه فارسی||تعداد کلمات|
|24888||2005||12 صفحه PDF||سفارش دهید||8114 کلمه|
Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)
Journal : Automatica, Volume 41, Issue 5, May 2005, Pages 767–778
Practical exploitation of optimal dual control (ODC) theory continues to be hindered by the difficulties involved in numerically solving the associated stochastic dynamic programming (SDPs) problems. In particular, high-dimensional hyper-states coupled with the nesting of optimizations and integrations within these SDP problems render their exact numerical solution computationally prohibitive. This paper presents a new stochastic dynamic programming algorithm that uses a Monte Carlo approach to circumvent the need for numerical integration, thereby dramatically reducing computational requirements. Also, being a generalization of iterative dynamic programming (IDP) to the stochastic domain, the new algorithm exhibits reduced sensitivity to the hyper-state dimension and, consequently, is particularly well suited to solution of ODC problems. A convergence analysis of the new algorithm is provided, and its benefits are illustrated on the problem of ODC of an integrator with unknown gain, originally presented by Åström and Helmersson (Computers and Mathematics with Applications 12A (1986) 653–662).
The optimal dual control (ODC) idea is to predict and make use of future model estimates during the control calculation. Instead of using a fixed model for computation of the entire control policy, in ODC, later controls are computed using models predicted to result from those controls applied earlier. The resulting optimal policy balances system excitation, to improve future model accuracy, against system regulation, to achieve good immediate control. The name “dual control” arises from the need to simultaneously satisfy these two competing objectives. ODC has proven extremely difficult to implement in practice due to several computational issues and as a result most researchers in the dual control field have been discouraged from attempting to solve the ODC problem. Instead, the recent trend has been towards development of sub-optimal approaches to dual control. Although proving simpler than ODC to implement in practice, these approaches have several disadvantages (Lindoff et al., 1999 and Filatov and Unbehauen, 2000). Numerically, ODC involves computation of a state- and time-dependent control policy defined over a finite control horizon to minimize a stochastic cost function subject to the dynamics of a non-linear time-varying hyper-system with continuous hyper-state1 and control spaces. Such problems are extremely difficult to solve because (i) existing numerical approaches encounter the curse of dimensionality, which is amplified in ODC due to the presence of model parameters and their uncertainty descriptions within the hyper-state, (ii) non-standard probability distributions arise from the feedback of uncertainty, preventing the cost function from being written as a closed-form analytical expression and (iii) use of parameterized control policy representations is difficult because the optimal policy is invariably discontinuous over the hyper-state space. The contribution of this paper lies in its identification of mechanisms by which ODC problems can be numerically solved without complete hyper-state discretization or nested numerical integration. We show that the original ODC problem can be made numerically tractable via a combination of iterative dynamic programming (IDP) (Luus, 2000a) and Monte Carlo methods. The result is a significant mitigation of the curse of dimensionality. The use of Monte Carlo methods in the solution of SDP problems is not novel. Algorithms such as value iteration,2 policy iteration, Q-learning and neuro-dynamic programming are well-known dynamic programming approaches that employ Monte Carlo sampling in stochastic settings (Mitchell, 1997, Sutton and Barto, 1998 and Bertsekas and Tsitsiklis, 1996). The SIDP algorithm presented here provides an advantage over these approaches in that it does not employ complete state-space discretization and is therefore less sensitive to the curse of dimensionality. This makes it applicable to higher dimensional SDP problems, e.g. ODC, which have proven unmanageable using these other approaches. A similar effect has been achieved in other work (de Farias and van Roy, 2003 and Hauskrecht and Kveton, 2004), by reformulating SDP problems into linear programming (LP) problems and approximating the large number of resulting constraints using Monte Carlo sampling. SIDP differs from these LP approaches in that it also extends to unbounded, continuous state and action spaces, its solution accuracy is independent of the state-space discretization, and it does not require basis function policy representations in order to reduce dimensional sensitivity. In Section 2 of this paper we describe the ODC problem, present its dynamic programming formulation, and elaborate on the complexities of its numerical solution. Section 3 discusses dynamic programming in the context of ODC and explains how the numerical difficulties introduced by stochasticity can be alleviated by use of Monte Carlo methods. The new SDP solution algorithm, SIDP, is also presented in this section. A convergence analysis of SIDP is provided in Section 4. The problem of ODC of an integrator with unknown gain (Åström & Helmersson, 1986) is adopted as a benchmark in Section 5 to illustrate the advantages of the new algorithm relative to the standard dynamic programming approach.
نتیجه گیری انگلیسی
A new stochastic dynamic programming algorithm for numerical solution of optimal dual control problems has been proposed. The algorithm uses Monte Carlo sampling to avoid numerical integration, and has been shown to significantly reduce the computational requirements involved in obtaining numerical solutions. Additionally, it retains all the major benefits of the iterative dynamic programming algorithm from which it is derived, particularly an insensitivity to problem dimensionality. As such, it is well suited to optimal dual control applications, in which the usual curse of dimensionality is further compounded by the presence of stochasticity. While some theoretical properties of the algorithm remain to be fully analyzed, the numerical results presented demonstrate that the new algorithm offers a computationally attractive alternative to use of the standard dynamic programming approach.