Robust adaptive dynamic programming for linear and nonlinear systems: An overview
|Article code||Publication year||English article||Persian translation||Word count|
|25938||2013||9-page PDF||Available to order||6500 words|
Publisher: Elsevier - Science Direct
Journal: European Journal of Control, Volume 19, Issue 5, September 2013, Pages 417–425
The field of adaptive dynamic programming with diverse applications in control engineering has undergone rapid progress over the past few years. A new theory called “Robust Adaptive Dynamic Programming” (for short, RADP) is developed for the design of robust optimal controllers for linear and nonlinear systems subject to both parametric and dynamic uncertainties. A central objective of this paper is to give a brief overview of our recent contributions to the development of the theory of RADP and to outline its potential applications in engineering and biology.
Approximate/adaptive dynamic programming (ADP for short) is a biologically inspired, non-model-based computational method that has been used to compute optimal control laws. It is well known that conventional dynamic programming requires perfect knowledge of the system dynamics and suffers from the curse of dimensionality. To avoid these difficulties, Werbos first pointed out that adaptive approximation to the Hamilton–Jacobi–Bellman (HJB) equation can be achieved by designing appropriate reinforcement learning systems. In his seminal work, Werbos further proposed two basic approaches for implementing ADP: heuristic dynamic programming (HDP) and dual heuristic programming (DHP). They can be used to approximate the optimal cost function or its gradient, and generalized versions that also approximate the optimal control policy have been developed. Similar problems were studied by Bertsekas and Tsitsiklis under the name of neuro-dynamic programming, restricted exclusively to discrete-time systems. A rigorous development of the mathematical principles behind neuro-dynamic programming has also been provided, along with numerous methods and applications.
The development of ADP theory consists of three phases. In the first phase, ADP was extensively investigated within the computer science and operations research communities. Two basic algorithms, policy iteration and value iteration, are usually employed. Sutton introduced the temporal difference method. In 1989, Watkins proposed the well-known Q-learning method in his PhD thesis. Q-learning shares similar features with the action-dependent HDP scheme proposed by Werbos. Other related research has been carried out within a discrete-time, discrete-state-space Markov decision process framework.
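The two basic algorithms named above, policy iteration and value iteration, can be sketched on a small discounted Markov decision process. The following is only an illustrative sketch: the three-state, two-action model, its costs, and the discount factor are hypothetical values chosen for this example, and the function names are not taken from the paper.

```python
import numpy as np

# Hypothetical 3-state, 2-action MDP; every number below is an illustrative assumption.
# P[a, s, s2] = probability of moving from state s to state s2 under action a.
P = np.array([
    [[0.9, 0.1, 0.0],
     [0.0, 0.9, 0.1],
     [0.0, 0.0, 1.0]],
    [[0.2, 0.8, 0.0],
     [0.0, 0.2, 0.8],
     [0.0, 0.0, 1.0]],
])
# c[a, s] = one-step cost of taking action a in state s (state 2 is a cost-free goal).
c = np.array([
    [1.0, 1.0, 0.0],
    [2.0, 2.0, 0.0],
])
gamma = 0.95  # discount factor (assumed)


def value_iteration(P, c, gamma, tol=1e-10, max_iter=10_000):
    """Repeat the Bellman backup V <- min_a [c + gamma * P V] until it stops changing."""
    V = np.zeros(P.shape[1])
    for _ in range(max_iter):
        V_new = (c + gamma * P @ V).min(axis=0)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    greedy = (c + gamma * P @ V).argmin(axis=0)
    return V, greedy


def policy_iteration(P, c, gamma, max_iter=100):
    """Alternate exact policy evaluation with greedy policy improvement."""
    n = P.shape[1]
    states = np.arange(n)
    policy = np.zeros(n, dtype=int)
    for _ in range(max_iter):
        # Policy evaluation: solve (I - gamma * P_pi) V = c_pi.
        P_pi = P[policy, states]
        c_pi = c[policy, states]
        V = np.linalg.solve(np.eye(n) - gamma * P_pi, c_pi)
        # Policy improvement: act greedily with respect to V.
        new_policy = (c + gamma * P @ V).argmin(axis=0)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return V, policy


V_vi, pi_vi = value_iteration(P, c, gamma)
V_pi, pi_pi = policy_iteration(P, c, gamma)
```

Both routines need the transition model `P`, which is exactly the "perfect knowledge of system dynamics" that ADP and Q-learning aim to avoid; the sketch shows the model-based baseline only.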
In the second phase, stability is brought into the context of ADP while real-time control problems are studied for dynamic systems. To the best of the authors' knowledge, Lewis was the first to contribute to the integration of stability theory and ADP theory. An essential advantage of ADP theory is that an optimal control policy can be obtained via a recursive numerical algorithm using online information, without solving the HJB equation (for nonlinear systems) or the algebraic Riccati equation (ARE) (for linear systems), even when the system dynamics are not precisely known. Optimal feedback control designs for linear and nonlinear dynamic systems have been proposed by several researchers over the past few years. While most of the previous work on ADP theory was devoted to discrete-time systems, there has been relatively little research on the continuous-time counterpart. This is mainly because ADP is considerably more difficult for continuous-time systems than for discrete-time systems. Indeed, many results developed for discrete-time systems cannot be extended straightforwardly to continuous-time systems. Nonetheless, early attempts were made to apply Q-learning to continuous-time systems via discretization techniques. However, the convergence and stability analysis of these schemes is challenging. Murray et al. proposed an implementation method that requires measurements of the derivatives of the state variables. As mentioned above, Lewis and his co-worker proposed the first solution to stability analysis and convergence proofs for ADP-based control systems by means of LQR theory. A synchronous policy iteration scheme was also presented. For continuous-time linear systems, partial knowledge of the system dynamics (i.e., the input matrix) had to be precisely known. This restriction has since been completely removed, and a nonlinear variant of the resulting method has also been developed.
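To illustrate what "a recursive numerical algorithm" that sidesteps a direct ARE solve can look like in the linear case, the sketch below uses Kleinman's classical model-based policy iteration for continuous-time LQR. This is an assumption-laden illustration rather than the scheme of any cited work: it uses the matrices A and B explicitly, whereas ADP methods estimate the same quantities from online data. The double-integrator plant, the weights Q and R, and the initial stabilizing gain `K0` are all hypothetical choices.

```python
import numpy as np


def lyap(Ac, Qc):
    """Solve Ac.T @ P + P @ Ac + Qc = 0 for symmetric P via Kronecker products."""
    n = Ac.shape[0]
    I = np.eye(n)
    M = np.kron(I, Ac.T) + np.kron(Ac.T, I)
    P = np.linalg.solve(M, -Qc.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off


def kleinman_policy_iteration(A, B, Q, R, K0, iters=20):
    """Model-based policy iteration for LQR; K0 must stabilize A - B @ K0."""
    K = K0
    for _ in range(iters):
        Ak = A - B @ K
        # Policy evaluation: cost matrix of the current gain K.
        P = lyap(Ak, Q + K.T @ R @ K)
        # Policy improvement: K <- R^{-1} B^T P.
        K = np.linalg.solve(R, B.T @ P)
    return P, K


# Hypothetical example: double integrator with unit weights.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[1.0, 2.0]])  # assumed initial gain; A - B @ K0 has both poles at -1

P, K = kleinman_policy_iteration(A, B, Q, R, K0)

# Residual of the ARE: A'P + PA - P B R^{-1} B' P + Q, which should be close to zero.
residual = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
```

For this particular plant the ARE can be solved by hand, giving K = [1, √3], so the iterate can be checked against a closed-form answer. Each step only solves a linear (Lyapunov) equation, which is the structure that data-driven ADP schemes exploit.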
The third phase in the development of ADP theory is related to extensions of previous ADP results to nonlinear uncertain systems. Neural networks and game theory are utilized to address the presence of uncertainty and nonlinearity in control systems. An implicit assumption in this body of work is that the system order is known and that the uncertainty is static, not dynamic. The presence of dynamic uncertainty has not been systematically addressed in the ADP literature. By dynamic uncertainty, we refer to the mismatch between the nominal model and the real plant when the order of the nominal model is lower than the order of the real system. A closely related topic of research is how to account for the effect of unseen variables. In many engineering and biological applications, full-state information is missing and only output or partial-state measurements are available. Adaptation of the existing ADP theory to this practical scenario is important yet non-trivial. Neural networks have been used to address the state estimation problem. However, the stability analysis of the estimator/controller augmented system is by no means easy, because the total system is highly interconnected. The configuration of a standard ADP-based control system is shown in Fig. 1.
Fig. 1. Configuration of an ADP-based control system.
Our recent work on the development of robust variants of ADP theory is targeted precisely at these challenges.
1.2. What is RADP?
RADP is developed to address the presence of dynamic uncertainty in linear and nonlinear dynamical systems. See Fig. 2 for an illustration. There are several reasons for which we pursue a new framework for RADP. First and foremost, it is well known that building an exact mathematical model of a physical system is often a hard task. 
Also, even if the exact mathematical model can be obtained for some particular engineering and biological applications, simplified models are often preferable to the original complex system model for system analysis and control synthesis. While we refer to the mismatch between the simplified model and the original system as dynamic uncertainty here, the engineering literature often uses the term unmodeled dynamics instead. Secondly, observation errors may often be captured by dynamic uncertainty. From the literature of modern nonlinear control, it is known that the presence of dynamic uncertainty makes the feedback control problem extremely challenging in the context of nonlinear systems. In order to broaden the application scope of ADP theory in the presence of dynamic uncertainty, our strategy is to integrate tools from nonlinear control theory, such as Lyapunov designs, input-to-state stability theory, and nonlinear small-gain techniques. In this way, RADP becomes applicable to wide classes of uncertain dynamic systems with incomplete state information and unknown system order/dynamics.
Fig. 2. RADP with dynamic uncertainty.
Additionally, RADP can be applied to large-scale dynamic systems, as shown in our recent paper. By integrating a simple version of the cyclic-small-gain theorem, asymptotic stability can be achieved by assigning appropriate weighting matrices for each subsystem. Further, a certain suboptimality property can be obtained. Because of several emerging applications of practical importance, such as the smart electric grid, intelligent transportation systems and groups of mobile autonomous agents, this topic deserves further investigation from an RADP point of view. The existence of unknown parameters and/or dynamic uncertainties, and the limited information on state variables, give rise to challenges for the decentralized or distributed controller design of large-scale systems.
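The notion of dynamic uncertainty discussed in this section (a nominal model of lower order than the real plant) can be made concrete with a small numerical check. The sketch below is a hypothetical example, not taken from the paper: an LQR gain is designed for a scalar nominal model, then applied to an assumed "real" plant that contains an extra first-order actuator lag with time constant `eps`. All numerical values are illustrative assumptions.

```python
import numpy as np

# Nominal (reduced-order) model: x' = a*x + u, with assumed LQR weights q and r.
a, q, r = 1.0, 1.0, 1.0

# Scalar ARE with unit input gain: 2*a*p - p**2 / r + q = 0, positive root.
p = r * (a + np.sqrt(a**2 + q / r))
k = p / r  # LQR gain for the nominal model, u = -k * x


def closed_loop_eigs(eps):
    """Eigenvalues when u = -k*x drives a plant with an unmodeled actuator lag.

    Assumed real plant: x' = a*x + z, eps * z' = -z + u,
    where z is the actuator state that the nominal model ignores.
    """
    A_cl = np.array([[a, 1.0],
                     [-k / eps, -1.0 / eps]])
    return np.linalg.eigvals(A_cl)


nominal_pole = a - k                # closed-loop pole predicted by the nominal model
fast_lag = closed_loop_eigs(0.1)    # fast unmodeled dynamics
slow_lag = closed_loop_eigs(2.0)    # slow unmodeled dynamics
```

With these assumed numbers the nominal design is stable, and it remains stable when the neglected lag is fast (`eps = 0.1`), but the same gain yields eigenvalues with positive real part when the lag is slow (`eps = 2.0`). This is the kind of mismatch that robustness tools such as small-gain arguments are meant to account for.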