Download English ISI Article No. 25674
Article Title

Approximate dynamic programming approach for process control
Article Code: 25674
Publication Year: 2010
Number of Pages: 11 (PDF)
Source

Publisher: Elsevier - Science Direct

Journal: Journal of Process Control, Volume 20, Issue 9, October 2010, Pages 1038–1048

Keywords

Stochastic process control, Stochastic dynamic programming, Approximate dynamic programming, Dual control, Constrained control
Article Preview

Abstract

We assess the potential of the approximate dynamic programming (ADP) approach for process control, especially as a method to complement the model predictive control (MPC) approach. In the artificial intelligence (AI) and operations research (OR) research communities, ADP has recently seen significant activity as an effective method for solving Markov decision processes (MDPs), which represent a class of multi-stage decision problems under uncertainty. Process control problems are similar to MDPs, with the key difference being the continuous state and action spaces as opposed to discrete ones. In addition, unlike in other popular ADP application areas such as robotics or games, in process control applications the first and foremost concern should be the safety and economics of the ongoing operation rather than efficient learning. We explore different options within ADP design, such as the pre-decision vs. post-decision state value function, parametric vs. nonparametric value function approximators, batch-mode vs. continuous-mode learning, and exploration vs. robustness. We argue that ADP possesses great potential, especially for obtaining effective control policies for stochastic constrained nonlinear or linear systems and continually improving them towards optimality.

Introduction

Model predictive control (MPC) is a technique in which the current control action is obtained by minimizing, on-line, a cost criterion defined over a finite time interval. Nominal deterministic trajectories of future disturbance signals and uncertainties are necessarily assumed in order to obtain an optimization problem amenable to on-line solution via math programming. The solution generates a control sequence, from which the first element is extracted and implemented; the procedure is repeated at the next time instant. Owing to its ability to handle constrained, multi-variable control problems in an optimal manner, MPC has become the de facto advanced process control solution for the process industries today. Thanks to the plethora of research and industrial experience accumulated over the past three decades, MPC is by now considered a mature technology.

Despite this, MPC has some fundamental limitations that prevent it from being a panacea for all process control problems. One well-known limitation is the potentially exorbitant on-line computation required for solving a large-scale, potentially non-convex math program whose size scales with the dimension of the state as well as the length of the prediction horizon. Recent developments [1] have made some headway in tackling this problem, although nontrivial computational challenges still exist. The second limitation arises from the fact that the deterministic formulation adopted by MPC is inherently limited in addressing uncertainty in a closed-loop optimal fashion. The open-loop optimal control formulation used to find the control moves at each sample time ignores the fact that information about future uncertainty will be revealed over time, which is generally beneficial for control performance. Most past attempts at ameliorating the impact of uncertainty have resulted in robust MPC formulations based on the objective of minimizing the worst-case scenario [2], at the expense of overly conservative policies. Multi-scenario formulations [1] have also been developed, but the number of scenarios is limited and they do not give closed-loop optimal policies in general. Stochastic programming-based methodologies [3] allow for recourse actions at the computational expense of enumerating an exponentially growing number of scenarios. Chance-constrained optimization formulations have also been studied extensively by a number of authors [4] and [5].

In this paper, we examine the possibility of lessening or removing the above-mentioned limitations by combining MPC with an approach called approximate dynamic programming (ADP). ADP is a technique that surfaced from research on reinforcement learning in the artificial intelligence (AI) community [6] and [7]. It has its theoretical foundations in the traditional dynamic programming of Richard Bellman [8], but its computational bottleneck, termed "the curse of dimensionality" by Bellman himself, is relieved through ideas such as intelligent sampling of the state space through simulations and function approximation. ADP, owing to its roots in AI, has mainly been studied in the context of Markov decision processes (MDPs), which involve discrete, finite state/action spaces and probabilistic transitions. Hence, its application to process control problems, which typically involve continuous state/action spaces, is not straightforward.
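For reference, the receding-horizon MPC procedure described at the beginning of this introduction can be summarized in a short sketch. The discrete-time double-integrator model, horizon length, cost weights, input bound, and solver below are illustrative assumptions only, not taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative double-integrator model, quadratic cost, and input bound
# (all assumed for this sketch; not from the paper).
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
Q = np.eye(2)
R = 0.1 * np.eye(1)
N = 10          # prediction horizon
u_max = 1.0     # input constraint |u| <= u_max

def horizon_cost(u_seq, x0):
    """Open-loop cost of an input sequence over the finite horizon."""
    x, cost = x0, 0.0
    for u in u_seq.reshape(N, 1):
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u          # nominal (deterministic) prediction model
    return cost + x @ Q @ x        # simple terminal cost

def mpc_step(x0):
    """Solve the finite-horizon problem and return only the first move."""
    res = minimize(horizon_cost, np.zeros(N), args=(x0,),
                   bounds=[(-u_max, u_max)] * N)
    return res.x[0]

# Receding-horizon loop: implement the first move, then re-solve at the next instant.
x = np.array([5.0, 0.0])
for t in range(20):
    u = mpc_step(x)
    x = A @ x + B @ np.array([u])
```

Note that the on-line optimization above is solved afresh at every sample time over a deterministic prediction of the future, which is exactly where the computational-load and uncertainty-handling limitations discussed above originate.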
In addition, the characteristics of process control problems are somewhat different from those of robotics, games, and resource allocation problems. For example, in process control applications the idea of "learning by mistakes" for the sake of efficient learning may not be tolerated, as mistakes often bring unacceptable consequences in terms of safety and economics. Hence, extending ADP to process control may require significant care and possibly some new tools. Designing an ADP algorithm involves a variety of choices, including the type of function approximator, pre-decision vs. post-decision formulation, batch vs. continuous updating of the value table, and the exploration vs. robustness trade-off. We will visit these issues, carefully examining the implications of these choices in the context of designing a learning algorithm for process control applications. In addition, we will consider the complementary nature of, and synergies between, ADP and MPC. It should be noted that most previously published works on ADP specialized for process control problems are based on the pre-decision state formulation and are better suited for deterministic problems. For stochastic problems, such as those treated in the examples of this paper, the post-decision state formulation (see Section 3.3) confers immediate practical benefits, since it allows the efficient use of the off-the-shelf optimization solvers found in all MPC technology.

The rest of the paper is organized as follows. In Section 2, we briefly review the basics of MDPs and ADP and present a mathematical representation of the system we consider for control. In Section 3, we examine the various options and choices and their implications for process control applications. In Section 4, we present a few examples, including ones involving both linear and nonlinear stochastic systems. In Section 5, we conclude the paper and discuss other control-related areas where ADP can potentially be useful in the process industries.
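As a rough illustration of the post-decision-state idea referenced above, the sketch below alternates simulation and value-function fitting for a scalar stochastic linear system with a quadratic value-function approximator. The system, the approximator, and all numerical choices are assumptions for illustration, not the paper's formulation or examples; in general the one-step problem would be handed to an off-the-shelf optimization solver rather than solved in closed form as here.

```python
import numpy as np

# Illustrative scalar stochastic system x_{t+1} = a*x_t + b*u_t + w_t
# (assumed for this sketch; not one of the paper's examples).
a, b, sigma_w = 0.9, 0.5, 0.1
gamma = 0.95                      # discount factor
rng = np.random.default_rng(0)

def stage_cost(x, u):
    return x**2 + 0.1 * u**2

# Quadratic approximation V(z) ~ theta * z^2 of the post-decision value function,
# where z = a*x + b*u is the state after the decision but before the noise w.
theta = 0.0

def greedy_u(x, theta):
    # The one-step problem min_u [ stage_cost(x, u) + gamma * theta * (a*x + b*u)^2 ]
    # is deterministic given the post-decision value function; here it is quadratic
    # in u and therefore solved exactly.
    return -(gamma * theta * a * b * x) / (0.1 + gamma * theta * b**2)

# Alternate between simulating the current policy and re-fitting the value function.
for sweep in range(50):
    post_states, targets = [], []
    x = rng.normal()
    for t in range(200):
        u = greedy_u(x, theta)
        z = a * x + b * u                      # post-decision state
        x_next = z + sigma_w * rng.normal()    # disturbance realized after the decision
        u_next = greedy_u(x_next, theta)
        z_next = a * x_next + b * u_next
        # Sampled Bellman target for the value of the post-decision state z.
        targets.append(stage_cost(x_next, u_next) + gamma * theta * z_next**2)
        post_states.append(z)
        x = x_next
    Z, Y = np.array(post_states), np.array(targets)
    theta = np.sum(Z**2 * Y) / np.sum(Z**4)    # least-squares fit of theta
```

The point of the post-decision formulation is visible in `greedy_u`: the expectation over the disturbance is buried inside the value function, so the per-step decision problem is deterministic and standard math programming machinery applies.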

Conclusion

We have examined the potential of ADP for process control and found that it can replace or complement MPC to reduce the on-line computational load and to address stochastic system uncertainties in a computationally amenable manner. ADP comes with a number of design options, and one must think through them carefully to choose the right options for a given application. We have argued that, for process control problems, the post-decision-state formulation allows deterministic math programming solvers to be utilized, both off-line and on-line, and may therefore be more convenient than the more conventional pre-decision-state formulation. In addition, the use of function approximators with non-expansion properties offers stable learning. Robustness against over-extrapolation can be achieved through the use of a penalty function. Finally, to achieve near-optimal performance, we recommend alternating between value function updates and simulation (or on-line implementation) to grow the sample set as the learning proceeds.

It is clear that ADP holds exciting opportunities for process control. Though not discussed in this paper, there are a number of other application areas within the process industries where ADP can prove to be a valuable tool, including resource allocation and inventory management [33], [42], [44] and [46], design and planning under uncertainty [43], scheduling of multiple controllers [38], and equipment/product inspection [45]. Raised awareness of the ADP technique within the process systems engineering research community will undoubtedly bring forth additional applications that can benefit from it.
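As a closing illustration of two of the ideas summarized above, stable learning with a non-expansive approximator and a penalty against over-extrapolation, here is a hypothetical sketch that combines a k-nearest-neighbour averager (an averaging operation, hence a non-expansion) with a distance-based penalty. The class name, the penalty form, and all parameters are assumptions for illustration only, not the paper's construction.

```python
import numpy as np

class PenalizedKNNValue:
    """Hypothetical value-function approximator: a k-nearest-neighbour averager
    (a non-expansive operation) plus a distance penalty that discourages
    evaluating states far from the sampled data."""

    def __init__(self, states, values, k=5, penalty_weight=10.0):
        self.states = np.atleast_2d(states)
        self.values = np.asarray(values, dtype=float)
        self.k = k
        self.penalty_weight = penalty_weight

    def __call__(self, x):
        d = np.linalg.norm(self.states - np.atleast_2d(x), axis=1)
        idx = np.argsort(d)[: self.k]
        estimate = self.values[idx].mean()          # non-expansive averaging
        # Penalize estimates far from the sampled states so that a one-step
        # optimizer avoids regions where the approximation would extrapolate.
        return estimate + self.penalty_weight * d[idx].mean()

# Usage on random data: queries near the samples incur a small penalty,
# queries far from them are heavily penalized.
rng = np.random.default_rng(1)
S = rng.normal(size=(100, 2))        # sampled (post-decision) states
V = (S**2).sum(axis=1)               # their current value estimates
v_fun = PenalizedKNNValue(S, V)
print(v_fun(np.array([0.2, -0.1])))  # near the data
print(v_fun(np.array([5.0, 5.0])))   # far from the data
```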