Distributed dynamic programming for discrete-time stochastic control, and idempotent algorithms
|Article code||Publication year||English article||Persian translation||Word count|
|25687||2011||9-page PDF||Available to order||Not calculated|
Publisher: Elsevier - Science Direct
Journal: Automatica, Volume 47, Issue 3, March 2011, Pages 443–451
Idempotent methods have been found to be extremely fast for the solution of dynamic programming equations associated with deterministic control problems. The original methods exploited the idempotent (e.g., max-plus) linearity of the associated semigroup operator. It is now known that curse-of-dimensionality-free idempotent methods do not require this linearity. Instead, it is sufficient that certain solution forms are retained through application of the associated semigroup operator. Here, we see that idempotent methods may be used to solve some classes of stochastic control problems. The key is the use of the idempotent distributive property. We demonstrate this approach for a class of nonlinear, discrete-time stochastic control problems.
It is now well known that many classes of deterministic control problems may be solved by max-plus or min-plus (more generally, idempotent) numerical methods. Here, max-plus methods are appropriate for problems with maximizing controllers, and min-plus methods for problems with minimizing controllers. These methods include max-plus basis-expansion approaches (Akian et al., 2004, Akian et al., 2008, Fleming and McEneaney, 2000, McEneaney, 2004, McEneaney, 2006 and McEneaney and Dower, 2003), as well as the more recently developed curse-of-dimensionality-free methods (McEneaney, 2006 and McEneaney, 2007). Previously, however, stochastic control problems had eluded idempotent methods. In a recent application, a min-plus approach was useful for a problem in sensing control (McEneaney, Oran et al., 2008 and Oran and McEneaney, 2010). In that example, the state process was an observation-conditioned probability vector, where the stochasticity of the state process was due to the stochastic observation inputs. This led to the realization that idempotent methods were indeed applicable to stochastic control problems. The key tool that had been missing previously was simply the idempotent distributive property. It was also necessary to realize that idempotent linearity of the associated semigroup was not required for applicability of curse-of-dimensionality-free types of idempotent methods. Instead, it is sufficient that certain solution forms are retained through application of the semigroup operator, i.e., the dynamic programming principle operator. We will see that, under certain conditions, pointwise minima of affine and quadratic forms will pass through this operator when we have a minimizing control problem. As the operator contains an expectation component, this will require application of the idempotent distributive property.
In the case of finite sums and products, this property looks like our standard-algebra distributive property; in the infinitesimal case, it is familiar to control theorists through notions of strategies, non-anticipative mappings, and/or progressively measurable controls. Using this technology, the value function will be propagated backwards with a representation as a pointwise minimum of quadratic or affine forms. This approach will have some similarities to the idempotent curse-of-dimensionality-free methods developed for deterministic control problems (McEneaney, 2006, McEneaney, 2007, McEneaney, Deshpande et al., 2008 and McEneaney, 2009b). However, the need to use the idempotent distributive property makes this approach very different from those. We will use a very limited class of control problems to introduce this new approach. We consider discrete-time, finite time-horizon control problems. The dynamics will contain a possibly continuum-valued stochastic input. They will also contain both a possibly continuum-valued control, and a control taking values in a finite set. The running cost will also depend on these inputs. We may think of the dynamics and the running cost as being indexed by the values of the finite control input. In the actual computational algorithms which will be obtained, each indexed dynamics will be linear, and each indexed running cost will be either quadratic or affine. A similar approach was taken with the curse-of-dimensionality-free methods for deterministic control. In that case, it was shown that any semiconvex (respectively, convex) Hamiltonian could be arbitrarily well approximated as a pointwise maximum of quadratic (respectively, affine) forms, and so this class is not so restrictive. Note that the purpose of this paper is the announcement of this newly discovered class of algorithms. 
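The finite-sums-and-products form of the idempotent distributive property is easy to check numerically. The following is a minimal illustration (ours, not taken from the paper): in min-plus algebra, where "addition" is min and "multiplication" is +, a sum of pointwise minima equals the minimum, over all selection maps (the finite analogue of strategies), of the corresponding sums.

```python
import itertools

# Min-plus ("tropical") algebra: oplus = min, otimes = +.
# Finite idempotent distributive property:
#   sum_i min_j a[i][j]  ==  min over selections (j_1,...,j_n) of sum_i a[i][j_i]
# The right-hand side ranges over all "strategies" assigning a choice j_i to each i.

a = [[3.0, 1.0, 4.0],
     [1.0, 5.0, 9.0],
     [2.0, 6.0, 5.0]]

# Left-hand side: take the minimum within each row first, then sum.
lhs = sum(min(row) for row in a)

# Right-hand side: minimize the sum over every selection map i -> j_i.
rhs = min(
    sum(a[i][j] for i, j in enumerate(choice))
    for choice in itertools.product(range(3), repeat=3)
)

print(lhs, rhs)  # both equal 1.0 + 1.0 + 2.0 = 4.0
```

The identity is exactly the standard-algebra distributive law read in the min-plus semiring; the continuum version proved in Section 3 replaces the finite selection maps with measurable strategies.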
The general theory is indicated in an abstract, albeit discrete-time, setting, resulting in what we refer to here as the idempotent distributed dynamic programming principle (IDDPP). The basic algorithms will be described for two relatively simple classes of problems, where the value function will be given in terms of pointwise minima of quadratic and affine forms. Instantiation of these basic algorithms in computationally efficient forms requires approximation steps. A study of such approximations will not be included here, but the reader may refer to McEneaney, Oran et al. (2008) and Oran and McEneaney (2010) for some initial approximation methods relevant to this class of algorithms. A few words about the expected utility of the approach are appropriate here. The paper considers a discrete-time, finite time-horizon stochastic control problem with a continuum-valued state process. The basic theory is developed for continuum-valued control and noise processes. However, at the ends of Sections 5 and 6 (on quadratic and affine forms), versions of the method are given for the case of finite control and disturbance spaces. In fact, these are the cases which are most readily amenable to computation using this approach. The well-known curse of dimensionality, which is a serious impediment to grid-based methods, is due to the discretization over the state space. Here, this is avoided in the base form of the algorithm, but there is a large penalty to be paid in terms of extremely rapid growth of the solution complexity as one back-propagates. Without pruning techniques for attenuation of this complexity growth, the approach would not be feasible for serious problems. This is similar to the case with the curse-of-dimensionality-free methods of McEneaney (2006) and McEneaney (2007). We discuss complexity reduction for our new approach in McEneaney (2009a) and Oran and McEneaney (2010), and will pursue it further in subsequent efforts.
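The complexity growth just described can be made concrete with a toy sketch of a single backward step. Everything below is a hypothetical illustration, not the paper's formulation: scalar state, two controls, two equally likely disturbances, and hand-picked linear dynamics and affine costs. The value function is stored as a pointwise minimum of affine forms; the expectation is pushed through that minimum via the idempotent distributive property, so each (control, disturbance-indexed strategy) pair contributes one new affine form, and the number of forms multiplies at every step.

```python
import itertools

# Toy backward DPP step for a scalar-state, minimizing problem with finite
# control set U and finite disturbance set W with probabilities p.
# Indexed linear dynamics and affine running costs (hypothetical data):
#   x_{t+1} = c[u]*x + d[w],   l(x, u) = alpha[u] + beta[u]*x
# The value function is a list of affine forms (a, b) meaning
#   V(x) = min_k (a_k + b_k * x).
# Idempotent distributive property: E_w[ min_k (...) ] equals the minimum over
# strategies w -> k(w) of E_w[ ... ], and each strategy yields an affine form.

def backward_step(forms, U, W, p, c, d, alpha, beta):
    new_forms = []
    for u in U:
        # One strategy assigns a form index k(w) to each disturbance value w.
        for strat in itertools.product(range(len(forms)), repeat=len(W)):
            a = alpha[u] + sum(p[i] * (forms[strat[i]][0] + forms[strat[i]][1] * d[w])
                               for i, w in enumerate(W))
            b = beta[u] + sum(p[i] * forms[strat[i]][1] * c[u] for i, w in enumerate(W))
            new_forms.append((a, b))
    return new_forms

# Terminal value V_T(x) = min(1 + x, 2 - x); two controls, two equally likely noises.
forms = [(1.0, 1.0), (2.0, -1.0)]
U, W, p = [0, 1], [0, 1], [0.5, 0.5]
c, d = {0: 0.5, 1: -0.5}, {0: 0.0, 1: 1.0}
alpha, beta = {0: 0.0, 1: 0.1}, {0: 1.0, 1: 0.0}

new = backward_step(forms, U, W, p, c, d, alpha, beta)
print(len(new))  # |U| * K^|W| = 2 * 2^2 = 8 forms after one step
```

Starting from K forms, one step produces |U| * K^|W| forms, which is the rapid complexity growth noted above; pruning dominated forms between steps is what keeps such a scheme feasible.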
Nonetheless, it seems that a general rule of thumb will be that this approach is particularly useful for high-dimensionality/low-complexity problems. Here, the term high dimensionality refers to the class of problems where the state-space dimension is beyond that to which grid-based methods can feasibly be applied. On the other hand, problems where the Hamiltonian might be well modeled by pointwise minima of a small set of linear-quadratic forms tend to be of low complexity. More specifically, however, in the sense needed here, a problem is of low complexity if the solution is well approximated by a reasonably small number of simple forms such as affine or quadratic forms, where minima of such forms are inherited through the IDDPP operator to appear below. The simple example included for illustrative purposes falls into the class of low-complexity problems.

The layout of the paper is as follows. In Section 2, a finite time-horizon, discrete-time stochastic control problem is defined, and the standard dynamic programming principle (DPP) is given in this context. In Section 3, a continuum version of the min-plus distributive property is proved. This is the result that will allow us to obtain the main result. This main result is the IDDPP, which is obtained for this class of problems in Section 4. In Sections 5 and 6, the general IDDPP is reduced to computationally amenable forms for specific problem formulations. In Section 7, a simple example is included.
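For orientation, the standard DPP referred to in Section 2 takes the familiar backward-recursive form below, written in generic notation (ours, not necessarily the paper's):

```latex
% Finite time-horizon, discrete-time DPP for a minimizing controller:
% dynamics x_{t+1} = f(x_t, u_t, w_t), running cost \ell, terminal cost \phi.
V_T(x) = \phi(x), \qquad
V_t(x) = \min_{u \in \mathcal{U}} \mathbb{E}_{w}\!\left[ \ell(x, u, w)
  + V_{t+1}\bigl(f(x, u, w)\bigr) \right],
\qquad t = T-1, \dots, 0.
```

The IDDPP of Section 4 differs in that the expectation of a pointwise minimum of forms is rewritten, via the idempotent distributive property, as a minimum over disturbance-indexed strategies of expectations of single forms.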