Download English ISI article No. 79779
Persian title of the article

Real-time dynamic programming for Markov decision processes with imprecise probabilities

Article code: 79779
Publication year: 2016
English article: 32 pages, PDF
Persian translation: available on order
Word count: not calculated
English title
Real-time dynamic programming for Markov decision processes with imprecise probabilities
Source

Publisher : Elsevier - Science Direct

Journal : Artificial Intelligence, Volume 230, January 2016, Pages 192–223

Keywords
Probabilistic planning; Markov decision process; Robust planning

Abstract

Markov Decision Processes have become the standard model for probabilistic planning. However, in many practical problems, the estimates of transition probabilities are inaccurate. This may be due to conflicting elicitations from experts or insufficient state transition information. Markov Decision Processes with Imprecise Transition Probabilities (MDP-IPs) were introduced to obtain a robust policy where there is uncertainty in the transitions. Although a symbolic dynamic programming algorithm for MDP-IPs (called SPUDD-IP) has been proposed that can solve problems with up to 22 state variables, in practice, solving MDP-IP problems is time-consuming. In this paper we propose efficient algorithms for a more general class of MDP-IPs, called Stochastic Shortest Path MDP-IPs (SSP MDP-IPs), that use initial state information to solve complex problems by focusing on reachable states. The (L)RTDP-IP algorithm, a (Labeled) Real Time Dynamic Programming algorithm for SSP MDP-IPs, is proposed together with three different methods for sampling the next state. It is shown here that the convergence of (L)RTDP-IP can be obtained by using any of these three methods, although the Bellman backups for this class of problems prescribe a minimax optimization. As far as we are aware, this is the first asynchronous algorithm for SSP MDP-IPs given in terms of a general set of probability constraints that requires non-linear optimization over imprecise probabilities in the Bellman backup. Our results show up to three orders of magnitude speedup for (L)RTDP-IP when compared with the SPUDD-IP algorithm.
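
To make the minimax Bellman backup mentioned in the abstract more concrete, here is a minimal Python sketch. It is not the paper's (L)RTDP-IP algorithm. It assumes a much simpler setting in which each transition probability is only known to lie in an interval [lower, upper], so the inner maximization can be solved greedily, whereas the paper works with a general set of probability constraints that requires non-linear optimization. All names (`worst_case_expectation`, `robust_backup`) and all numbers are hypothetical and chosen only for illustration.

```python
def worst_case_expectation(values, lower, upper):
    """Maximise sum(p[i] * values[i]) subject to
    lower[i] <= p[i] <= upper[i] and sum(p) == 1.
    Assumes the intervals admit at least one valid distribution."""
    p = list(lower)
    slack = 1.0 - sum(lower)  # probability mass still to be assigned
    # "Nature" pushes the remaining mass towards the costliest successors first.
    for i in sorted(range(len(values)), key=lambda i: values[i], reverse=True):
        extra = min(upper[i] - lower[i], slack)
        p[i] += extra
        slack -= extra
    return sum(pi * vi for pi, vi in zip(p, values))


def robust_backup(state, value, actions, cost, successors, bounds):
    """One minimax backup for a cost-based (shortest-path) problem:
    min over actions of immediate cost plus worst-case expected cost-to-go."""
    best = float("inf")
    for a in actions:
        succ = successors[(state, a)]
        lo, hi = bounds[(state, a)]
        q = cost[(state, a)] + worst_case_expectation(
            [value[s2] for s2 in succ], lo, hi
        )
        best = min(best, q)
    return best


# Toy instance (made-up numbers): from "s0", both actions lead to "goal" or "s1".
value = {"goal": 0.0, "s1": 10.0}
actions = ["a", "b"]
cost = {("s0", "a"): 1.0, ("s0", "b"): 2.0}
successors = {("s0", "a"): ["goal", "s1"], ("s0", "b"): ["goal", "s1"]}
bounds = {
    ("s0", "a"): ([0.25, 0.25], [0.75, 0.75]),  # (lower, upper) per successor
    ("s0", "b"): ([0.5, 0.25], [0.75, 0.5]),
}

v_s0 = robust_backup("s0", value, actions, cost, successors, bounds)
print(v_s0)  # 7.0: action "b" is better once the worst case is considered
```

In this toy instance action "a" is cheaper up front, but its wider intervals let the adversarial choice of probabilities put more mass on the costly successor, so the robust backup prefers action "b".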
