Download English ISI Article No. 111824
Persian Translation of the Article Title

طراحی جدید بازپخش تجربه تاریخچه برای برنامه‌نویسی پویای تطبیقی بدون مدل

English Title
A new history experience replay design for model-free adaptive dynamic programming
Article Code: 111824
Year of Publication: 2017
English Article Length: 30 pages (PDF)
Source

Publisher: Elsevier - Science Direct

Journal: Neurocomputing, Volume 266, 29 November 2017, Pages 141-149


English Abstract

Adaptive dynamic programming (ADP) is a powerful control technique that has been investigated, designed, and tested in a wide range of applications for solving optimal control problems in complex systems. However, an ADP controller typically requires long training periods because its data usage is inefficient: each sample is discarded once used. History experience, also known as experience replay, is a technique with the potential to accelerate the training process for learning and control. Existing history experience designs, however, cannot be used directly in a model-free ADP design, because they rely on forward temporal difference (TD) information (e.g., the state-action pair between the current and the future time step), which requires a model network to predict future information. This paper proposes a new history experience replay design that avoids the use of a model network or identifier of the system/environment. Specifically, the experience tuple is designed with one-step-backward state-action information, so the TD can be formed from a previous time step and the current time step. In addition, a systematic approach is proposed to integrate history experience into both the critic and action networks of the ADP controller. The proposed approach is tested on two case studies: a cart-pole balancing task and a triple-link pendulum balancing task. For a fair comparison, both approaches use the same initial starting states and initial weight parameters in the same simulation environment. The statistical results show that the proposed approach improves both the average number of trials required to succeed and the success rate: on average, the required trials to succeed are reduced by 26.5% for the cart-pole task and 43% for the triple-link balancing task.
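The core mechanism described in the abstract is the one-step-backward experience tuple, which lets the TD error be formed entirely from already-observed data, so replay never needs a model network to predict future states. The following is a minimal sketch of that idea, not the authors' implementation: it assumes an action-dependent critic Q(x, u) with a simple linear-in-parameters form, and all names and values (BackwardReplayBuffer, critic_features, replay_critic_update, gamma, alpha, the buffer capacity) are illustrative assumptions. It also covers only the critic side; the paper additionally integrates history experience into the action network.

```python
# Sketch of a backward-tuple experience replay for a model-free, action-dependent
# critic. Hypothetical names and parameters; not the paper's actual networks.
import random
from collections import deque

import numpy as np


class BackwardReplayBuffer:
    """Stores one-step-backward tuples (x_prev, u_prev, r, x_curr, u_curr).

    Both end points of the temporal difference are already observed, so
    replaying a tuple never requires a model network for prediction.
    """

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, x_prev, u_prev, r, x_curr, u_curr):
        self.buffer.append((x_prev, u_prev, r, x_curr, u_curr))

    def sample(self, batch_size):
        # Copy to a list for sampling; fine for a small illustrative buffer.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))


def critic_features(x, u):
    # Quadratic features for a linear-in-parameters critic (illustrative only).
    z = np.concatenate([x, u])
    return np.outer(z, z)[np.triu_indices(len(z))]


def replay_critic_update(w, buffer, gamma=0.95, alpha=1e-3, batch_size=32):
    """One replay pass on the backward TD error:
    Q(x_prev, u_prev) is regressed toward r + gamma * Q(x_curr, u_curr)."""
    for x_prev, u_prev, r, x_curr, u_curr in buffer.sample(batch_size):
        phi_prev = critic_features(x_prev, u_prev)
        phi_curr = critic_features(x_curr, u_curr)
        td_error = r + gamma * (w @ phi_curr) - (w @ phi_prev)
        w = w + alpha * td_error * phi_prev  # move Q(x_prev, u_prev) toward target
    return w
```

As a usage illustration, for a cart-pole-like state x of dimension 4 and a scalar action u, the critic weight vector could be initialized as w = np.zeros(critic_features(np.zeros(4), np.zeros(1)).size), with one new backward tuple added and one replay update performed at every control step.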