تجربه جدید تجربه تاریخ بازی برای برنامه نویسی پویا تطبیقی رایگان است
|کد مقاله||سال انتشار||تعداد صفحات مقاله انگلیسی||ترجمه فارسی|
|111824||2017||30 صفحه PDF||سفارش دهید|
نسخه انگلیسی مقاله همین الان قابل دانلود است.
هزینه ترجمه مقاله بر اساس تعداد کلمات مقاله انگلیسی محاسبه می شود.
این مقاله تقریباً شامل 8293 کلمه می باشد.
هزینه ترجمه مقاله توسط مترجمان با تجربه، طبق جدول زیر محاسبه می شود:
Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)
Journal : Neurocomputing, Volume 266, 29 November 2017, Pages 141-149
An adaptive dynamic programming (ADP) controller is a powerful control technique that has been investigated, designed and tested in a wide range of applications for solving optimal control problems in complex systems. The performance of the ADP controller is usually obtained by long training periods because the data usage efficiency is low as it discards the samples once used. History experience, also known as experience replay, is a powerful technique showing potential to accelerate the training process of learning and control. However, the existing design of history experience cannot be directly used for the model-free ADP design, because the existing work focuses on the forward temporal difference (TD) information (e.g., state-action pair). This information is between the current time step and the future time step and will need a model network for future information prediction. This paper proposes a new history experience replay design to avoid the usage of the model network or identifier of the system/environment. Specifically, we designed the experience tuple with one step backward state-action information and the TD can be achieved by a previous time step and a current time step. In addition, a systematic approach is proposed to integrate history experience in both the critic and action networks of the ADP controller design. The proposed approach is tested for two case studies: a cart-pole balancing task and a triple-link pendulum balancing task. For fair comparison, we set the same initial starting states and initial weight parameters for both approaches under the same simulation environment. The statistical results show that the proposed approach can improve the required average number of trials to succeed as well as the success rate. In general, the proposed approach improved the required average trial to succeed by 26.5% for cart-pole and 43% for triple-link balancing tasks.