Learning strict Nash equilibria through reinforcement
|Article code||Publication year||English article||Word count|
|79520||2014||8-page PDF||7923 words|
Publisher: Elsevier - Science Direct
Journal : Journal of Mathematical Economics, Volume 50, January 2014, Pages 148–155
The main results of the paper show that, if the solution trajectories of the underlying replicator equation converge exponentially fast, then, with probability arbitrarily close to one, all the pathwise realizations of the reinforcement learning process will, from some time on, lie within an ε-band of that solution. The paper improves upon results currently available in the literature by showing that a reinforcement learning process that has been running for some time and is found sufficiently close to a strict Nash equilibrium will reach it with probability one.
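The convergence claim can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a cumulative reinforcement rule in which each player keeps a positive propensity per action, plays an action with probability proportional to its propensity, and adds the realized payoff to the propensity of the action just played. The game (`PAYOFF`), the starting propensities, and the function name `simulate` are all hypothetical choices made for illustration.

```python
import random

# Hypothetical 2x2 common-interest game (an assumption, not from the paper):
# both players earn 2 at (A, A), 1 at (B, B), and 0 on a mismatch.
# (A, A) and (B, B) are both strict Nash equilibria.
PAYOFF = [[2.0, 0.0],
          [0.0, 1.0]]


def simulate(steps=5000, start=(9.0, 1.0), seed=0):
    """Run an assumed cumulative reinforcement rule for two players.

    Returns each player's final probability of playing action A (index 0).
    """
    rng = random.Random(seed)
    # q[i][a] is player i's propensity for action a; start close to (A, A).
    q = [list(start), list(start)]
    for _ in range(steps):
        actions = []
        for i in range(2):
            # Choice probability is proportional to the propensity.
            p_a = q[i][0] / (q[i][0] + q[i][1])
            actions.append(0 if rng.random() < p_a else 1)
        # Both players receive the same payoff in this common-interest game.
        reward = PAYOFF[actions[0]][actions[1]]
        for i in range(2):
            # Reinforce only the action that was actually played.
            q[i][actions[i]] += reward
    return [q[i][0] / (q[i][0] + q[i][1]) for i in range(2)]


if __name__ == "__main__":
    print(simulate())
```

Starting with propensities of 9 to 1 in favor of A places both players near the strict equilibrium (A, A), so under these assumptions the probability of playing A should drift toward one as `steps` grows. A single simulated path only illustrates the result; it does not verify the probability-one statement.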