A Dynamic-Bayesian Network framework for modeling and evaluating learning from observation
Article code | Publication year | Length of English article |
---|---|---|
29312 | 2014 | 15 pages (PDF) |
Publisher: Elsevier - Science Direct
Journal: Expert Systems with Applications, Volume 41, Issue 11, 1 September 2014, Pages 5212–5226
Abstract
Learning from observation (LfO), also known as learning from demonstration, studies how computers can learn to perform complex tasks by observing and thereafter imitating the performance of a human actor. Although there has been a significant amount of research in this area, there is no agreement on a unified terminology or evaluation procedure. In this paper, we present a theoretical framework based on Dynamic-Bayesian Networks (DBNs) for the quantitative modeling and evaluation of LfO tasks. Additionally, we provide evidence showing that: (1) the information captured through the observation of agent behaviors occurs as the realization of a stochastic process (and often not just as a sample of a state-to-action map); (2) learning can be simplified by introducing dynamic Bayesian models with hidden states, for which the learning and model evaluation tasks can be reduced to the minimization and estimation of stochastic similarity measures such as cross entropy.
Introduction
Learning by watching others do something is a natural and highly effective way for humans to learn. It is also an intuitive and highly promising avenue for machine learning. It provides a way for machines to learn how to perform tasks in a more natural fashion. For many tasks, learning from observation is more natural than providing static examples that explicitly contain the solution, as in the traditional supervised learning approach. It is also easier than manually creating a controller that encodes the desired behavior. Humans typically just perform the task and trust that the observer can figure out how to successfully imitate the behavior. Although there has been a significant amount of research in learning from observation (LfO), there is no agreement on a unified terminology. Works reported in the literature also refer to learning from demonstration, learning by imitation, programming by demonstration, or apprenticeship learning, as largely synonymous with learning from observation. In learning from demonstration, a human purposely demonstrates how to perform a task or an action, expressly to teach a computer agent how to perform the same task or mission. We consider learning from demonstration to be a specialization of LfO and define the latter as a more general learning approach, where the actor being observed need not be a willing participant in the teaching process. Specifically, the problem we are trying to address in this paper is the lack of a unified framework to understand existing work in LfO, as well as the lack of standard evaluation metrics to assess the performance of LfO algorithms (which are typically evaluated using metrics designed for standard supervised learning). To that purpose, we present a unified framework for learning from observation based on Dynamic Bayesian Networks (DBNs) (Nefian, Liang, Pi, Liu, & Murphy, 2002). We provide both an intuitive description of the framework and a formal statistical model of LfO.
The main contributions of this paper are:
• A formal statistical model of LfO that provides a unified vocabulary and theoretical framework for LfO.
• A taxonomy of the different behaviors that can be learned through LfO.
• An explicit formulation of the difference between supervised learning and LfO algorithms. This is important because most LfO work uses standard supervised algorithms (like neural networks or nearest-neighbor classifiers), yet there are many behaviors to be learned through LfO for which those algorithms are not appropriate.
• A proposal for standard evaluation metrics for agents trained through LfO (currently lacking from the literature). Our framework makes explicit why standard metrics, such as classification accuracy, do not properly reflect how well LfO algorithms can learn complex tasks in some situations. We describe the reasons for this, and propose an alternative evaluation approach based on the Kullback-Leibler divergence.
The remainder of this paper is organized as follows. Section 2 briefly summarizes previous research in the field. After that, Section 3 introduces a common framework and vocabulary for learning from observation, including a statistical formalization of the problem. Section 4 focuses on evaluation metrics for LfO algorithms. Finally, Section 5 presents an empirical validation of two of our claims: (a) supervised learning algorithms are not appropriate for some LfO behaviors, and (b) our proposed evaluation metric is more accurate than the typical metrics used in the LfO literature.
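To make the point about classification accuracy concrete, the following is a minimal sketch and not the metric defined in the paper. It assumes a hypothetical expert that picks one of two actions uniformly at random, and compares two hypothetical imitators using the expected action-match rate and the Kullback-Leibler divergence between action distributions. The action names, the smoothing constant `eps`, and both helper functions are illustrative assumptions.

```python
import math

def expected_accuracy(expert, agent):
    """Probability that independent draws from the two action distributions match."""
    return sum(p * agent.get(a, 0.0) for a, p in expert.items())

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) in nats; eps floors q to avoid log(0) (a smoothing assumption)."""
    return sum(
        pa * math.log(pa / max(q.get(a, 0.0), eps))
        for a, pa in p.items()
        if pa > 0
    )

# Hypothetical stochastic expert: chooses either action with probability 0.5.
expert = {"left": 0.5, "right": 0.5}
# Imitator 1 always plays "left"; imitator 2 reproduces the expert's distribution.
imitator_det = {"left": 1.0, "right": 0.0}
imitator_sto = {"left": 0.5, "right": 0.5}

acc_det = expected_accuracy(expert, imitator_det)
acc_sto = expected_accuracy(expert, imitator_sto)
kl_det = kl_divergence(expert, imitator_det)
kl_sto = kl_divergence(expert, imitator_sto)

print(f"accuracy: deterministic={acc_det:.2f}, stochastic={acc_sto:.2f}")
print(f"KL:       deterministic={kl_det:.2f}, stochastic={kl_sto:.2f}")
```

Under these assumptions both imitators obtain the same expected accuracy, although only the second one behaves like the expert. The divergence separates them: it is zero for the matching distribution and large for the deterministic imitator.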
Conclusion
Despite the considerable amount of interest and work on learning from observation, the field lacks a unified framework for comparing different approaches. The main goal of this paper is to put forward a proposal to fill that gap, and to present a unified framework for learning from observation, consisting of three main components: (a) a statistical model based on Dynamic Bayesian Networks (DBNs) that sheds some light on the formal differences between LfO and other forms of learning, (b) a classification of the different levels of difficulty of the tasks attempted by LfO, and (c) a collection of evaluation metrics for different LfO tasks. Finally, we have presented an empirical evaluation of two of the main claims underlying our model, namely that supervised learning algorithms are not appropriate for some LfO tasks, and that standard metrics such as classification accuracy cannot always accurately determine whether two behaviors are similar when these behaviors are stochastic or require memory of past states. Our experimental evaluation suggests that our evaluation metrics better reflect the performance of LfO algorithms than typical metrics used in the LfO literature. The alternative metrics proposed in this paper, based on Vapnik’s risk or Vapnik’s rate, can better determine behavior similarity in these situations. The proposed DBN framework makes explicit the key challenges in LfO: designing learning algorithms that can handle the hidden internal state of the expert, and that can learn how the expert's behavior depends not just on the current state but also on past states.
Moreover, note that the DBN framework presented in this paper is not to be seen as a practical approach to LfO but as a theoretical model to understand LfO and its differences with respect to traditional supervised machine learning. In other words, our model is one step forward in understanding LfO, in making explicit the key challenges, and in providing a unified methodology for evaluating the performance of LfO algorithms, but it does not include practical algorithms to solve LfO problems, which are beyond the scope of this paper. As part of our future work, we want to explore the creation of new algorithms for LfO that can learn the same range of behaviors as our LfO-DBN model, but that are practical and offer better scalability. Additionally, we will continue our study of performance metrics for LfO, with the goal of reaching agreement in the LfO community over a set of standard metrics. For example, our current proposed metric for behaviors of level 3 requires training an LfO-DBN, which is a computationally expensive process for complex behaviors. We would therefore like to explore the possibility of defining metrics that do not require training complex DBNs, but that are still able to accurately measure the performance of LfO algorithms when learning tasks of level 3 (memory-based behavior).
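The difficulty of memory-based behavior can be illustrated with a small sketch. It is not the LfO-DBN model from the paper. It assumes a hypothetical expert that alternates between two actions regardless of the observed state, and fits two simple count-based predictors: one that sees only the current state, and one that also sees the previous action as a stand-in for internal state. The trace, the `fit` and `accuracy` helpers, and the choice of context are all illustrative assumptions.

```python
from collections import Counter, defaultdict

def fit(pairs):
    """Count-based predictor: map each context to its most frequent action."""
    counts = defaultdict(Counter)
    for ctx, action in pairs:
        counts[ctx][action] += 1
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

def accuracy(model, pairs):
    """Fraction of (context, action) pairs the predictor reproduces."""
    return sum(model[ctx] == action for ctx, action in pairs) / len(pairs)

# Hypothetical trace: the observed state never changes, the expert alternates actions.
states = ["s"] * 10
actions = ["a", "b"] * 5

# Memoryless view: context is the current state only (a state-to-action map).
memoryless = list(zip(states[1:], actions[1:]))
# Memory-based view: context is the current state plus the previous action.
with_memory = [
    ((s, prev), act)
    for s, prev, act in zip(states[1:], actions[:-1], actions[1:])
]

acc_memoryless = accuracy(fit(memoryless), memoryless)
acc_with_memory = accuracy(fit(with_memory), with_memory)

print(f"state only:              {acc_memoryless:.2f}")
print(f"state + previous action: {acc_with_memory:.2f}")
```

On this trace the state-only predictor can do no better than guessing the more frequent action, while the predictor with one step of memory reproduces the expert exactly. Real behaviors may depend on longer histories or on hidden state that no single past observation reveals.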