An experimental analysis of an on-line adaptive system using a mixture of Bayesian networks
|Article code|Publication year|English article|Persian translation|Word count|
|29025|2010|19-page PDF|Available to order|13266 words|
Publisher: Elsevier - ScienceDirect
Journal : Information Sciences, Volume 180, Issue 15, 1 August 2010, Pages 2856–2874
An on-line reinforcement learning system that adapts to environmental changes using a mixture of Bayesian networks is described. Building intelligent systems able to adapt to dynamic environments is important for deploying real-world applications. Machine learning approaches, such as those using reinforcement learning methods and stochastic models, have been used to acquire behavior appropriate to environments characterized by uncertainty. However, efficient hybrid architectures based on these approaches have not yet been developed. The results of several experiments demonstrated that an agent using the proposed system can flexibly adapt to various kinds of environmental changes.
Intelligent systems through which robots and agents can effectively learn new behaviors have been actively researched and developed in various fields. One recent research effort in particular focused on building an intelligent system that can adapt to uncertain and dynamically changing environments. Reinforcement learning (RL) is a kind of machine learning that can be applied in such environments. Many RL methods have helped improve learning performance in environments modeled by Markov decision processes (MDPs), and some of these methods have also been applied to problems in non-MDP domains. A typical example of a non-MDP model is the partially observable MDP (POMDP) model. In environments modeled as POMDPs, learners do not have complete information regarding their current state due to the presence of noisy input and the detection limit of the learner’s sensor. Stochastic systems can be used to acquire reasonable behavior in environments characterized by uncertainty. A Bayesian network (BN), a kind of stochastic model, can provide appropriate noise-robust output through probabilistic inference. BNs are capable of representing models in various problem domains and are applicable to a broad range of problems: automatic driving control, behavior control for robots, and so on. In real-world environments, both the display of appropriate behavior with respect to unobservable input and adaptability to changes in the environment are required. In this paper, we consider situations in which the environment is discretely changed at fixed time intervals. In the field of RL, several improvements have been devised concerning such situations, such as (i) modifying and reusing policies related to previous environments and (ii) taking advantage of a single policy chosen from a policy library.
Nevertheless, there has been little research thus far on applying stochastic model-based systems to adaptation to dynamic environments. Let us consider how people adapt to changes in the environment. People generally store “knowledge”, i.e., experience, about a number of already solved problems and can solve a problem with high probability if the same problem has previously been encountered and solved. In addition, people can generally solve problems slightly different from previously encountered ones by utilizing their experiences to suitably modify previous solutions. Even completely new problems can be treated more effectively by mixing reusable experiences concerning previously encountered problems. Incorporating this ability into adaptive systems so that they can adapt to a variety of environments is the objective of this study. Inspired by the human ability to act on the basis of empirical knowledge, we focused on how to treat and leverage these experiences for application to agents’ (robots’) behavior learning, and we developed an on-line system capable of adapting to environmental changes by using a mixture of BNs. In the proposed system, called the IPMBN (on-line adaptive system for Improving agents’ Policies using a Mixture of BNs), the learning object is regarded as an RL agent, while the object’s knowledge regarding a certain behavior in an environment, called a policy, is represented by a mixture distribution of BNs. The empirical data from the RL agent is presented as sequences of the agent’s states, actions, and rewards. The BNs store the data statistically, and the experiences are presented as a probability distribution. Each BN in the mixture, therefore, represents the stochastic characteristics of an individual agent’s policy in the respective environment.
The mixtures can provide policy information not only about previously encountered environments or similar ones but also about environments without any corresponding BNs in the mixture (called “unencountered environments”). From the theoretical viewpoint, a distinctive mixture formulation (called the “exponential mixture”) is introduced to the IPMBN. A standard mixture formulation (called the “linear mixture”) is also introduced for comparison. In the proposed system, the assumptions the agent makes about its current environment are represented by a mixture of BNs. When environmental changes are observed, the system modifies the mixture to adapt to the changed environment. It then improves the agent’s policy using the information represented in the mixture. Knowledge about the agent’s behavior in the environment may be modified and partially utilized or may occasionally be used as a negative example, depending on the current environment. This corresponds to how people adapt to changes in the environment. In real-world applications based on the mixing of reusable experiences, it is difficult to endow agents (or robots) with an ability to adapt to environmental changes. Therefore, the use of such mixing in real-world experiments has not been conclusively demonstrated. Actual applications using the proposed system in the real world (for example, learning effective behavior in robot soccer, or adaptive and intelligent control of objects such as automobiles) have also not yet been realized. Nevertheless, this paper presents experiments using a mobile robot and computer simulations as trial cases. The results show that policy improvement using a mixture of BNs enables an RL agent to adapt more flexibly to various kinds of environmental changes by modifying, reusing, and combining its knowledge.
Moreover, the IPMBN displays acceptable performance even in the following problematic cases: (i) where it is difficult to reuse the policies learned for previous environments for the current one; (ii) where the current environment differs significantly from the ones the agent previously encountered. The paper is organized as follows. Section 2 briefly explains an RL method known as profit sharing as well as the BN concept. Section 3 describes the process of representing the agent’s policy information using BNs in the IPMBN, the concept of a BN mixture, and the algorithm governing the adaptation of the IPMBN to dynamic environments. Section 4 describes computer simulations conducted to evaluate the system’s characteristics and performance, presents the results, and discusses their implications. It also describes experiments using a mobile robot, presents those results, and discusses them. Section 5 discusses related work. Finally, the key points are summarized and future research is mentioned in Section 6.
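To make the distinction between the linear and exponential mixtures more concrete, the following is a minimal sketch using generic textbook definitions: a linear mixture as a weighted sum of component distributions, and an exponential mixture as a normalized weighted product. It is an illustration only. The component policies, the function names, and the weights are hypothetical, each "BN" is reduced to a plain action distribution for a single state, and the paper's own formulation (including its parameter tuning) is not reproduced in this excerpt.

```python
import math

def linear_mixture(dists, weights):
    """Weighted sum: p(a) = sum_i w_i * p_i(a).

    Assumes the weights are nonnegative and sum to 1.
    """
    n = len(dists[0])
    return [sum(w * d[a] for d, w in zip(dists, weights)) for a in range(n)]

def exponential_mixture(dists, weights):
    """Normalized weighted product: p(a) proportional to prod_i p_i(a) ** w_i.

    Assumes every probability is strictly positive. Weights need not
    sum to 1 because the result is renormalized.
    """
    n = len(dists[0])
    unnorm = [math.prod(d[a] ** w for d, w in zip(dists, weights))
              for a in range(n)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Two hypothetical component policies over three actions in one state.
policy_a = [0.7, 0.2, 0.1]   # prefers action 0
policy_b = [0.1, 0.2, 0.7]   # prefers action 2

lin = linear_mixture([policy_a, policy_b], [0.5, 0.5])
exp_mix = exponential_mixture([policy_a, policy_b], [0.5, 0.5])

print("linear:     ", [round(p, 3) for p in lin])
print("exponential:", [round(p, 3) for p in exp_mix])
```

With equal weights, both mixtures treat actions 0 and 2 symmetrically, but the exponential form gives relatively more mass to the action on which the two components agree (action 1), since a product penalizes actions that either component considers unlikely.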
Conclusion
Our on-line adaptive system for improving the policies of an agent using a mixture of BNs (IPMBN) uses Bayesian networks (BNs) to represent information about the policy an agent (or robot) should follow for a particular environment. The use of a linear or exponential mixture of BNs makes it possible to describe a much broader variety of policies. Computer simulation showed that the IPMBN performs better than the original profit sharing method and the improved version thereof in terms of the number of successful trials and the number of actions taken. Comparison of the results for the two types of mixtures revealed that the system using an exponential mixture performed better than the one using a linear mixture. Moreover, the results of experiments conducted using real-world environments demonstrated that the Khepera II robot using the IPMBN with an exponential BN mixture behaved more efficiently than when using the original method and the improved version regardless of the presence of input or output noise. The IPMBN provides an agent with the ability to adapt flexibly to both virtual and real-world environments thanks to appropriate tuning of the mixing parameters for the current environment. The mixing parameters for an environment can be viewed as a measure reflecting how well the policy information represented by a BN in the mixture fits the characteristics of the environment. Exponential mixtures are capable of utilizing reverse policy characteristics by assigning negative values to the mixing parameters of the BN. Since the use of reverse policy characteristics contributes to the representation of a wider variety of policies, the IPMBN performed better in the experiments when an exponential mixture was used. We carried out the computations required to tune the mixing parameters for both kinds of mixture and set B, used in Eq. (14) for transforming β_i to λ_i, to 1.0 to simplify the comparison between policy features.
However, for the exponential mixture, the value of B can be set in an arbitrary manner on condition that the values of the mixture distribution do not overflow. Thus, there is a need to evaluate the IPMBN’s performance after easing the constraints. Agents that base their behavior on the IPMBN can navigate many kinds of environments, including unencountered ones. If an agent encounters an environment for which the current mixture does not have appropriate components, a BN corresponding to the new environment must be incorporated into the mixture. The computational complexity of tuning the mixing parameters and the subsequent reconfiguration of the model increases sharply with the number of BNs in the mixture. Therefore, further research should include considering how to optimize the number of BNs while maintaining adaptability to environmental change.
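The two remarks above about negative mixing parameters and overflow can be illustrated with a small sketch. It assumes a generic exponential mixture (a normalized weighted product of strictly positive distributions) evaluated in log space. The function name, the example policy, and the weight values are hypothetical, and the transformation in Eq. (14) is not reproduced here because it does not appear in this excerpt. Subtracting the largest log score before exponentiating is a standard numerical technique, not necessarily the one used in the paper.

```python
import math

def exponential_mixture_log(dists, weights):
    """Exponential mixture computed in log space.

    p(a) is proportional to exp(sum_i w_i * log p_i(a)). Weights may be
    negative. Assumes every probability is strictly positive.
    """
    n = len(dists[0])
    log_scores = [sum(w * math.log(d[a]) for d, w in zip(dists, weights))
                  for a in range(n)]
    peak = max(log_scores)
    # Shifting by the peak keeps every exponent <= 0, so exp cannot overflow.
    unnorm = [math.exp(s - peak) for s in log_scores]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Hypothetical single-component policy that prefers action 0.
policy = [0.7, 0.2, 0.1]

# A negative weight reverses the preference order of the component.
reversed_policy = exponential_mixture_log([policy], [-1.0])

# A large-magnitude weight would overflow a direct product,
# but stays finite in log space.
extreme = exponential_mixture_log([policy], [-500.0])

print([round(p, 3) for p in reversed_policy])
print(extreme[2] > 0.99)
```

In this sketch a weight of -1 turns the least preferred action of the component into the most preferred one, which is one way to read "reverse policy characteristics". A linear mixture with nonnegative weights cannot express this.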