Learning to use episodic memory
|Article code||Publication year||English article||Word count|
|33655||2011||10-page PDF||7062 words|
Publisher : Elsevier - Science Direct
Journal : Cognitive Systems Research, Volume 12, Issue 2, June 2011, Pages 144–153
This paper brings together work in modeling episodic memory and reinforcement learning (RL). We demonstrate that it is possible to learn to use episodic memory retrievals while simultaneously learning to act in an external environment. In a series of three experiments, we investigate using RL to learn what to retrieve from episodic memory and when to retrieve it, how to use temporal episodic memory retrievals, and how to build cues that are conjunctions of multiple features. In these experiments, our empirical results demonstrate that it is computationally feasible to learn to use episodic memory; furthermore, learning to use internal episodic memory accomplishes tasks that reinforcement learning alone cannot. These experiments also expose some important interactions that arise between reinforcement learning and episodic memory. In a fourth experiment, we demonstrate that an agent endowed with a simple bit memory cannot learn to use it effectively. This indicates that mechanistic characteristics of episodic memory may be essential to learning to use it, and that these characteristics are not shared by simpler memory mechanisms.
In this paper, we study a mechanism for learning to use the retrieval of knowledge from episodic memory. This unifies two important related areas of research in cognitive modeling. First, it extends prior work on the use of declarative memories in cognitive architectures, where knowledge is accessed from declarative memories via deliberate and fixed cued retrievals (Anderson, 2007; Nuxoll & Laird, 2007; Wang & Laird, 2006), by exploring mechanisms for learning to use both simple and conjunctive cues. Second, it extends work on using reinforcement learning (RL) (Sutton & Barto, 1998) to learn not just control knowledge for external actions, but also to learn to control access to internal memories, expanding the range of behaviors that can be learned by RL. Earlier work has investigated increasing the space of problems applicable to RL algorithms by including internal memory mechanisms that can be deliberately controlled: Littman (1994) and Peshkin, Meuleau, and Kaelbling (1999) developed RL agents that learned to toggle internal memory bits; Pearson, Gorski, Lewis, and Laird (2007) showed that an RL agent could learn to use a simple symbolic long-term memory; and Zilli and Hasselmo (2008) developed a system that learned to use both an internal short-term memory and an internal spatial episodic memory, which could store and retrieve symbols corresponding to locations in the environment. All four cases demonstrated a functional advantage from learning to use memory.
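The idea of letting RL select internal memory actions alongside external ones can be illustrated with a minimal tabular Q-learning sketch. This is not the paper's Soar implementation; the action names (`left`, `right`, `retrieve`), the state encoding as a `(percept, retrieved)` pair, and the `QAgent` class are all hypothetical choices made for illustration. The only point it shows is that a memory retrieval is treated as one more action whose value is learned by the same update rule.

```python
import random
from collections import defaultdict

# Hypothetical action sets: the agent chooses among motor actions and one
# internal memory action using a single shared value table.
EXTERNAL_ACTIONS = ["left", "right"]
INTERNAL_ACTIONS = ["retrieve"]
ACTIONS = EXTERNAL_ACTIONS + INTERNAL_ACTIONS


class QAgent:
    """Tabular Q-learning over states of the form (percept, retrieved)."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # maps (state, action) -> estimated value
        self.alpha = alpha
        self.gamma = gamma
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def choose(self, state):
        # Epsilon-greedy selection over external AND internal actions.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, done):
        # Standard one-step Q-learning backup.
        if done:
            best_next = 0.0
        else:
            best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

In this sketch a retrieval earns no reward by itself; its value comes entirely from the reward that becomes reachable once the retrieved content is part of the state, which is why the backup must propagate through the internal action before the agent prefers it.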
Our work significantly extends these previous studies in three ways: first, our episodic memory system automatically captures all aspects of experience; second, our system learns not only when to access episodic memory, but also how to construct conjunctive cues and when to use them; and third, it takes advantage of the temporal structure of episodic memory by learning to advance through episodic memory when it is useful (this property is also shared by the Zilli & Hasselmo system, but for simpler task and episodic memory representations). Our studies are pursued within a specific cognitive architecture, namely Soar (Laird, 2008), which incorporates all of the required components: perceptual and motor systems for interacting with external environments, an internal short-term memory, a long-term episodic memory, an RL mechanism, and a decision procedure that selects both internal and external actions. In comparison, ACT-R (Anderson, 2007) has many similar components but does not have an explicit episodic memory. Its long-term declarative memory stores only individual chunks, and it does not store episodes that include the complete current state of the system. To do so would require storing the contents of all of ACT-R’s buffers as a unitary structure, as well as the ability to retrieve and access them without the retrieved values being confused with the current values of those buffers. Moreover, ACT-R’s declarative memory does not inherently encode the temporal structure of episodic memory, where temporally consecutive memories can be recalled (Tulving, 1983). While the work presented in this paper is specific to learning to use an episodic memory, similar work could be pursued in the context of ACT-R by learning to use its declarative memory mechanism. However, we are unaware of existing work in that area, and even if there were, it would fail to engage the same issues that arise with episodic memory.
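The three memory properties named above (automatic capture of whole states, cue-based retrieval with conjunctive cues, and temporal advance) can be sketched in a few lines. The following is an illustrative toy, not Soar's episodic memory: the class name `EpisodicMemory`, its methods, the match score (count of cue features that agree), and the tie-break in favor of the most recent episode are all assumptions made for this example.

```python
class EpisodicMemory:
    """Toy episodic store: whole-state snapshots kept in temporal order."""

    def __init__(self):
        self.episodes = []   # snapshots, oldest first
        self.cursor = None   # index of the most recently retrieved episode

    def record(self, working_memory):
        # Capture a copy of the entire state, not selected features.
        self.episodes.append(dict(working_memory))

    def retrieve(self, cue):
        """Return the episode matching the most cue features.

        `cue` maps feature -> value; a cue with several entries acts as a
        conjunctive cue. Ties go to the most recent episode (an assumption).
        Returns None when no episode matches any feature.
        """
        best_score, best_index = 0, None
        for i, episode in enumerate(self.episodes):
            score = sum(1 for k, v in cue.items()
                        if k in episode and episode[k] == v)
            if score > 0 and score >= best_score:
                best_score, best_index = score, i
        self.cursor = best_index
        return None if best_index is None else self.episodes[best_index]

    def advance(self):
        """Return the episode recorded right after the last retrieved one."""
        if self.cursor is None or self.cursor + 1 >= len(self.episodes):
            return None
        self.cursor += 1
        return self.episodes[self.cursor]
```

A short usage example: a single-feature cue can be ambiguous, a conjunctive cue narrows the match, and `advance` recalls what happened next.

```python
mem = EpisodicMemory()
mem.record({"room": "A", "color": "red", "action": "left"})
mem.record({"room": "B", "color": "red", "action": "right"})
mem.record({"room": "A", "color": "blue", "action": "up"})

first = mem.retrieve({"room": "A", "color": "red"})  # conjunctive cue
following = mem.advance()                             # the next episode in time
```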
Conclusion
More broadly, this research opens up the possibility of extending the range of tasks and behaviors modeled by cognitive architectures. To date, scant attention has been paid to many of the more complex properties and richness of episodic memory, such as its temporal structure or the fact that it does not capture just isolated structures and buffers but instead captures working memory as a whole. Similarly, although RL has made significant contributions to cognitive modeling, it has been predominantly used for learning to control only external actions. This research demonstrates that cognitive architectures can use RL to learn more complex behavior that is dependent not just on the current state of the environment, but also on the agent’s prior experience, learning behavior that is possible only when both RL and episodic memory are combined.
Although our research demonstrates that it is possible to learn to use episodic memory, it also raises some important issues. Learning is relatively fast when the possible cues lead to the retrieval of an episode that contains all of the knowledge that an agent requires in order to determine how to act in the world. When the agent instead retrieves the episode that most closely matches the current state and then uses temporal control of memory to remember what happened next, learning is slower and does not always converge to the best possible behavior. Learning to use episodic memory to project forward is difficult, requiring many trials to converge and offering no guarantee that optimal behavior will be achieved. Do these same issues arise in humans, or do humans have other mechanisms that avoid them? One obvious approach to avoiding the issues encountered in our experiments is to use another method, such as instruction or imitation, to initially direct behavior so that correct behavior is experienced and captured by episodic memory; learning to use those experiences would then probably be much faster.