Motivated Agents
Kathryn Kasmarik1, William Uther, Mary-Lou Maher1
National ICT Australia∗, 1University of Sydney
kkas0686@it.usyd.edu.au, william.uther@nicta.au, mary@arch.it.usyd.edu.au
∗National ICT Australia is funded by the Australian Government's Backing Australia's Ability initiative, in part through the Australian Research Council.
1 Introduction
This poster presents a model for motivated agents with behaviours influenced by an internal, domain independent motivation process rather than domain specific rewards, goals, or examples provided by an external teacher. Internal motivation is desirable in complex, dynamic environments where hard-coded domain theories can only approximate the true state of the world and where pre-programmed goals can become obsolete.
Early work with motivated agents focused on the use of domain specific motives [Sloman and Croucher, 1981] or environment modelling by focusing attention on situations with the highest potential for learning [Kaplan and Oudeyer, 2004]. More recent work with intrinsically motivated reinforcement learning agents [Singh et al., 2005] has produced agents that, rather than learning a model, learn new behavioural options [Precup et al., 1998]. An option, or simply a behaviour, is a whole course of action that achieves some sub-goal. Our model extends previous work by defining general structures for events, attention focus and motivation.
2 The Motivated Agent Model
Our model for motivated agents is shown in Figure 1. It has three types of structures: sensors, memory and effectors. These structures are connected by four processes: sensation, motivation, learning and action. Agents function in a continuous sensation-motivation-learning-action loop.
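As a concrete illustration, the following Python sketch shows one way such a loop might be organised; the function names and the callables sense, motivate, learn and act are our own stand-ins for the four processes in Figure 1, not part of the model itself.

def run_agent(sense, motivate, learn, act, steps=1000):
    # Hypothetical driver for the sensation-motivation-learning-action loop.
    last_state, goal = None, None
    for _ in range(steps):
        state, events = sense(last_state)   # sensation: raw n-tuple -> events
        goal = motivate(events, goal)       # motivation: create or select a goal
        learn(goal, last_state, state)      # learning: update values, perhaps save a behaviour
        act(goal, state)                    # action: drive effectors towards the goal
        last_state = state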
Sensors receive raw data in the form of an n-tuple of state variables {x1, x2, x3, ..., xn} describing the current state of the agent and its environment. The n-tuple has a particular format: x1 to xk comprise data about the state of the agent, and xk+1 to xn comprise data about other objects sensed by the agent. This includes both absolute and relative data, such as the location of an object relative to the agent. This general format is domain independent. Relative values allow agents to learn general behaviours relative to themselves. Sensation transforms raw data received from sensors into structures called events. Events encapsulate two recognisable occurrences among state variable values: increases and decreases between one state and the next. Individual events and the current state n-tuple are incorporated into memory.
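A minimal sketch of the sensation step under these assumptions; the (variable index, direction) encoding of an event is our choice for illustration, not prescribed by the model.

def sense_events(prev_state, state):
    # Compare consecutive state n-tuples and emit an event for every
    # variable that increased or decreased.
    events = []
    if prev_state is None:
        return events
    for i, (old, new) in enumerate(zip(prev_state, state)):
        if new > old:
            events.append((i, "increase"))
        elif new < old:
            events.append((i, "decrease"))
    return events

# e.g. sense_events((0, 3, 5), (0, 4, 2)) -> [(1, 'increase'), (2, 'decrease')]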
Memory is a cumulative record of events, actions, and goals. Initially, memory is empty save for a set of primitive actions that the agent can perform in the world.
Motivation creates goals and stimulates action towards goals. Agents are motivated to create goals to understand and repeat interesting events that occur in their environment. We characterise interestingness using the intuition that events occurring infrequently in the world are interesting. Agents act towards goals by identifying situations in which they can learn how to make interesting events recur, using a temporal difference Q-learner (TDQL) [Sutton and Barto, 2000]. Goals are masked so that events that occur with equal or lesser frequency than the one being pursued are ignored. Masking increases learning efficiency by focusing attention and reducing the size of the state space.
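One way this could be realised, shown only as a sketch: the frequency-count representation and the function names below are our assumptions rather than the paper's implementation.

from collections import Counter

event_counts = Counter()

def most_interesting(events):
    # Record the observed events; the rarest event so far becomes the goal.
    event_counts.update(events)
    return min(event_counts, key=event_counts.get) if event_counts else None

def masked(events, goal):
    # Ignore events with equal or lesser frequency than the goal event.
    return [e for e in events if event_counts[e] > event_counts[goal]]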
Learning encapsulates new knowledge as a behaviour once an agent can repeat an interesting event at will. Masking ensures that behaviours are independent of the situation in which they were learned. A behaviour can be reused either as a pre-planned course of action for achieving new goals similar to the one from which it was originally created, or as a building block when learning to solve more complex goals.
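As an illustrative sketch only, a learned policy might be frozen into a reusable behaviour roughly as follows; the Behaviour class, the Q-table layout and the success threshold are our assumptions.

class Behaviour:
    def __init__(self, goal_event, q_table):
        self.goal_event = goal_event
        self.q_table = dict(q_table)   # frozen copy of the learned Q-values

    def policy(self, masked_state):
        # Greedy action with respect to the stored Q-values.
        actions = self.q_table.get(masked_state, {})
        return max(actions, key=actions.get) if actions else None

def maybe_encapsulate(goal_event, q_table, recent_successes, threshold=5):
    # Once the goal event can be repeated reliably, freeze it as a behaviour.
    if recent_successes >= threshold:
        return Behaviour(goal_event, q_table)
    return None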
Action reasons about which effectors will further the agent’s progress towards its current goal.
Effectors are the means by which actions are achieved. They allow the agent to cause a direct change to the world.
3 Motivated Agents in Practice
We demonstrate our agent model using a robot guide. This domain, based on Dietterich's taxi domain [2000] and illustrated in Figure 2, contains avatars who need to be guided to particular locations. Such situations are common in large-scale virtual worlds where new citizens can easily become lost. There are four possible sources and destinations for avatars. The robot has twelve effectors controlling the movements of its legs and can choose to start or stop guiding an avatar. The robot's sensors can perceive the absolute co-ordinates of the agent, the elevations of its legs, the co-ordinates of its legs relative to its body, and the absolute and relative co-ordinates of an avatar and its destination.
Figure 2 – A guide robot.
After a period of exploration, the robot notices that increases and decreases in its location and the location of other avatars are infrequent. The agent is motivated to create and pursue goals to repeat these changes. Over time the agent learns walking behaviours, such as Behaviour-1, to change its location. These behaviours are independent of the location in which they were learned and the position of other avatars. The agent uses these walking behaviours to develop path-following behaviours, such as Behaviour-2, to change the location of the avatar.
Behaviour-1 [lift left foot, move left foot forwards, put-down left foot, lift right foot, move right foot forwards, put-down right foot]
Behaviour-2 [guide, Behaviour-1, Behaviour-1, Behaviour-1, Behaviour-1, Behaviour-1, stop-guiding]
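The hierarchical reuse of Behaviour-1 inside Behaviour-2 can be pictured with a small sketch in which a behaviour is simply a list of primitive actions and sub-behaviours; the list representation mirrors the two behaviours above but is our own illustration.

behaviour_1 = ["lift left foot", "move left foot forwards", "put-down left foot",
               "lift right foot", "move right foot forwards", "put-down right foot"]
behaviour_2 = ["guide"] + [behaviour_1] * 5 + ["stop-guiding"]

def expand(behaviour):
    # Flatten a composite behaviour into primitive effector actions.
    primitives = []
    for step in behaviour:
        if isinstance(step, list):          # a sub-behaviour: expand it in place
            primitives.extend(expand(step))
        else:                               # a primitive action
            primitives.append(step)
    return primitives

# expand(behaviour_2) yields 1 + 5*6 + 1 = 32 primitive actions.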
4 Empirical Results
We measured the performance of the robot guide by graphing the number of primitive actions taken to produce Behaviour-2 in response to some goal G. We then implemented a flat TDQL with a pre-programmed reward for achieving a similar behaviour that satisfies G. Figure 3 shows that the motivated agent learns Behaviour-2 more quickly than a flat TDQL can learn a similar behaviour. This is because the motivated agent can make use of abstract behaviours such as Behaviour-1 while constructing Behaviour-2.
Figure 3 – Learning progress in a guide task.
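For reference, the flat baseline uses a standard one-step temporal-difference Q-learning update of the following form; the learning rate alpha and discount gamma shown are illustrative defaults, not values reported in the paper.

def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    # One-step Q-learning update: move Q(s,a) towards reward + gamma * max_a' Q(s',a').
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)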
5 Conclusion
Motivated agents are autonomous agents with behaviours influenced by internal, domain independent motivation. In addition to being able to choose their own goals, they can learn behaviours to satisfy those goals more quickly than agents utilising a flat TDQL algorithm.
References
[Dietterich, 2000] T. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13:227-303, 2000.
[Kaplan and Oudeyer, 2004] F. Kaplan and P-Y. Oudeyer. Intelligent adaptive curiosity: a source of self-development. In Proceedings of the 4th International Workshop on Epigenetic Robotics, pages 127-130, 2004.
[Precup et al., 1998] D. Precup, R. Sutton, and S. Singh. Theoretical results on reinforcement learning with temporally abstract options. In Proceedings of the 10th European Conference on Machine Learning, 1998.
[Singh et al., 2005] S. Singh, A. G. Barto, and N. Chentanez. Intrinsically motivated reinforcement learning. To appear in Proceedings of Advances in Neural Information Processing Systems 17 (NIPS), 2005.
[Sloman and Croucher, 1981] A. Sloman and M. Croucher. Why robots will have emotions. In Proceedings of the 7th International Joint Conference on Artificial Intelligence, pages 197-202, Vancouver, Canada, 1981.
[Sutton and Barto, 2000] R. Sutton and A. Barto. Reinforcement Learning: An Introduction. MIT Press, 2000.