

Fachrichtung 4.7 Allgemeine Linguistik, Universität des Saarlandes

Dispositions for Sociable Robots

Diplomarbeit

Supervised by Dr. Geert-Jan M. Kruijff and

Prof. Dr. Hans Uszkoreit

Sabrina Wilske

May 2006


Wilske, Sabrina
Dispositions for Sociable Robots
Master Thesis, Saarland University, Saarbrücken, Germany
May 2006, 87 pages
© Sabrina Wilske 2006


I hereby declare in lieu of an oath that I have written this thesis independently and have used no sources or aids other than those indicated.

Saarbrücken, May 8, 2006

Sabrina Wilske


Zusammenfassung (Summary)

A great challenge for Artificial Intelligence is to develop robots that behave so socially that humans can interact and communicate with them easily and intuitively. Solving this problem requires, as a first step, identifying the mechanisms and capabilities that underlie social interaction and, as a second step, equipping robots with the capabilities needed to take part in such social processes. An important aspect here is adaptivity – the robot must be able to change its behavior on the basis of its experience. The topic of this thesis is the acquisition of dispositions – behavioral tendencies or preferences – as an adaptive mechanism that enables social behavior. In the present work we develop and implement a model of dispositions that determines behavior in the physical and social context. The robot acquires dispositions for actions directed at certain kinds of objects in its spatial environment; depending on the verbal evaluation by other agents and the success or failure of these actions, behavioral tendencies are strengthened or weakened. Furthermore, the robot acquires dispositions concerning its communicative behavior; on the basis of linguistic knowledge and depending on the reactions of its communication partners, the robot learns, on the one hand, how to interpret indirect speech acts and, on the other hand, how to get other agents to help it in pursuing its goals. By realizing such example applications of dispositions, we can show how they serve to enable social behavior.


Abstract

One of the great challenges in Artificial Intelligence is to build sociable robots that humans can easily and intuitively interact with. This requires examining the mechanisms and capabilities that underlie social interaction and providing robots with the necessary skills to take part in those social processes. A central issue is adaptivity – the robot must be able to change its behavior based on its experience. This thesis proposes the acquisition of dispositions – behavioral tendencies or preferences – as a general adaptive mechanism to enable social behavior. The thesis develops and implements a model that enables dispositions to determine behavior in the physical and social environment. A robot acquires dispositions for actions aimed at certain kinds of objects, based on verbal feedback from a human and the success of those actions. Dispositions are further applied to communicative actions: based on linguistic knowledge and depending on the reactions of communication partners, the robot learns (a) how to interpret indirect speech acts and (b) how to engage others to help it in pursuing its goals.

By giving such exemplary applications of dispositional mechanisms, the thesis shows how they can be employed to promote social behavior.


ACKNOWLEDGMENTS

I want to thank my supervisors, my colleagues in the CoSy project, my family, friends and loved ones for their help, support, and being there.


Contents

1 Introduction
    1.1 Motivation
    1.2 Objective
    1.3 Overview

2 Background
    2.1 Introduction
    2.2 Anthropomorphism and Intentionality
    2.3 Individuality, Identity, and Personality
    2.4 Communication
    2.5 Affect and Emotion
    2.6 Empathy and Theory of Mind
    2.7 Autonomy
    2.8 Conclusion

3 Approach on Dispositions
    3.1 Learning and Adaptation
    3.2 Dispositions
        3.2.1 Formal Definition of Dispositions
        3.2.2 Goals
    3.3 Conclusion

4 Implementation
    4.1 The Architecture
        4.1.1 Overview
        4.1.2 Communicative Subsystem
        4.1.3 BDI-Subsystem
    4.2 Linguistic analysis
        4.2.1 Combinatory Categorial Grammar
        4.2.2 Linguistic Meaning as Relational Structure
        4.2.3 Ontological Types
    4.3 Raising Questions
    4.4 Acting in the Physical Environment
        4.4.1 Actions
        4.4.2 Parameters and Conditions
    4.5 BDI-Model
    4.6 Conclusion


5 Dispositions in Physical Context
    5.1 Introduction
    5.2 Deciding and Adapting
        5.2.1 General Strategy for Playful Interaction
        5.2.2 Feedback
        5.2.3 Experience
        5.2.4 Dispositions
    5.3 Examples
        5.3.1 Single Events
        5.3.2 Sequences of Events
    5.4 Conclusion

6 Dispositions in Communicational Context
    6.1 Indirect Speech Acts
        6.1.1 Theoretical Background
        6.1.2 Approach
        6.1.3 Examples
    6.2 Requesting Help
        6.2.1 Introduction
        6.2.2 Approach
        6.2.3 Formal Representation
        6.2.4 The System
    6.3 Conclusion

7 Conclusion
    7.1 Recapitulation
    7.2 Discussion and Extensions
        7.2.1 Interpretation of Indirect Speech Acts
        7.2.2 Requesting Help
        7.2.3 Actions in the Physical Environment
        7.2.4 Experimental Evaluation with Naive Users

A Source Code Documentation
    A.1 Acting in the Physical Environment
        A.1.1 Skills
        A.1.2 Recognition
        A.1.3 Conditions
        A.1.4 Other Data Structures
        A.1.5 Agents and Central Processing Classes
    A.2 Communicative Actions
    A.3 BDI Model

B Grammar specification
    B.1 Coverage
    B.2 Categories
        B.2.1 Mass Nouns
        B.2.2 The Modifier Please
        B.2.3 Transitive Verbs with different arguments


List of Figures

3.1 Gaining experience by perception and action

4.1 Robot architecture
4.2 ActivMedia PeopleBot with SICK laser and stereo vision on a pan tilt unit
4.3 Dependency graph for “Bring me a tea!”
4.4 Dependency graph for “I like to have tea”
4.5 Ontological types for classifying utterances
4.6 Process flow for actions
4.7 BDI model

5.1 Strategy for playful interaction in the environment
5.2 Disposition δ in physical context
5.3 Example 1: Blame
5.4 Example 2: Praise
5.5 Example 3: Failure
5.6 Example 4: Success
5.7 Sequences of perceived feedback

6.1 Flow diagram for interpretation of potential requests
6.2 Process flow diagram for “Bring me a tea!”
6.3 Process flow diagram for “Can you bring me a tea?”
6.4 Process flow diagram for “I need a tea.”
6.5 Dialogue graph for help requests
6.6 Labeled dialogue graph
6.7 Parts of the robot architecture with focus on requesting help

A.1 Agents and solvables

B.1 Derivation for “I would like to have a tea”
B.2 Derivation for “Do you want me to help you?”


List of Tables

4.1 Mapping assertions to ontological types
4.2 Mapping questions to ontological types

5.1 Mapping from evaluative utterances to ontological types

6.1 Disposition δ for interpreting potential requests

B.1 Exemplary mapping from verbs in imperative mood to category
B.2 Exemplary mapping from verbs in interrogative sentences to category
B.3 Exemplary mapping from verbs in indicative mood to category


List of Logical Forms

4.1 Bring me a tea!
4.2 Help me!
4.3 I would like to have a tea.
4.4 Do you want me to help you?
4.5 Yes

B.1 Help me!
B.2 Bring me the bucket to the lab!
B.3 Can you help me?
B.4 I would like to have a tea.
B.5 Do you want me to help you?


Chapter 1

Introduction

My dream was filled with the wild excitement of seeing a machine act like a human being, at least in many ways.

Woody Bledsoe

1.1 Motivation

Recent technological progress has paved the way for creating robots that can do more than simply substitute for industrial workers on the assembly line. The focus has shifted from robots as tools to robots as partners that can interact with humans. As partners and assistants, robots become part not only of our immediate physical environment but also of our social system and its relations and rules. According to the “World Robotics 2005” survey¹ there were more than 1.2 million domestic service robots in use by the end of 2004, with more than 7 million new robots (for personal use) predicted for the end of 2008. Because the average user is not experienced in robot programming, these robots need easy and intuitive interfaces. These interfaces should account for a human’s tendency to anthropomorphize technology and deal with complex technology in ways that mimic human interaction (DiSalvo and Forlizzi, 2006). Endowing robots with social skills can lead to more natural interaction and increase their usefulness and acceptance. So the place robots can take in our social environment – provided we can accept and integrate them – will depend on their social characteristics and capabilities. Work on sociable robots is mostly based on the assumption that people prefer to interact with machines in the same way as they interact with other people (Fong et al., 2003). This requires researchers first to identify the mechanisms that underlie human social interaction and the necessary features, skills, and capabilities that enable humans to engage in social interaction. Second, researchers have to model these features and provide these skills to robots (Dautenhahn, 1995).

¹ Published by the United Nations Economic Commission for Europe (UNECE) and the International Federation of Robotics (IFR), available at http://www.ifrstat.org

To give an overview of the problems related to social robots, we should start with a definition. (Duffy, 2003) proposes that a social robot is a “physical entity embodied in a complex, dynamic, and social environment sufficiently empowered to behave in a manner conducive to its own goals and those of its community.” Another, more specific definition is given in (Fong et al., 2003): “Social robots are able to recognize each other and to engage in social interactions, they possess histories (perceive and interpret the world in terms of their own experience), and they explicitly communicate with and learn from each other.”

These notions entail a range of requirements regarding the robot’s autonomy, its own identity and other agents’ identities, emotional states, and social communicative skills, including natural language (Duffy, 2003; Fong et al., 2003). To be autonomous, the robot must act on its own account, not controlled by a central control unit or some human operator. This requires a robot design that can choose which actions to take next, given perceptions, internal states, and past experience. Social interaction is based on the ability of participants to perceive each other as individuals with a unique identity and personality; this requires the display and perception of characteristic behavioral tendencies. Social interaction also requires interactors to communicate, and the most expressive means in human-human communication is natural language, which is unfortunately not very precise. There is no one-to-one mapping between utterances and their contextual interpretation – one utterance can have several interpretations, and one intention can be expressed by different utterances. This inherent ambiguity can only be resolved by considering the utterance context and the specific language use of an individual. This requires adaptational mechanisms, apart from static linguistic knowledge. Another important part of social interaction between humans is the expression and perception of emotional states. Social robots should account for that. A common assumption is that interaction with robots is more natural if they can perceive and reason about the emotional state of their human interaction partners and if they can have emotional states themselves and express them.

1.2 Objective

All of these briefly sketched issues require some sort of adaptational mechanism. “A truly autonomous robot is facing not only a complex control problem, but also a complex learning problem, namely how to acquire complex behavior and continually adapt or extend it.” (Ziemke, 1996). This thesis focuses on dispositions as a result of adaptational processes. Intuitively, we understand a disposition as a natural or acquired habit, preference, or characteristic tendency. Acquired dispositions, as results of adaptational processes, reflect the experience of an individual in its environment. This experience is used in the action selection process, i.e. in deciding what to do next when there is more than one option. This experience can also be employed to build up models of other agents. These models can then be exploited to enable efficient and effective social interaction. Efficient interaction on the one hand increases the usability and acceptance of the robot as a partner or servant; on the other hand, it enables the robot to achieve its own goals more efficiently.

Humans see machines as social actors if they display intentionality – having goals and pursuing them – which goes beyond merely reactive systems. Dispositions modulate intentions by determining the goals that might or might not be pursued in a given context. Thus dispositions play a role in the perception of robots as intentional beings.

Dispositions contribute to the personality of an individual and thereby constitute part of its identity. They help to identify others as individuals, which is an important condition for social interaction. Dispositions display the personal history of an agent. The acquisition of a history has been pointed out to be necessary for social interaction by, e.g., (Dautenhahn, 1997).

The thesis describes how dispositions are acquired while acting in the physical environment and interacting in the social environment. In the physical context, autonomous actions are associated with internal and external evaluations, which have an effect on the robot’s preference for executing these actions later. In the social context, dispositions concern communicative actions. The thesis focuses on two subproblems that arise in a collaborative setting, namely the understanding of requests and the successful employment of requests. Requests are often formulated indirectly, disguised as questions or assertions (“Could you bring me a coffee?”, “I am thirsty.”), which complicates their appropriate interpretation. On the other hand, when trying to convince humans to help it in pursuing its goals, the robot needs to learn which forms of requests are most successful. So on the one hand the robot learns how to interpret indirect requests, and on the other hand it learns how to engage another agent to help. This involves learning dispositions of the human agent – How does she express indirect requests? What kind of requests will convince her to help me? – and as a consequence leads to dispositions in the robot for interpreting and producing utterances.

The overall impact of dispositions on physical and social actions is shown in an integrated model of beliefs, desires and intentions (BDI). This BDI model guides the robot’s autonomous behavior and determines which goals to follow and which actions to take on the basis of a human’s utterances, the perceived physical environment, past experiences (= dispositions), and current desires. The model implements the influence of dispositions on the process of adopting goals to pursue.
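As a concrete illustration of this influence, the following Python sketch shows how a disposition value learned from feedback could bias goal adoption in a BDI-style decision step. All class names, field names, and numeric parameters are hypothetical assumptions for the example; the actual model is described in §4.5 and Chapters 5 and 6.

```python
# Hypothetical sketch: disposition values bias which goal a BDI-style agent
# adopts, and feedback from the environment updates those values.

from dataclasses import dataclass, field


@dataclass
class Goal:
    name: str       # e.g. "fetch(tea)", proposed from an utterance or a percept
    desire: float   # current strength of the desire behind this goal (0..1)


@dataclass
class DispositionalBDI:
    dispositions: dict = field(default_factory=dict)  # goal name -> learned tendency (0..1)

    def disposition(self, goal: Goal) -> float:
        return self.dispositions.get(goal.name, 0.5)   # unknown goals start neutral

    def adopt_goal(self, candidates):
        # score each candidate by current desire weighted by the learned tendency
        scored = [(g.desire * self.disposition(g), g) for g in candidates]
        score, best = max(scored, key=lambda pair: pair[0])
        return best if score > 0.25 else None          # illustrative adoption threshold

    def update(self, goal: Goal, feedback: float) -> None:
        # praise or success (+1) strengthens the tendency, blame or failure (-1) weakens it
        new = self.disposition(goal) + 0.1 * feedback
        self.dispositions[goal.name] = min(1.0, max(0.0, new))


agent = DispositionalBDI()
candidates = [Goal("fetch(tea)", desire=0.8), Goal("explore(lab)", desire=0.6)]
chosen = agent.adopt_goal(candidates)   # fetch(tea) wins with these illustrative values
agent.update(chosen, feedback=+1.0)     # e.g. the human praised the executed action
print(chosen.name, agent.dispositions)
```

The point of the sketch is only the shape of the coupling: dispositions do not replace desires, they weight them, so the same desire can lead to different adopted goals depending on past experience.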

1.3 Overview

The thesis is structured as follows. Chapter 2 characterizes the problem and the involved issues in more detail and presents related work. Chapter 3 introduces the general approach on dispositions and sets the theoretical framework which serves as the common ground for the different applications of dispositions. Chapter 4 prepares the detailed descriptions of disposition acquisition in Chapters 5 and 6 by introducing the robot architecture in which the work is embedded and preparing the linguistic background for communicational dispositions. It presents the mechanisms used for dispositions in physical context and the BDI model which integrates dispositions into the overall behavior of the robot. Chapter 5 describes the acquisition of dispositions in the physical environment, and Chapter 6 introduces dispositions for communicative actions. Chapter 7 summarizes the thesis and evaluates the approach by pointing out advantages and shortcomings and indicating open issues and possible extensions of the work.


Chapter 2

Background

Summary
This chapter gives an overview of the issues that arise when attempting to build sociable robots. It introduces the problems to be addressed, shows how they can be addressed, and sets this in relation to the approach taken in the thesis.

2.1 Introduction

In order to build sociable robots, which people can easily and intuitively interact with, we have to examine the mechanisms and capabilities that underlie social interaction and communication. After identification of the requirements, the objective is to enable robots to participate in social interaction by providing them with the appropriate features and skills and by modeling the underlying processes. Of course the problem can be approached from different sides, focusing on different aspects, which might interleave. This thesis focuses on mechanisms to acquire dispositions and to let those dispositions determine and promote the social behavior of the robot. Dispositions are natural or acquired habits, preferences, or characteristic tendencies that have an effect on the behavior of an agent in its environment. Dispositions are a result of adaptational processes; they have an effect on anthropomorphism, intentionality, individuality and personality. Dispositions also relate to communicative skills, emotion, and empathic behavior in a robot. The following subsections introduce these issues and indicate their relation to dispositions.

2.2 Anthropomorphism and Intentionality

Anthropomorphism is the tendency to attribute human characteristics to inanimate objects, animals or others (Duffy, 2003). When people anthropomorphize, they attribute cognitive or emotional states to an entity in order to rationalize the entity’s behavior in a given social environment. This is closely connected to Dennett’s intentional stance – “the strategy of interpreting the behavior of an entity by treating it as if it were a rational agent who governed its choice of action by consideration of its beliefs and desires” (Dennett, 1987).¹

¹ All this goes back to the philosophy of mind, which concerns the exact nature of mind, mental events, and consciousness, and their relationship with a physical body.

The problem in building robots is how to exploit and promote humans’ tendency to anthropomorphize and to interpret others’ behavior as intentional. While building upon these tendencies, one should beware of feeding false expectations that cannot be fulfilled (Duffy and Joue, 2005): “Anthropomorphism is only useful if it does not complicate people’s expectations”.

The problem with anthropomorphism is that it is not clear how many and what kinds of human features are necessary to support the impression of an agent having cognitive or emotional states. A question that arises in terms of intentionality is whether the robot would actually have to hold intentions, and if not, what kind of behavior would be required to enable human observers to ascribe intentions. User studies with the relatively simple vacuum cleaner robot Roomba show that people perceived the robot as an intentional social agent and sometimes even created social relationships with it, even though the robot uses only simple heuristics to follow random motion patterns and is not capable of learning or planning (DiSalvo and Forlizzi, 2006).

Although people have no difficulty attributing intentions to artifacts that do not actually hold intentions, we contend that intentions are needed to ensure a certain degree of complexity in an agent’s behavior, especially when interacting with other agents. Assuming that an agent needs to hold intentions in order to interact meaningfully with its environment, we need mechanisms to derive actions and behavior from those intentions. Dispositions are such a mechanism: they determine the behavior of an agent by taking into account past experience of its interaction in the environment. Dispositions modulate the intentions of the robot and can determine which goals to adopt, as we will show in §4.5. Thereby they provide a level of complexity that goes beyond purely reactive patterns.

2.3 Individuality, Identity, and Personality

Social interaction requires that interaction partners perceive each other as individuals with a unique identity and personality (Duffy, 2003; Fong et al., 2003). For interaction between robot and human, this entails two problems. (1) The human must perceive the robot as a social partner with individual traits rather than merely an exchangeable physical artifact. (2) The robot should account for differences between humans and adapt its behavior to their different personalities.

Personality of the robot Research results in Human-Computer Interaction suggest that the first problem is not too hard to address. (Reeves and Nass, 1996) evoked personality in a computer by very simple means: the style of interaction language and the sequencing of interaction. Computers were perceived as more dominant when they used a very confident language style, expressing comments as assertions and commands, whereas computers that used unassertive language – comments expressed as questions or suggestions – were perceived as more submissive. In addition to language style, the sequencing of interaction was an important feature: the computer that took initiative in interaction with the human was judged as more dominant than the computer that left the initiative to the user.² People seem to automatically extrapolate when given only a little hint, but features indicating personality need to be consistent, because ambiguous characters might be disliked or not even recognized (Reeves and Nass, 1996). Transferring these findings from computers to robots, we could give robots a personality by determining how much initiative they take or how confidently they communicate. Dispositions are a mechanism to set those parameters, grounded on past experience. The robot acquires a personal history and individuality by experiencing the world and behaving according to its individual experiences. Positive experience can raise its confidence level. Although it is beyond the scope of this thesis to implement a model for the acquisition of personality, we will show in Chapter 6 how the robot’s style of communication can be influenced by the collected feedback it gets from interaction partners.

² The primary result of those experiments was that people liked computers more if they displayed a personality similar to their own. A further experiment showed that people liked computers even more if they changed during interaction to conform to their respective personalities (Reeves and Nass, 1996).
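As a rough sketch of how such a parameter could be grounded in experience, the following hypothetical Python snippet keeps a single confidence value that rises and falls with feedback and switches between an assertive and an unassertive phrasing of the same content. It is only an illustration of the idea, not the mechanism implemented in Chapter 6.

```python
# Hypothetical sketch: a confidence parameter, adapted from feedback,
# switches between an assertive and an unassertive communication style.

class CommunicationStyle:
    def __init__(self, confidence: float = 0.5):
        self.confidence = confidence  # 0 = very submissive, 1 = very dominant

    def register_feedback(self, positive: bool) -> None:
        delta = 0.1 if positive else -0.1
        self.confidence = min(1.0, max(0.0, self.confidence + delta))

    def phrase_request(self, content: str) -> str:
        if self.confidence > 0.6:
            return f"{content}!"                         # command-like, assertive
        return f"Could you {content.lower()}, please?"   # suggestion-like, unassertive


style = CommunicationStyle()
print(style.phrase_request("Bring me the bucket"))   # tentative at first
style.register_feedback(positive=True)
style.register_feedback(positive=True)
print(style.phrase_request("Bring me the bucket"))   # more assertive after praise
```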

Personality of the human As interaction is a reciprocal process, the robot as well should be able to recognize and act according to different personality features of the human. This requires mechanisms to acquire models of different humans and different policies relating to those models. While there is a lot of work on user modeling in non-robot systems (see (Jameson, 2003) for an overview and (Heylen et al., 2005) for an example system), there is, to our knowledge, only one practical approach to such a mechanism on a robot. (Mitsunaga et al., 2005) describe how a robot autonomously adapts its behavior to perceived body signals of a human, with the goal of minimizing the human’s discomfort. It reacts to individual preferences of a human regarding personal space and gaze meeting.

Dispositions constitute another approach to adapting to individual features, as they reflect the experience of the robot with a particular agent. They do not provide explicit user modeling, but they account for perceivable differences between users, as those differences lead to different behavioral tendencies. Chapter 6 presents examples of this.

2.4 Communication

Social interaction is enabled by communication – the exchange of information via a shared protocol. Human communication relies on a wide range of communicative means. Besides nonverbal means like posture, gesture, gaze, and facial expressions, the most informative is probably natural language. In order to facilitate human-robot social interaction, the robot should be able to exploit, at least partially, some of these means.

Enabling the robot to linguistically interact with humans in their language requires a lot of knowledge at different levels to be formalized. Using natural language presumes the ability to recognize and produce speech signals. The acoustic signals³ perceived when someone is talking have to be translated into a string. Using syntactic and semantic knowledge, this string is then analyzed for its linguistic meaning. Based on that meaning, a perceived utterance has to be interpreted with regard to the situational context. In the other direction, in order to produce an utterance, a meaning has to be mapped to a well-formed string, and this string has to be translated into acoustic signals. The coupling of utterance meaning to situational context requires pragmatic knowledge – knowing how to do things with words; understanding direct and indirect speech acts (Levinson, 1983). Knowing when to speak and when to listen – the rules of turn-taking – enables the robot to take part in dialogue, “a joint process of communication” and a form of social interaction (Breazeal et al., 2004; Fong et al., 2003).

³ We are aware that sign language relies not on acoustic but on visual signals.

One problem for a computational approach to natural language is ambiguity – an utterance can have more than one meaning. A reciprocal problem is synonymy – one meaning can be expressed in different ways. Ambiguity can often be resolved by looking at the context of the utterance: what was said before, and who produced the utterance in which situation? Language use can differ between individuals. Grounded in basic linguistic knowledge (lexicon, syntax, semantics), the robot needs mechanisms to resolve ambiguity and to select one utterance to express a meaning. Those mechanisms enable it to address the situation- and user-dependent variability in language use.

Dispositions provide such an adaptive mechanism for learning user- and context-specific production or interpretation. They can account for non-fixed, ambiguous pieces of conversation, as we will show in Chapter 6. Dispositions can concern the production of utterances or the interpretation of and reaction to utterances of the dialogue partner. In this thesis, dispositions are used for two problems. One is the resolution of ambiguous speech acts: the robot acquires a disposition on how to react to human utterances that are possibly indirectly formulated requests. The robot can either interpret them as requests and fulfill them, or it can understand them literally and produce an appropriate answer; see §6.1. Second, the robot employs dispositions to engage other agents to help. By assessing the reaction of the other agent, it learns which communicative actions are most effective for its purposes. §6.2 shows how the robot adapts the production of utterances depending on the observed characteristics of the interaction partner and thereby addresses the variability between language users.
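The first of these two uses can be sketched as follows. The Python snippet below is a simplified, hypothetical illustration (the actual mechanism is described in §6.1): a disposition value per utterance pattern decides between the request reading and the literal reading, and the human's reaction to the chosen reading shifts that value.

```python
# Hypothetical sketch of a disposition for interpreting potential indirect requests.
# A value near 1 means "treat this utterance type as a request", near 0 "take it literally".

from collections import defaultdict

class RequestInterpreter:
    def __init__(self):
        # one disposition per utterance pattern, e.g. "can-you-X" or "i-need-X"
        self.disposition = defaultdict(lambda: 0.5)

    def interpret(self, pattern: str) -> str:
        return "request" if self.disposition[pattern] >= 0.5 else "literal"

    def feedback(self, pattern: str, human_satisfied: bool, reading: str) -> None:
        # if the chosen reading satisfied the human, reinforce it; otherwise weaken it
        direction = 1 if reading == "request" else -1
        delta = 0.1 * direction if human_satisfied else -0.1 * direction
        self.disposition[pattern] = min(1.0, max(0.0, self.disposition[pattern] + delta))


robot = RequestInterpreter()
reading = robot.interpret("can-you-X")        # e.g. "Can you bring me a tea?"
robot.feedback("can-you-X", human_satisfied=False, reading=reading)
print(robot.interpret("can-you-X"), dict(robot.disposition))
```

The second use, choosing the form of the robot's own help requests, follows the same pattern with the roles reversed: the robot's candidate utterances are the actions, and the human's willingness to help is the feedback.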

Dispositions and the dialogue process influence each other: dispositions are determined by the dialogue, and they themselves determine the dialogue. Dispositions can be used to enable more effective and efficient communication, and they provide situation-dependent flexibility.

There are several examples of robotic systems with natural language interfaces and dialogue systems (Breazeal et al., 2004; Theobalt et al., 2002; Severinson-Eklundh et al., 2003; Sidner and Dzikovska, 2005). However, the dialogue systems described in those papers work on a fixed base of linguistic and conversational knowledge. None of them has an adaptational component that can change interpretation or production strategies according to experience.

2.5 Affect and Emotion

Affect and emotion play an important role for attention, perception, cognition, and interaction in situated context. Human experience, behavior and decision making can hardly be detached from emotional states, which are grounded in our bodies.⁴ Emotions can have a beneficial effect on creative problem solving, memory retrieval, decision making and learning (Breazeal, 2004). The underlying assumption is that extending purely rational mechanisms in the robot with emotional models could result in a more human-like robot and facilitate social interaction with humans.

⁴ (Damasio, 1994) showed that intelligent behavior of humans can be impaired by a lack of emotions. Damasio found that without the help of the underlying, mostly unconscious, emotional evaluation of our options, humans would have severe problems deciding which actions to take, considering the complexity and number of possible consequences. He proposes the Somatic Marker Hypothesis, which claims that decisions made in circumstances similar to previous experience, and whose outcome could be potentially harmful or potentially advantageous, induce a somatic response used to mark future outcomes that are important to us and to signal their danger or advantage subconsciously. (Ventura, 2000) proposes a computational model for implementing the Somatic Marker Hypothesis, but this model has not yet been tested on real robots.

When trying to model emotions for robots, the problem is how to ground emotional states of the robot in its external perceptions and internal states. Perceptions must have a meaning to the robot. As (Dautenhahn, 1998) points out: “decision making only matters when the potential outcomes have an individual, subjective meaning to the decision maker”.

Designing emotional states for robots thus requires plausible models that link external and internal states to emotional states. Those emotional states should then have an impact on actions and behavioral tendencies. Dispositions relate to emotional models in that they control the behavior of the robot based on experience of subjectively evaluated stimuli from the environment. However, instead of providing a sophisticated emotional model, we use positive and negative stimuli as reinforcers of behavior, as illustrated in Chapter 5.

Emotional models for robots have been used to select actions, to ground the meaning of physical objects in the environment, to communicate the internal state of the robot to other agents, to support anthropomorphism, and to enable socially acceptable and effective behavior.

In the Cathexis model, described in (Velasquez and Maes, 1997) and (Velasquez, 1998), emotions – or rather memories of past emotional experiences – serve as a biasing mechanism for taking decisions during the action-selection process. (Arkin et al., 2003) implement an emotional model for the purpose of action selection in entertainment robots. Their model evaluates external stimuli and internal drives and applies homeostatic regulation to select a behavior. They also use emotions to ground the meaning of symbols in the environment: a physically grounded symbol is associated with the change of internal variables when the robot applies a behavior in response to the object.

(Canamero and Fredslund, 2000) built a Lego robot that was able to express six distinct emotional states with a relatively simple face consisting of two movable eyebrows and lips, besides fixed eyes and nose. Emotional states are evoked by stimulation patterns of tactile sensors triggered by human interactors. The authors found that people performed quite well at recognizing the emotional state of the robot and that they seemed to anthropomorphize and empathize with the robot quite easily.

The emotional system implemented in the Kismet and Leonardo robots is responsible for perceiving and recognizing internal and external events with affective value, regulating the cognitive system to promote appropriate and flexible decision making (Breazeal, 2004). The emotional system is also used to communicate the robot’s internal state to other agents in order to socially regulate their behaviors in a relation beneficial to the robot, and thereby enables socially acceptable and effective behavior in interaction with people. Together with the cognitive system, it serves social and self-maintenance functions. It also “implements the personality of the robot with its attitudes and behavioral inclinations toward the events it encounters” (Breazeal, 2004).


The acquisition of dispositions based on past experience can be considered a simplification of emotional models. Similar to emotions, dispositions arise from the history of valued perceptions from the environment and serve to control the action selection process. Dispositional mechanisms differ from emotional models in their complexity; whereas dispositions in this thesis translate positively or negatively valued stimuli from the environment into behavioral tendencies (cf. Chapter 3), emotional models often distinguish between several emotions (based on humans’ emotions) that interact in a more complex way. Dispositions are only implicitly expressed by behavioral tendencies; they are not explicitly shown as emotional states as in some of the above implementations.

2.6 Empathy and Theory of Mind

Social interaction depends upon the recognition of other points of view and the understanding of their mental states (Scassellati, 2002). Those abilities are known under the terms Theory of Mind (Baron-Cohen, 1995) and empathy. Having a theory of mind refers to the ability to understand that others have beliefs, desires and intentions that are different from one’s own and to reason about them. The concept of empathy as the ability to imagine oneself in the position of another person, which gives us access to the person’s mental states, is very similar to that, but puts more emphasis on the emotional aspect.

Artificial empathy or theory of mind pose a problem because it is not clear how they are grounded in perception and proprioception. When trying to make robots empathic, one has to find the minimal requirements for modeling those psychological mechanisms.

(Dautenhahn, 1997) argues that empathy requires remembering processes that reconstruct experiences on the basis of current situations, with the body as point of reference. She distinguishes two levels of empathy: (A) the bodily re-experiencing, resonance and reaction to emotional states, and (B) the biographic reconstruction that considers the biographic history of others. Dautenhahn emphasizes the role of embodied experience and “experiential understanding” for empathy and argues that those cannot be described by a module or a static symbolic concept, but only “by dynamic mechanisms of resonance and synchronization”. However, she offers no practical implementations or models of those rather vague concepts.

Other, less abstract approaches to empathy and theory of mind for robots, grounded in biological and psychological models, are given by (Scassellati, 2002) and (Kozima et al., 2003). They phrase requirements in terms of perceptual and motor skills that serve as precursors to the more complex theory-of-mind capabilities – attentional mechanisms for finding eyes and faces, following gaze, and recognizing others’ actions. In addition to enabling the robot to detect faces and eyes and to recognize and follow the gaze of humans, (Scassellati, 2002) tries to distinguish animate from inanimate motion; (Kozima et al., 2003) try to enable a robot to recognize a human’s bodily movements and map them onto its own proprioception.

While the biologically inspired approaches try to solve the empathy problem in a bottom-up manner, others try to reason about other agents’ beliefs, desires and intentions in a top-down, more rational way, without using their own body as reference. (Heylen et al., 2005) present a tutorial dialogue system that takes into account students’ characters and an appraisal of their actions to build a hypothesis about their affectual state. This hypothesis is then used to plan the actions and responses of the tutor agent. Even though this might not reflect humans’ actual empathic processes, reasoning about other agents’ mental states on a rather rational level can also yield the desired empathic behavior (limited to the extent that is expected from non-humans).

The difference between the biological and the rational approach suggests making a distinction between the possible foundations of empathic behavior. One possible basis of empathic behavior is the automatic and unconscious feeling for others; another is the rational and logical reasoning about the states of others.⁵ A third possible way to yield aspects of empathic behavior is simple adherence to principles of politeness and courtesy. Based on their observation that humans display polite behavior toward computers, (Reeves and Nass, 1996) argue that computers themselves have to be polite: “It’s not just a matter of being nice, it’s a matter of social survival.” It does not require any sort of empathic reasoning to program a robot that apologizes if it cannot fulfill a request, but such a robot might be perceived as much more agreeable than a robot that bluntly refuses.

⁵ However, cases of very intelligent autistic people indicate that intellectual skills alone cannot provide the ability to “empathically feel what is going on inside another human” (Dautenhahn, 1997). Those people are aware of their impairments in social skills, but sometimes manage to cope with human social life by using visual cues and social rules deduced from observation of people or study of literature.

Dispositions in this thesis do not address the requirements of the biological view on empathy, nor do they provide explicit reasoning about the mental states of other agents. However, they are relevant for empathic behavior in the following respects.

For acquiring dispositions on help requests (§6.2), empathy is involved to the extent that the robot chooses the dialogue action that is better suited to convince the other. This requires no explicit BDI reasoning, but it results in behavior that is adapted to individual characteristics and preferences. The robot can be said to empathize with the human in that it chooses the most agreeable action.

Dispositions for understanding indirect speech acts relate to empathic behavior to the extent that the robot has to reason about the motives and desires of another agent because the agent does not express them explicitly. §6.1 shows that in order to understand whether an utterance was meant as a request, the robot has to take into account the situation and past experience in that situation.

As the acquisition of dispositions, and acting according to them, is a dynamical process, it addresses the requirement of a dynamic mechanism posed by (Dautenhahn, 1997). We argue that dispositions are a result of embodied experience and a form of experiential understanding, which Dautenhahn considers central to empathic processes.

2.7 Autonomy

Social robots have to act autonomously in their environment. The degree of the desired autonomy is determined by the social role that the robot plays and by the capabilities and requirements expected by others. When acting as a servant and assistant, should it wait for explicit requests or autonomously look for jobs to do? When talking to people, should it only react or also initiate dialogues? Autonomy means that the robot is neither controlled by some overall central control station physically detached from the robot, nor by human operators – in these cases there would be no need for social abilities (Dautenhahn, 1995). An important issue for autonomy is mobility – the ability to move and self-navigate determines the robot’s autonomy in space (Siegwart and Nourbakhsh, 2004).

Autonomy implies that the robot has to develop the capacity to interact independently and that its own capabilities and the social context allow it to do so. This necessitates mechanisms that provide autonomy and flexibility in various, unpredictable situations.

As illustrated in the following chapters, dispositions are a means to provide autonomy, because they introduce a more complex relation between perceived states of the environment and executed actions. Without them, the robot would only react or choose actions randomly. Furthermore, dispositions can provide a form of introspection in that they allow the robot to evaluate the actions it can execute in a state. Although the thesis provides no scenario in which the robot actually reasons about its preferences or aversions for certain actions, the approach we are taking would allow dispositions to become the subject of reasoning processes and conversation.

2.8 Conclusion

The preceding sections of this chapter presented a selection of issues relevant to designing sociable robots and prefigured how this thesis is going to address them. Most of these issues rely on mechanisms to adapt and learn. As (Duffy, 2003) puts it: “The degree of social interaction is achieved through a developmental and adaptive process.” Intelligent behavior as such requires adaptational skills. By its ability to change itself as a result of its experience, an intelligent agent can adapt to its environment or acquire some kind of knowledge. Dispositions provide an adaptational mechanism by relating experience to behavioral tendencies. Traditionally, adaptational mechanisms for robots were employed for learning how to act most efficiently in the physical world (e.g. navigating or manipulating objects) or how to predict stimuli coming from the environment. Adaptation in the context of sociability primarily refers to the acquisition of social skills to enable and facilitate interaction with other agents. Adaptational mechanisms can be used for building a personality and for recognizing the personality of others. They can extend communicational skills based on fixed linguistic knowledge by addressing the variability of language. They are important for affect and emotion, and they promote empathic behavior. This thesis provides adaptivity by letting a robot acquire and act according to dispositions. It does not present a complete solution for all the sketched problems, but rather a possible direction to follow.


Chapter 3

Approach on Dispositions

Summary
This chapter describes the approach to acquiring and using dispositions. It characterizes the acquisition of dispositions as an adaptational process between the agent and its environment.

3.1 Learning and Adaptation

Mechanisms for learning and adaptation are a prerequisite of intelligent systems. “AI is the science of endowing programs with the ability to change themselves for the better as a result of their own experiences” (Schank, 1987). Figure 3.1 gives an abstract and very general illustration of the processes that determine adaptation, learning, and experience. An agent perceives the environment and decides what action to take, based on its perception and on its previous experience. The action, together with the consequent perception of the environment, will modify the experience. The next time the robot has to decide on an action, it takes into account the memory of previous actions together with their perceived consequences.¹

¹ Figure 3.1 is a variation of the action-perception cycle introduced by (Neisser, 1976).

There are several different definitions of learning and adaptation; three examples are given below:

1. “Modification of a behavioral tendency by experience” (Webster, 1984)

2. “A learning machine, broadly defined, is any device whose actions are influenced by pastexperience” (Nilsson, 1965)

3. “Learning produces changes within an agent that over time enable it to perform moreeffectively within its environment.” (Arkin, 1998)

Definitions of learning seem to fall into two groups. Definitions 1 and 2 restrict learning to thechange of behavior or behavioral tendencies based on past experience. Definition 3 addition-ally comprises the goal-directedness of these processes; it says that learning results in a better ormore effective performance of the learning agent in its environment. Learning processes in terms

1Figure 3.1 is a variation of the action-perception cycle introduced by (Neisser, 1976).


Figure 3.1: Gaining experience by perception and action

of 1 and 2 can only be measured in terms of internal parameters like convergence, stability, and learning rate. However, this entails no evaluation of how good the learning is, as there is no external measure. The third definition entails that learning processes can be evaluated and measured given appropriate performance metrics. For evaluating learning, we need to introduce performance measures relating to the goal of the learning process. Goals relate to the application and purpose of learning. In supervised learning – the learning from examples provided by a knowledgeable external supervisor – learning can be measured in terms of the accuracy and correctness of the classified examples. In reinforcement learning – learning behavior through trial-and-error interactions with a dynamic environment – the goal is to maximize the received reward from the environment in the long run. Performance of learning is then measured in terms of the obtained reward. The next sections show that dispositions are closely related to the reinforcement learning paradigm.

3.2 Dispositions

Dispositions understood as natural or acquired habits, preferences, or characteristic tendencies determine the behavior of an agent in its environment. Acquired dispositions are a result of adaptive processes. They reflect the experience of an individual in its environment and determine the behavior of this individual. Given the perceived state of the environment and a set of possible actions for that state, the disposition controls the selection of which action to take. It maps from the experience of the agent (regarding the state of the environment and possible actions) to probabilities of selecting an action.

Dispositions are an approach to solving the problem of adapting and learning from interaction with the environment. They differ from other learning mechanisms like reinforcement


learning, neural networks, or evolutionary learning, in that they make the result of the adaptation process more explicit, and thus introspectable. They provide the possibility to explicitly take the adaptation results into account in the deliberative process.

3.2.1 Formal Definition of Dispositions

Dispositions provide an approach to adapt and learn from interaction with the environment. The agent, which learns and decides which actions to take, interacts with its environment, which is everything outside itself. The interaction between agent and environment is continual – the agent acts and the environment reacts. At each time step t, the agent perceives a state s_t ∈ S, where S is the possibly infinite set of possible states, and chooses one of the possible actions a_t ∈ A(s_t), where A(s_t) is the set of possible actions available in s_t. The environment “reacts” to this action by a transition to a new state s_{t+1}, and the agent perceives the new state. The new state conveys feedback fb_t(s, a) from the environment – an interpretation (properties and features) of the environment’s state s_{t+1} that the agent can relate to its action a_t executed in state s_t. The set of received feedback from the environment makes up the agent’s experience. The agent’s experience regarding the execution of action a in state s, exp_k(s, a), captures the feedback from all previously perceived states following the execution of action a in state s: fb_1(s, a), fb_2(s, a), . . . , fb_{k−1}(s, a), fb_k(s, a).

The agent’s disposition, δ_s, is a mapping from experience exp(s) to probabilities of selecting each possible action in state s, where δ_s(exp, a_x) is the probability that the agent chooses a_x ∈ A(s) given its experience exp = ⋃_{a ∈ A(s)} exp(s, a). Note that these probabilities have to sum up to 1: Σ_{a ∈ A(s)} δ_s(exp, a) = 1.

The problems are then, first, how to gain experience, i.e. how to build up experience from feedback by defining a function update that updates the experience of an agent given its past experience and the current feedback: exp_k(s, a) = update(fb_k(s, a), exp_{k−1}(s, a)).2 This problem is the learning problem. The second problem is how to define the disposition function δ – the control problem.3

Gaining Experience

There are several possibilities to build up experience from the environment’s feedback. The easiest way would be to simply assign the current feedback value:

exp_k = fb_k    (3.1)

(with exp_k as short for exp_k(s, a) and fb_k short for fb_k(s, a)). This would result in a rather forgetful agent who takes into account only the last feedback it gets and forgets about all previous

2It would also require defining the union of experience regarding two distinct actions a_1, a_2 ∈ A(s): exp(s, a_1) ∪ exp(s, a_2), but due to the simplicity of the following applications, this problem does not arise in the scope of this thesis.

3Note that the problem specification for dispositions closely follows the reinforcement learning (RL) problem specification given in (Sutton and Barto, 1998). The reward from RL relates to the feedback, and the RL policy is connected to the disposition. The difference is that in RL, an agent’s policy changes depending on the reward it gets, whereas dispositions do not change; they are invariably defined with respect to changeable feedback.


feedback. For taking into account more than only the last feedback, the agent could take the average of all previous feedback, expressed by Equation 3.2:4

exp_k = (fb_1 + fb_2 + . . . + fb_k) / k    (3.2)

This would require storing all experiences explicitly, which takes memory and growing computational effort. An incremental update formula would be more efficient:

exp_{k+1} = 1/(k+1) · Σ_{i=1}^{k+1} fb_i    (3.3)
          = 1/(k+1) · (fb_{k+1} + Σ_{i=1}^{k} fb_i)
          = 1/(k+1) · (fb_{k+1} + k · exp_k)
          = 1/(k+1) · (fb_{k+1} + (k+1) · exp_k − exp_k)
          = exp_k + 1/(k+1) · [fb_{k+1} − exp_k]

Equation 3.3 calculates the average of all feedback ever received. This is appropriate for environments that never change. If we assume that the environment can change over time, it is more appropriate to weigh recent feedback more heavily than past feedback. Instead of the step size parameter 1/(k+1) in Equation 3.3, which changes from step to step, we take a constant α, 0 < α ≤ 1:

exp_{k+1} = exp_k + α [fb_{k+1} − exp_k]    (3.4)

Note that this also holds for k = 0, yielding exp_1 = fb_1 for arbitrary exp_0, and thus is applicable for cases when no previous experience is given. Current feedback is given more weight than past feedback when α is large, whereas if α is small, past feedback is given more weight relative to current feedback. Note that the constant step size parameter in Equation 3.4 prevents the convergence of the learning system. But this is actually desired for changing (non-stationary) environments (Sutton and Barto, 1998).
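As a small worked comparison (not taken from the thesis), the following sketch contrasts the incremental sample average of Equation 3.3 with the constant step-size update of Equation 3.4 when the environment changes: the feedback flips from +1 to −1 halfway through.

def update_average(exp, fb, k):
    # Equation 3.3: incremental sample average; k feedbacks were seen before fb
    return exp + (fb - exp) / (k + 1)

def update_constant(exp, fb, alpha=0.5):
    # Equation 3.4: exponential, recency-weighted update with constant step size
    return exp + alpha * (fb - exp)

feedback = [1, 1, 1, 1, 1, -1, -1, -1, -1, -1]   # the environment changes after step 5
avg = rec = 0.0
for k, fb in enumerate(feedback):
    avg = update_average(avg, fb, k)
    rec = update_constant(rec, fb)
print(round(avg, 2), round(rec, 2))   # 0.0 -0.94: the average forgets nothing, Equation 3.4 tracks the change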

Different scenarios for the acquisition of different dispositions involve different kinds of feedback and also different update functions. These will be defined in Chapter 5 and Chapter 6.

3.2.2 Goals

The objective of learning dispositions is two-fold. At a higher level, the goal is to provide social behavior of an agent and efficient and effective interaction. This requires that the agent learns to adapt to other agents and their particularities. Learning models of other agents enables cooperative behavior and the effective pursuit of the agent’s own goals. In this thesis, we employ dispositions for communicative actions for the appropriate interpretation of indirect requests and for the production of utterances that convince other agents to help. We further employ dispositions for actions

4Equations 3.2 to 3.4 are inspired by equations for action value estimation given in (Sutton and Barto, 1998).


in the physical environment: the robot acquires a history of interactions with potential toy objects and, depending on the effect of its actions and the verbal evaluations of other agents, the robot will seek or avoid interaction with those objects. In this case the high-level objective is a robot who is sensitive to praise or blame and to experiences of frustration or success.

The pursuit of those higher-level objectives is enabled by establishing lower-level objectives. These lower-level objectives relate to the feedback received from the environment. Under the assumption that feedback is valued, i.e. that some feedback is more desirable than other feedback, the agent’s goal is then to maximize positive feedback and minimize negative feedback. The disposition δ_s should be designed such that the received feedback is as high as possible, based on the expectations derived from experience.

Dispositions are a simplified version of reinforcement learning. They have in common with reinforcement learning that the agent chooses actions based on valued experience arising from interaction with the environment. The difference between reinforcement learning (RL) algorithms and dispositions is that RL algorithms use value and policy estimation to iteratively change policies in order to finally yield an optimal policy, whereas dispositions as introduced in this thesis are statically defined in relation to changeable experience. For the applications described in Chapters 5 and 6, the initial policy is important, i.e. how to behave with no given experience, and the behavior stabilizes relatively fast. Typical reinforcement learning applications are more or less independent of the initial policy, and at the same time they usually require a notable number of exploration runs. While it would not be impossible to achieve similar results with reinforcement learning algorithms, we assume that the less flexible and more constrained notion of dispositions applied in this thesis provides a useful and admissible simplification for addressing the adaptation problem.

From the perspective of an agent whose behavior is determined by its beliefs, desires, and intentions, dispositions are one factor that is taken into account for the decision of what to do next. Given a set of prioritized desires, and a state of beliefs that contains perceptions of the physical environment and verbal utterances of other agents, the robot derives its intentions under the influence of its dispositions. §4.5 presents a model of those interdependencies.

3.3 Conclusion

This chapter gave an introduction to the acquisition of dispositions as an adaptational process by which the robot gains experience based on its actions and perceptions of the environment’s feedback to those actions. The experience is then used to select future actions – dispositions express the probability of choosing one of a set of possible actions. In order to direct the acquisition of dispositions, we evaluate the feedback and derive dispositions that maximize the expected value of future feedback.

After introducing the embedding robot architecture and describing implementational foundations for communicating in the social environment and acting in the physical environment in Chapter 4, we apply the rather abstract characterization of dispositions in this chapter in practical applications for actions in the physical (Chapter 5) and the social environment (Chapter 6).


Chapter 4

Implementation

Summary

This chapter sets the ground for the following two chapters, in which we describe how dispositions are used to act in the physical world and to communicate with other agents. The chapter starts with an introduction to the robot architecture which embeds the implementations for dispositions (§4.1). After giving an overview in §4.1.1, it focuses on the subsystems for BDI (§4.1.3) and communication (§4.1.2). §4.2 presents the steps of linguistic analysis that enable the communication between robot and humans. §4.3 describes how the architecture enables the robot to initiate subdialogues, a capability that is required for the communicative behavior described in Chapter 6. §4.4 describes the mechanisms and algorithms that are relevant for actions in the physical environment. The chapter ends with characterizing the model of beliefs, desires, and intentions, which integrates the dispositions for physical and communicational actions, in §4.5.

4.1 The Architecture

4.1.1 Overview

The work of this thesis is embedded in a distributed robot architecture that enables a robot to move about in an indoor environment, recognize simple visual scenes, and communicate with a human about visual or spatial aspects of the environment. The functionality of the architecture was tested on an ActivMedia PeopleBot equipped with a SICK laser range finder and bumper sensors, see Figure 4.2. The architecture (illustrated in Figure 4.1) comprises different subsystems according to different sensorimotor and cognitive modalities. We have subsystems for communication, spatial localization & mapping, and visual processing. We use the BDI-subsystem as a mediator between the other subsystems. This is based on the idea that beliefs constitute the common ground between modalities, rather than being a layer on top of the different modalities. The subsystems and their components communicate with each other via the Open Agent Architecture (Cheyer and Martin, 2001). The next two sections give more details about the communication subsystem and the BDI-subsystem, as these are relevant for the disposition mechanisms described in the next chapters.


Figure 4.1: Robot architecture

4.1.2 Communicative Subsystem

The communication subsystem consists of several components for the analysis and production of natural language. The first step of analysis is speech recognition, for which we use the Nuance speech recognition engine1 with a domain-specific speech grammar. In the second step the string-based output of Nuance is parsed with OpenCCG.2 OpenCCG uses a combinatory categorial grammar (Baldridge and Kruijff, 2003) to yield a representation of the linguistic meaning of the recognized utterance. The third step is to analyze how the recognized utterance relates to the current dialogue context. This requires identifying the rhetorical and referential relations to preceding utterances, and it yields an updated model of the situated dialogue context (Asher and Lascarides, 2003; Bos et al., 2003).

When the need to communicate arises from the current dialogue flow or from another modality, the dialogue planner establishes a communicative goal. The content planner then derives a plan to achieve this goal in consideration of the current dialogue context, possibly in a multi-modal way using non-verbal (pose, head moves) and verbal means. Verbal content is realized by the OpenCCG realizer, which generates a string for the utterance, and a text-to-speech engine3, which synthesizes this string.

1http://www.nuance.com
2http://openccg.sf.net
3http://mary.dfki.de


Figure 4.2: ActivMedia PeopleBot with SICK laser and stereo vision on a pan tilt unit


4.1.3 BDI-Subsystem

The BDI-subsystem contains different components that relate to beliefs, desires, and intentions. The belief state contains beliefs arising from perceptions of different modalities. Those beliefs are used to mediate and arbitrate between different modalities. The action planner creates action plans based on goals that arise from intentions and beliefs. §4.5 introduces a model for deriving those goals. For a small subset of goals the action planner employs an external planner.4,5

Another part of the BDI-subsystem is concerned with dispositions in general and their use for adapting dialogue strategies in particular. The module for dialogue strategies indicates the need to communicate, along with specific constraints for the realization (e.g. whether to employ a command or a question), to the communication subsystem via the BDI mediator; see §6.2.4.

4.2 Linguistic analysis

This section describes the grammar resources that we employ for the acquisition of communicational dispositions. The robot relies on them to understand requests, clarify the meaning of ambiguous utterances, and engage other agents to help. We implemented the grammar in the Combinatory Categorial Grammar formalism (Baldridge and Kruijff, 2003), which we introduce in §4.2.1. We express meaning as ontologically rich, relational structures. These structures are inspired by description logic-based knowledge representations and dependency grammar-based descriptions of meaning.

The utterances that the robot needs to understand and produce can be roughly divided into commands, questions, assertions, and answers or cues. The following examples illustrate the types of utterances that we consider.

(1) Commands
    a. Bring me a tea!
    b. Help me!

(2) Questions
    a. Can you bring me a tea?
    b. Could you bring me a coffee, please?
    c. Can you help me?
    d. Do you want me to help you?

(3) Assertions
    a. I like to have tea.
    b. I would like to have a tea.
    c. I need tea.
    d. I am thirsty.

(4) Answers/Cues

4The Fast Forward planning system (http://www.mpi-sb.mpg.de/∼hoffmann/ff.html)
5Thanks to Michael Brenner from Albert-Ludwigs-Universität Freiburg for the integration.


    a. yes
    b. okay
    c. no

(1) gives examples for commands: (1a) is a request addressed to the robot, whereas (1b) is a request that the robot might address to the human. (2) gives possible questions that can occur in dialogue. (2a) and (2b) are questions addressed to the robot; (2c) and (2d) are questions that the robot might ask, (2c) in order to obtain help, (2d) to clarify intentions of the human. (3) lists assertions that the robot should understand and possibly interpret as indirect requests. When the robot produces questions, it should be able to understand the answers; (4) lists possible responses to polar questions.

Linguistic analysis is two-fold: in the first step we apply syntactic and semantic rules of the specified grammar and thereby build up a logical formula that represents the meaning of the utterance; in the second step we classify this meaning. We subsume utterances with similar meanings under one ontological type, and allow the robot to handle them in a uniform way. In the following section we will first give a short introduction to the formalism of Combinatory Categorial Grammar. We then describe how we represent the linguistic meaning of an utterance as a logical formula by giving a set of examples. In the last section we will show how we classify these logical forms based on an ontology of semantic types.

4.2.1 Combinatory Categorial Grammar

Combinatory Categorial Grammar (CCG) is based on Categorial Grammar, a formalism in which grammatical expressions are assigned syntactic types (categories) that specify how the expression combines with other expressions to create larger expressions.6 These categories identify expressions either as a function from arguments of one type to results of another, or as an argument. The categories are closely related to the semantic type of the linguistic expression itself, and the semantic representation is built compositionally in parallel to the categorial inference. Since CCG is a form of lexicalized grammar, the application of syntactic rules is entirely dependent on the category of their inputs. Rules are not structure- or derivation-dependent. Analysis is string-based and adheres to the principle of adjacency, which imposes that combination has to correspond to the surface word order.

Constituents are assigned either primitive categories or function categories. Primitive categories are for instance N, NP, PP, or S; they can be further distinguished by features, such as person, number, case, or inflection. Functions, like verbs or determiners, are identified by the type of their result (e.g. VP, NP) and that of their arguments, which may themselves be either functions or primitive categories. Function categories also determine the order of arguments and whether they occur to the right or the left of the functor.

Pure categorial grammar (Bar-Hillel, 1953) restricts combination to functional application of functions to arguments to the right or left, which yields the expressivity of context-free grammars. CCG extends this core with further combinatory rules, like type-raising and functional composition, allowing mildly context-sensitive grammars (Joshi et al., 1991). Because these

6The explications of this section closely follow the tutorial paper on CCG (Steedman and Baldridge, 2003).


rules are strictly type-driven and based on some of the combinators identified by (Curry and Feys, 1958), they are called combinatory rules, giving CCG its name.

To illustrate the categories and combinatory rules we give an analysis of (2a) – “Can you bring me a tea?”. This sentence consists of six lexical items (words), with five different categories. The personal pronouns “you” and “me” and the noun “tea” have a primitive category, while the verbs “can” and “bring” and the determiner “a” have a functional category. This is specified in the lexicon as follows:

(5) a. me ↦ pper
    b. you ↦ pper
    c. tea ↦ n
    d. a ↦ np/n
    e. bring ↦ s\pper/np/pper
    f. can ↦ s/s

For reasons of simplicity we omit the further specification of categories by features like person, number, or case, which would distinguish “me” from “you”, for instance.

The functional categories (5d), (5e), and (5f) use the forward and backward application rules:7,8

(6) Forward application (>): X/Y Y ⇒ X

(7) Backward application (<): Y X\Y ⇒ X

The category of the determiner “a” (5d) specifies that it takes a constituent of type n (noun) to its right and results in a primitive category of type np (noun phrase). See (8) for a derivation:

(8)  a      tea
     np/n   n
     ------------ >
         np

The category of the verb “bring” (5e) indicates that it takes three arguments and results in a primitive category of type s (sentence). After binding two arguments to its right, the last argument comes from the left. Finally, the category of the verb “can” (5f) specifies that it takes a category s to its right and results in an s. See (9) for the derivation of the sentence (2a) – “Can you bring me a tea?”

7As the utterances employed in this thesis do not involve any examples for composition or type-raising rules, we omit them here.

8For reasons of space, we also leave out an introduction to the modality of rules, which is actually part of the grammar fragment (Baldridge and Kruijff, 2003).


(9)  can   you    bring            me     a      tea
     s/s   pper   s\pper/np/pper   pper   np/n   n

     bring me                ⇒ s\pper/np   (>)
     a tea                   ⇒ np          (>)
     bring me a tea          ⇒ s\pper      (>)
     you bring me a tea      ⇒ s           (<)
     can you bring me a tea  ⇒ s           (>)

In the first step of the derivation the complex category of “bring” combines with the simple category pper (personal pronoun) of “me”. The resulting category s\pper/np then binds the np that resulted from the combination of “a” and “tea”. The next derivation step uses the backward application rule and binds the pper “you”, resulting in the simple category s. This is bound in the last step of the derivation by the complex category of “can”, resulting in the category s.
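The following sketch (a toy re-implementation for illustration, not the OpenCCG machinery used in the thesis) encodes the lexicon (5) and the application rules (6) and (7), and reproduces the derivation (9) step by step. Functional categories are represented as nested tuples (slash, result, argument); primitive categories are strings.

def forward_apply(fn, arg):        # (6)  X/Y  Y  =>  X
    if isinstance(fn, tuple) and fn[0] == '/' and fn[2] == arg:
        return fn[1]
    return None

def backward_apply(arg, fn):       # (7)  Y  X\Y  =>  X
    if isinstance(fn, tuple) and fn[0] == '\\' and fn[2] == arg:
        return fn[1]
    return None

# Lexicon (5); "bring" as s\pper/np/pper, i.e. (((s\pper)/np)/pper)
pper, n, np, s = 'pper', 'n', 'np', 's'
lex = {
    'me': pper, 'you': pper, 'tea': n,
    'a': ('/', np, n),
    'bring': ('/', ('/', ('\\', s, pper), np), pper),
    'can': ('/', s, s),
}

# Derivation (9): "Can you bring me a tea?"
bring_me = forward_apply(lex['bring'], lex['me'])        # s\pper/np
a_tea = forward_apply(lex['a'], lex['tea'])              # np
bring_me_a_tea = forward_apply(bring_me, a_tea)          # s\pper
you_bring = backward_apply(lex['you'], bring_me_a_tea)   # s
print(forward_apply(lex['can'], you_bring))              # s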

We implemented the grammar using OpenCCG, an open source natural language processing library written in Java, which provides parsing and realization services based on CCG. The specification of the grammar uses the XML and XSL formats. Traditionally, the lexicon for a categorial grammar specifies for each word its own category. In OpenCCG categories are organized into lexical families, which are related to whole sets of words (Baldridge, 2002). We define lexical entries specifying their syntactic category (including features like person, number, and the like) and their semantics using XML/XSL. CCG and OpenCCG have a completely transparent interface between surface syntax and underlying semantic representation; syntactic rules are closely connected to semantic rules. Before reading the more extensive introduction to the representation of the linguistic meaning of utterances in the next section, §4.2.2, the reader should become aware of the primacy of semantics over syntax. The structure of a category is determined by the meaning it expresses. Thus, we consider linguistic meaning as the only level of representation, and syntactic structure only as an artifact. Syntactic structure is built through inference over categories, but these categories only reflect the underlying meaning, while the inference reflects the composition of meaning.

4.2.2 Linguistic Meaning as Relational Structure

Linguistic meaning is expressed as a relational structure. This structure connects pieces of meaning via directed labeled edges. The labels on the edges indicate the meaning of the dependent with regard to the meaning of the head that governs the dependent. See for instance Figure 4.3, the relational structure for the meaning of (1a), “Bring me a tea!”

The verbal predicate bring is the root of the structure; it has three dependents: the RECIPIENT of the action – me, the PATIENT of the action – tea, and the ACTOR – the HEARER of the utterance. The actor is usually not realized in the surface form of imperatives.

Formal Representation

The kind of meaning representation we adopt here is based on case grammar (Fillmore, 1968), work on theta-roles, e.g. (Dowty, 1989), and various theories of valency in dependency grammar,


Figure 4.3: Dependency graph for “Bring me a tea!”

(Peirce, 1992; Sgall et al., 1986). Resembling conceptual structures of AI knowledge representation, it yields relatively flat structures, unlike the type theory-based representations of Montague Grammar. For a more complete discussion, see e.g. (Kruijff, 2001; Baldridge and Kruijff, 2002) or (Davis, 1996).

Formally, we express meaning using hybrid logic. The term hybrid logic refers to a number of extensions to modal logic with more expressive power (Blackburn, 2000; Areces, 2000). Modal logic systems are often interpreted based on Kripke-style relational semantics, which consists of states (or worlds) and accessibility relations between these states. Therefore it lends itself ideally to capturing relational structures.

However, modal logic has a severe drawback – it provides no way to refer to or reason about the states directly. This is a problem, for instance, if we want to model time structure. Though we can use Prior’s Past and Future operators to state that something happened at some point in the past, or will happen at some point in the future, pure modal logic offers no way to explicitly refer to that point (Blackburn, 1990, 1994).

Hybrid logic solves this problem by introducing a class of formulas called nominals. Nominals are unique references to a state in the underlying model, which means that they must be true at exactly one state in any model. A nominal names a state by being true there and nowhere else. Being atomic formulas, they have the same status as other formulas, e.g. propositions in simple modal logic, and can be combined with operators in order to build up more complex formulas. Along with nominals, hybrid logic introduces a set of specific operators ranging over them. The most important for our purposes is the “@” operator that allows us to specify what is going on at the states named by nominals: @nφ means that “at the state referred to by n, formula φ holds”. Employing “@”, we can jump to the state named by a nominal, and see if some formula is true there. Another important feature of hybrid logic is its approach to sorting – nominals can have different sorts, which can be constituents of an ontology. This allows us to represent meaning as an ontologically richly sorted structure.

For constructing a relational structure, we build a conjunction of elementary predications (Kruijff, 2001). We have three types of predications: lexical predications, features, and dependency relations. The most basic one, the lexical predication, is an identifier n_j, which may be sorted, with a proposition prop that holds for that identifier: @n_j(prop), “at n_j, prop holds”. Nominals can be sorted to indicate the ontological sort or category of the proposition that holds at the state referred to by the nominal. For example, @{t1:thing}tea represents the fact that tea is a


thing.9 We specify a feature f with value v for n_j as @n_j(⟨f⟩v). We model relations by using the standard modal operators: @n_i⟨R⟩n_j means that there is a relation R between the nominals n_i and n_j.10

Examples

Logical Form 4.1 gives an example for the formal representation of the relational structure depicted in Figure 4.3, “Bring me a tea!”: Here, b1 is the nominal (or discourse referent) for the

Logical Form 4.1 Bring me a tea!

@b1:action(bring ˆ <Mood>imp ˆ
           <Actor>(r1:hearer ˆ you) ˆ
           <Patient>(t1:thing ˆ tea) ˆ
           <Recipient>(i1:person ˆ I))

event bring, with the type action. The value imp of the feature MOOD indicates that the bring-action is in imperative mood. As indicated in Figure 4.3, the root has three relations to other nominals. The ACTOR of the bring-predicate is the hearer – in this case the robot to whom the command is addressed. The PATIENT is tea, and the RECIPIENT is I.

The linguistic meaning of (1b) – “Help me!” – is given in Logical Form 4.2: This

Logical Form 4.2 Help me!

@h1:action(help ˆ <Mood>imp ˆ
           <Actor>(y1:hearer ˆ you) ˆ
           <Recipient>(i1:person ˆ I))

is similar to Logical Form 4.1, except that the help-action has no PATIENT. Figure 4.4 shows part of the relational structure of (3a) – “I like to have tea.” The verbal predicate like is the head of

Figure 4.4: Dependency graph for “I like to have tea”

9Note that this allows us to use nominals as (neo-Davidsonian style) discourse referents.
10The explications of this section are based on (Kruijff, 2005).


a SENSER-relation to the subject of the sentence I, and a PHENOMENON-relation to the verbal predicate have. The nominal have itself is the head of two relations: tea is the POSSESSION and the subject I is the OWNER. Note that the dependent of the SENSER-relation is token-identical to the dependent of the OWNER-relation.

Logical Form 4.3 gives the logical form for (3b) – “I would like to have a tea.” The mood of the sentence is indicative; the root is the modal would, which embeds the like-nominal with type emotive-mental-process under a relation SCOPE. The robot can interpret this assertion as

Logical Form 4.3 I would like to have a tea.

@w1:state(would ˆ <Mood>ind ˆ
          <Scope>(l1:emotive-mental-process ˆ like ˆ
                  <Phenomenon>(h1:state ˆ have ˆ
                               <Owner>i1:person ˆ
                               <Possession>(t1:thing ˆ tea)) ˆ
                  <Senser>(i1:person ˆ I)))

an indirect request under certain conditions, as will be described in more detail in §6.1. If it is not sure about the interpretation, it will ask for clarification. For that purpose it can produce an utterance like (2d) – “Do you want me to help you?”. The logical form for that question is given in Logical Form 4.4: The root is the auxiliary do, and the mood of the sentence is interrogative.

Logical Form 4.4 Do you want me to help you?

@d1:state(do ˆ <Mood>int ˆ
          <Scope>(w1:desiderate-mental-process ˆ want ˆ
                  <Patient>(i1:person ˆ I) ˆ
                  <Phenomenon>(h1:action ˆ help ˆ
                               <Actor>i1:person ˆ
                               <Recipient>(y1:person ˆ you)) ˆ
                  <Senser>(y2:person ˆ you)))

The scope-dependent is the verbal predicate want of type desiderate-mental-process. want has three dependents: the SENSER you, the PATIENT I, and the PHENOMENON – the help-action. This nominal itself has two dependents: the RECIPIENT you and the ACTOR, which is identical to the PATIENT-dependent of want. Note that this is an instance of the object-control phenomenon, where the referent of the object argument of the control verb (want) is identical to the referent of the subject of the embedded verb predicate.

An example for a logical formula of the responses to polar questions in (4) is given in Logical Form 4.5 for (4a) – “yes”. The formula classifies the answer as a cue with positive polarity. (4b) – “okay” – also has positive polarity; (4c) has negative polarity.


Logical Form 4.5 Yes

@y1:cue(yes ˆ <Polarity>+)
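For readers who want to process such formulas programmatically, one possible (purely illustrative) encoding of a hybrid-logic logical form is a nested dictionary holding the root nominal, its sort and proposition, its features, and its dependency relations. The encoding below mirrors Logical Form 4.1, but it is not the internal representation used by OpenCCG or the thesis implementation.

lf_bring_me_a_tea = {
    'nominal': 'b1', 'sort': 'action', 'prop': 'bring',
    'features': {'Mood': 'imp'},
    'relations': {
        'Actor':     {'nominal': 'r1', 'sort': 'hearer', 'prop': 'you'},
        'Patient':   {'nominal': 't1', 'sort': 'thing',  'prop': 'tea'},
        'Recipient': {'nominal': 'i1', 'sort': 'person', 'prop': 'I'},
    },
}

def mood(lf):
    # the sentence mood at the root nominal drives the top-level classification in §4.2.3
    return lf.get('features', {}).get('Mood')

print(mood(lf_bring_me_a_tea))   # imp -> the utterance is a command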

4.2.3 Ontological Types

This section introduces the ontological classification of utterances addressed to the robot. This classification is an important step in the further processing of utterances, as it provides a necessary abstraction of detailed logical formulas. Based on the assigned type of an utterance, the robot decides how to react – utterances of the same type require similar steps of further processing. Figure 4.5 shows those pieces of the type ontology that are relevant for the classification of the utterances below.

Figure 4.5: Ontological types for classifying utterances

On the top level we distinguish between assertions, questions, and commands based on the sentence mood: the mood of commands is imperative, questions have interrogative mood, and assertions indicative mood. For utterances that contain a modal verb we assign a modality; the paragraph on questions gives more detail. We further assign a content type, which is loosely related to SUMO (Niles and Pease, 2001).

Assertions and Commands

The assertions that we consider here are potential requests regarding an object. Table 4.1 shows the content types we assign to the different assertions. The type desire.object refers to an assertion about a desire or a wish regarding an object or the possession of that object. The content type need.object refers to an assertion about a need regarding an object or the possession of that object. The content type state.physical indicates an assertion about a physical state of an agent.

The ontological type of answers to polar questions is polar.negative for “no” and polar.positive for “yes” and “okay”.

For the commands we consider here, there is only one content type – transfer.object. This is


utterance                        type
I want coffee.                   desire.object
I would like to have a tea.      desire.object
I need tea                       need.object
I am thirsty                     state.physical

Table 4.1: Mapping assertions to ontological types

assigned to utterances like (1a) “Bring me a tea!” or “Please get me a coffee!”.11

Questions

For questions we distinguish between factual and polar questions. Factual questions ask for facts; polar questions ask for the (possibly boolean) polarity of the answer.

For questions that contain modal auxiliaries, we can specify the verbal modality. Currently we consider the following modalities: permission, possibility, ability, volition, and prediction. The modal auxiliaries can and could indicate permission, possibility, or ability, whereas will and would express volition or prediction (Biber et al., 1999). The questions that we consider here regard the ability to transfer an object. Given the above types, we can assign the utterances (2a) and (2b) and similar questions to those complex types:

utterance                             type
Can you get me a tea?                 polar.perm-poss-abil.transfer.object
Could you bring me a tea (please)?    polar.perm-poss-abil.transfer.object
Will you bring me a tea?              polar.volit-predict.transfer.object
Would you get me a tea (please)?      polar.volit-predict.transfer.object

Table 4.2: Mapping questions to ontological types
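A minimal sketch of this classification step is given below. It covers only the example utterances from Tables 4.1 and 4.2; the function name and rule set are illustrative, not the classifier used in the thesis.

PERM_POSS_ABIL = {'can', 'could'}   # modal auxiliaries indicating permission, possibility, or ability
VOLIT_PREDICT = {'will', 'would'}   # modal auxiliaries expressing volition or prediction

def ontological_type(mood, modal, content):
    """mood: 'imp' | 'int' | 'ind'; modal: the modal auxiliary of a question, if any;
    content: content type such as 'transfer.object' or 'desire.object'."""
    if mood == 'int':                               # questions: polar.<modality>.<content>
        if modal in PERM_POSS_ABIL:
            return 'polar.perm-poss-abil.' + content
        if modal in VOLIT_PREDICT:
            return 'polar.volit-predict.' + content
        return 'polar.' + content
    return content                                  # commands and assertions keep their content type

print(ontological_type('int', 'can', 'transfer.object'))   # polar.perm-poss-abil.transfer.object
print(ontological_type('ind', None, 'desire.object'))      # desire.object
print(ontological_type('imp', None, 'transfer.object'))    # transfer.object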

4.3 Raising Questions

This section describes how the architecture enables the robot to initiate a subdialogue on its own initiative, a functionality that we build upon for the processes described in Chapter 6.

The need to raise questions can arise from uncertainty, inconsistencies, or ambiguity of perceptions and knowledge in different modalities. (Kruijff et al., 2006) describe how the need to pose clarification questions arises in the context of a human-augmented mapping scenario. In the context of this thesis, the robot has to initiate a subdialogue in order to clarify an ambiguous utterance (§6.1), or to engage a human to help it (§6.2). The following characterization is a short form of the presentation of the mechanisms to raise and solve those questions given in (Kruijff

11The ontology for commands comprises more types, but the corresponding utterances are not in the scope of this thesis.


et al., 2006). The approach to dealing with questions and their function in grounding is inspired by (Larsson, 2002), where a data structure QUD (Questions Under Discussion) is introduced for managing open questions or unresolved issues. The need to raise a question can originate in different processes or modules; in our case it is either the action planner (for help requests) or the BDI mediator itself (for clarifying ambiguous utterances). In either case, the BDI mediator stores the question with its identifier and triggers the communication subsystem to resolve the question through a dialogue with the human. The communication subsystem plans a communicative goal and the content to express the question. It then generates a string expressing this content, utters it, and adds the planned content to a model of the dialogue context, to log that the robot has asked a question. After the robot has addressed the question to the human, we expect the human to answer the question. Any incoming utterance will be interpreted in the current dialogue context, which means that we try to establish a rhetorical relation between the answer and a previously raised question. The result of this analysis is a relational structure that connects the content of the answer to that of the presumed matching question. This analysis is then passed back to the BDI mediator, which notifies the process that raised the question if necessary.

4.4 Acting in the Physical Environment

This section describes the mechanisms for acting in the physical environment. It describes the implementation of the actions that enable the robot to interact with inanimate objects in the environment. Possible interactions are searching for potential toys, playing with them, or turning away from them.

4.4.1 Actions

For this purpose we implemented a set of simple actions that can be combined to build up the overall behavior of the robot. Examples of those actions are to approach an object (approach), to push an object (push), to turn by a certain angle (turn), or to move in a certain direction for a certain distance (go). Actions are defined by their preconditions and postconditions, and the actuator commands that are executed to achieve the postconditions. Pre- and postconditions are formulated in terms of interpreted sensor readings.

Figure 4.6 shows a schematic representation of the process flow for an action. The first step in action execution is to check if the preconditions hold. If they don’t, the action fails. If they hold, the action-specific actuator commands are executed. While executing those commands, sensor readings might indicate a break condition (e.g. a triggered bumper sensor), which entails the failure of the action. After execution of the actuator command, sensor readings are used to check the postconditions. If they hold, the action terminates successfully. If not, the actuator commands are repeated until the postconditions are achieved or the action is interrupted by a break condition.
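The process flow can be summarized by the sketch below. The interfaces (preconditions_hold, run_actuator_commands, break_condition, postconditions_hold) are hypothetical names standing in for the interpreted sensor readings and actuator commands of the real actions, and the bound on repetitions is an added safeguard, not part of the original design.

def execute(action, max_repeats=10):
    """Returns the success state of an action; this state later serves as feedback
    for the dispositions acquired in Chapter 5."""
    if not action.preconditions_hold():          # step 1: check preconditions
        return 'failed'
    for _ in range(max_repeats):
        action.run_actuator_commands()           # step 2: execute actuator commands
        if action.break_condition():             # e.g. a triggered bumper sensor
            return 'failed'
        if action.postconditions_hold():         # step 3: check postconditions
            return 'succeeded'
    return 'failed'                              # postconditions never achieved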


Figure 4.6: Process flow for actions

4.4.2 Parameters and Conditions

Actions differ in whether or not they relate to an object. The actions approach and push depend on an object, whereas turn and go are object-independent. Each action can take a set of parameters. For object-dependent actions one of the parameters is the type of the object; other possible parameters are the distance to travel for go, the angle to turn for turn, or the force and distance to push an object for push. Conditions for an action are defined in terms of sensor readings and their interpretation.

Approaching an object The action approach depends on the perception of the object to be approached. If no object can be recognized, the action fails. The postcondition for approach is that the approached object is recognized within a certain distance and angle (to be given as parameters to that action) in front of the robot. Those perceptions arise from the interpretation of the laser range finder. From the laser readings, the size and shape of objects can be derived. For the purpose of this thesis we only consider circle-shaped objects with an extent between 15 and 35 centimeters as potential toys. Based on the extent we further distinguish two types of toys: prototypical buckets, which have a size of around 30 centimeters, and prototypical playmobiles with a diameter of around 10 centimeters. The approach action is interrupted and thus fails if one of the robot’s front bumpers is triggered. When approaching an object this should not happen, as the robot would not go below a certain distance to the object. However, sometimes the laser readings might be unreliable or some obstacles might be invisible to the laser sensor. In this case


the robot should not move any further but instantly abort the action. Another cause for failure of approach is the sudden disappearance of the object to be approached.

Pushing an object Similar to approach, the action push fails if one of the front bumpers is triggered. Unlike approach, this is not unexpected, as pushing an object requires direct contact between the front bumpers and the object. However, the robot should not try to push unmovable objects or objects that are too heavy to be pushed without triggering the bumpers. The precondition for push is that the target object is recognized immediately in front of the robot. The action succeeds if the object could be pushed for the distance given as a parameter to the action. An additional parameter for that action is the velocity with which the robot should drive against the object.

Moving and Turning The actions turn and go do not relate to an object. Parameters are the angle in degrees and the distance in mm, respectively. The precondition for turn is that there is no obstacle perceivable immediately in front of the robot. This is to ensure that the robot does not bump against anything when turning. Postconditions are defined based on the odometry readings from the motor. The action terminates successfully when the target angle (for turn) or the target x-y-coordinates (for go) are reached. go fails if an obstacle is perceived on the way to the target destination. Both actions fail if one of the bumper sensors is triggered.

Actions embedded Actions are usually embedded in a higher-level process, for instance the action planner. This process receives the execution state of an action, that is, whether the action succeeded or failed because of unsatisfied pre- or postconditions. This feedback is used for the acquisition of dispositions. Chapter 5 will illustrate how the success state of an action influences the tendency of the robot to repeat this action.

4.5 BDI-Model

This section presents the model that we use to calculate goals depending on desires, dispositions, and the perceived state of the world. The model integrates the different scenarios for different dispositions. It is oriented towards BDI agent architectures (Bratman, 1987; Rao and Georgeff, 1991), in which agents decide what to do next based on their Beliefs, Desires, and Intentions.

The behavior of the robot is controlled by the priority of its desires. Depending on its desires and its perceptions of the physical environment and perceived utterances of other agents, it adopts a goal and tries to achieve this goal. We establish two desires: the desire to play and the desire to serve. The play desire motivates the self-propelled playful interaction and exploration of the environment. The serve desire motivates actions that serve and assist the human interaction partner. As a consequence of assigning different priorities to the different desires, the robot will adopt different goals.

There are three possibilities to distinguish: either one of the desires is more important than the other, or they are both equally important. Figure 4.7 illustrates the possible alternatives. The state of the seesaw indicates the relative priority of the desires. If the desire to play is more


Figure 4.7: Desires and dispositions leading to adoption of different goals. The relative priority of the desires Serve and Play is depicted in the seesaws. If desires are balanced, goals depend on dispositions for perceived objects.

important, the robot will adopt a plan to play. If the desire to serve is more important, the robot tries to serve. In case both desires are equally important, the derived plan depends on the dispositions regarding the objects that the robot perceives.

Playing is more important If the playing desire has the highest priority, the robot will adopt a play plan. This means that it will look for potential toy objects and try to manipulate them. In the course of this playful interaction with the environment, it will gain experience that determines its dispositions according to the processes described in Chapter 5.

Serving is more important If the serving desire has higher priority than the play desire, the robot tries to fulfill requests of the human. If it perceives an utterance, it tries to map it to a goal state and plan the required actions. §6.1 provides examples of those utterances and the entailed actions. If the human’s utterance does not give rise to any possible goal states, or if there is no utterance at all, the robot offers its services actively by uttering “How can I help you?”.

Playing and serving are equally important If the desire to play and the desire to serve are equally important, the goal to adopt depends on the robot’s dispositions. The robot checks if it


can perceive any potential toy objects in its environment. If there are no such objects or if all of the perceived objects are associated with negative experience, the robot will follow its serve desire. This means that it either acts according to any perceived utterances by the human or that it offers its help explicitly. However, if it perceives any potential toys that it has no experience or no negative experience with, it will follow its play desire and try to manipulate the objects. By ‘positive’ or ‘negative’ experience with an object, we refer to the dispositions for actions toward this type of object. This is described in more detail in Chapter 5.

The model shows how the robot adopts a goal depending on its desires, dispositions, and perceptions of the environment. The perceptions of objects and the utterances of other agents constitute the belief part of the model; playing and serving are the desires, and the derived goals to pursue are the intentions.
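The goal adoption described above can be paraphrased by the following sketch. The helper names, the numeric priorities, and the reading of ‘negative experience’ as an experience value below zero are assumptions for illustration; the actual interplay with dispositions is detailed in Chapter 5.

def adopt_goal(play_priority, serve_priority, perceived_toys, experience):
    """perceived_toys: types of potential toys currently perceived;
    experience: object type -> experience value in [-1, 1] (0.0 = no experience)."""
    if play_priority > serve_priority:
        return 'play'
    if serve_priority > play_priority:
        return 'serve'
    # Desires balanced: play only if some perceived toy carries no negative experience.
    candidates = [t for t in perceived_toys if experience.get(t, 0.0) >= 0.0]
    return 'play' if candidates else 'serve'

print(adopt_goal(1, 1, ['bucket'], {'bucket': -0.8}))                  # serve
print(adopt_goal(1, 1, ['bucket', 'playmobile'], {'bucket': -0.8}))    # play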

4.6 Conclusion

This chapter set the basis for the following two chapters by introducing the robot architecture in which the work of this thesis is embedded and presenting the framework of the linguistic analysis that we employ for the communicational interaction between robot and human. The chapter further presented the mechanism to initiate subdialogues to resolve open questions, and the implementation of basic actions for manipulating objects in the environment. It ended with the presentation of a simple BDI-model that integrates the effect of dispositions for the different scenarios.


Chapter 5

Dispositions in Physical Context

Summary

This chapter describes how the robot acquires dispositions towards interactions with inanimate physical objects in its environment, as opposed to the dispositions for communicative actions described in Chapter 6.

5.1 Introduction

This scenario shows how the robot can acquire dispositions towards situations that involve interactions with inanimate physical objects which the robot encounters in the environment. For the purpose of this scenario, we assume that the robot has a natural motivation for trying to manipulate objects in its environment. The robot engages in playful interaction in its environment by looking for objects and trying to manipulate them. Its initial disposition is to look for kinds of objects that it can manipulate by driving against them. If the robot perceives such an object it will approach it and try to push it. The change of the initial disposition is controlled by two factors: (1) the success or failure of the attempted action and (2) verbal evaluations of a human. The action can succeed or fail, and another agent can praise or blame the robot for pushing the object. Success of the action or praise by another agent will promote the behavior to go for these objects and push them again. Failure of the action or blame will inhibit this behavior. So, if the robot was blamed for pushing an object, or if it did not succeed in pushing, because the object slipped away or was too heavy to be moved, the robot will be less likely to go for that object again.

For this scenario the robot needs the ability to move through an environment and to avoid, carefully approach, and (attempt to) manipulate objects in the environment. Further it needs the ability to perceive objects in terms of (a) their distance relative to the robot and (b) their type derived from basic physical properties such as shape.

5.2 Deciding and Adapting

This section describes the mechanisms that determine the behavior of the robot as sketched above. It characterizes the general strategy of the robot with respect to different states of the


Figure 5.1: Decision tree illustrating the strategy for playful interaction in the environment. The leaves correspond to states of the environment and indicate the set of possible actions. If no object is perceived, the robot searches for an object by turning or going forward (d). If an object is perceived, the robot checks if it is a potential toy; if not, it departs from that object and looks for another one by turning (c). If it perceives a potential toy, the robot checks the distance; if it is close enough, it pushes it (a), else it approaches it (b). The dotted rectangle indicates the action choices that can be adapted by experience.

environment, and then defines how the agent’s actions and the environment’s reactions determine the future behavioral tendencies of the robot.

5.2.1 General Strategy for Playful Interaction

Figure 5.1 illustrates the decision process on how to act based on the perceived state of the environment. The leaves of this decision tree indicate the set of possible actions in that state. There are two types of leaves: those where the choice of an action is fixed and not subject to any adaptational processes (c, d), and those where the choice is influenced by past experience (a, b). In the latter the actions are chosen depending on dispositions.

At first the robot checks if it can perceive an object at all. If not, it will search for an object, following the actions indicated in leaf (d): It looks for an object by turning 40 degrees. If there is still no object perceivable, it will repeat the turn 8 times and then go forward for a meter.


If there is a perceivable object, the robot checks the type of that object. It distinguishes between potential toys, i.e. objects that could potentially be pushed, and objects that could not. Classification is done on the basis of the size and shape of the object, as perceived by the laser sensor. We determine potential toys to have a circle shape and a circumference between 15 cm and 35 cm. Based on the size we distinguish two types of toys. All other objects, i.e. objects that have an extent of less than 15 cm or more than 35 cm and those that have no circle shape, are not considered as potential play objects. If the perceived object cannot be classified as a potential toy, the robot follows (c): It departs from the uninteresting object and searches for potential toy objects by turning 60 degrees. If it still perceives an uninteresting object, it will repeat its turn 5 times and then give up, because it seems to be surrounded only by non-toys which obstruct its way out to look for other objects.
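A sketch of this classification is given below. The exact boundary that separates the two toy types is not stated in this section, so the 20 cm threshold used here is an assumption for illustration only.

def classify_object(is_circle, extent_cm):
    """Classify an object from interpreted laser readings (shape and extent)."""
    if not is_circle or not (15 <= extent_cm <= 35):
        return 'non-toy'        # leaf (c): depart and keep searching
    return 'bucket' if extent_cm >= 20 else 'playmobile'   # assumed boundary between toy types

print(classify_object(True, 30))    # bucket
print(classify_object(True, 16))    # playmobile
print(classify_object(False, 30))   # non-toy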

If the robot can perceive a potential toy object, it needs to check if it is close enough to push right away (a), or if it is too far for a direct push (b). The initial disposition for distant objects is to approach them and for close objects to push them. This disposition can change depending on the feedback received from the environment. Feedback is an interpretation of the environment’s state perceived after execution of an action.

5.2.2 Feedback

Relevant feedback in this scenario arises from the success state of the last action and from verbal evaluations uttered by another agent.

Success State Actions can succeed or fail. For instance, pushing an object fails if it is too heavy to be moved (indicated by triggered front bumpers) or if it slips away, i.e. the robot loses it during the pushing process. Failure arises because the preconditions or postconditions of the action do not hold. Failure of an action decreases the disposition to execute that action again, whereas success increases it. The function f_ss maps from a success state s ∈ {succeeded, failed} to a numerical value v ∈ {−1, 1}:

f_ss(s) = { 1 : s = succeeded; −1 : s = failed }     (5.1)

We use Equation 5.1 to calculate the success state component of the feedback, fb(ss), given a perceived and an interpreted state of the environment.

Verbal Evaluation The actions of the robot can be evaluated by human agents. Positive evaluations increase the disposition for that action, negative evaluations decrease it. Examples of positive comments are: yes, good, well done. Negative evaluations are no, don't, stop it. The linguistic analysis of these utterances yields a syntactic and semantic representation and an ontological type specification. Table 5.1 shows the ontological types of the example utterances.

The function f_ve maps from the ontological type of a perceived utterance, t ∈ T, to a numerical value v ∈ {−1, 0, 1}:

f_ve(u) = { 1 : t ∈ T_p; −1 : t ∈ T_n; 0 : t ∉ T_p ∪ T_n }     (5.2)


utterance    type
yes          polar.positive
good         evaluative.positive
well done    evaluative.positive
no           polar.negative
don't        evaluative.negative
stop it      evaluative.negative

Table 5.1: Mapping from evaluative utterances to ontological types

with T_p = {polar.positive, evaluative.positive} and T_n = {polar.negative, evaluative.negative}

We use Equation 5.2 to calculate the verbal evaluation component of the feedback, fb(ve),given a perceived and interpreted state.

By perceiving and interpreting the state of the environment after execution of an action, the agent extracts feedback and builds up experience that determines its behavioral tendencies in future states. The integration of both kinds of feedback will be shown in Equation 5.5 in §5.2.4.
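The two feedback components can be sketched as follows. This is a minimal illustration in Python, assuming string labels for success states and ontological types; the actual values come from the linguistic analysis of Chapter 4, and the names are illustrative rather than the thesis implementation.

    POSITIVE_TYPES = {"polar.positive", "evaluative.positive"}   # T_p
    NEGATIVE_TYPES = {"polar.negative", "evaluative.negative"}   # T_n

    def f_ss(success_state):
        """Success-state feedback (Equation 5.1): +1 for success, -1 for failure."""
        return 1 if success_state == "succeeded" else -1

    def f_ve(ontological_type):
        """Verbal-evaluation feedback (Equation 5.2): +1, -1, or 0 for neutral."""
        if ontological_type in POSITIVE_TYPES:
            return 1
        if ontological_type in NEGATIVE_TYPES:
            return -1
        return 0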

5.2.3 Experience

As introduced in Chapter 3, the experience of an agent is built up by the feedback received from the environment after the execution of an action a in a state s. In this scenario states are characterized by the type of objects that are perceived. So for each action a aimed at a type of object ot the agent maintains experience exp(ot, a). Experience exp(ot, a) of an agent contains feedback values from the environment, stored separately for success states, exp(ot, a, ss), and verbal evaluations, exp(ot, a, ve). In the beginning, when the agent has not yet executed action a aimed at an object of type ot, no experience is available; we write exp_0(ot, a). Every time the agent executes action a aiming at an object of type ot, the following feedback fb(ot, a) is used to update the experience exp(ot, a). As described in §3.2.1 there are several possible ways to update experience. As we expect the environment to possibly change, we take into account previous feedbacks weighted according to their recency, expressed by Equation 3.4, repeated here for convenience:

exp_{k+1} = exp_k + α ∗ [fb_{k+1} − exp_k]     (5.3)

(with exp as short for exp(ot, a) and fb short for fb(ot, a)). We use α = 0.5 as the step size parameter to weigh recent against past feedback. This seems to be a good middle ground for balancing the importance of the current feedback against the importance of past experience. α approaching 1.0 would emphasize the current feedback, whereas α approaching 0.0 would give past feedback more weight relative to the last feedback.

Equation 5.3 also holds for k = 0, yielding exp_1 = fb_1 for arbitrary exp_0, and thus is defined for cases when no previous experience was obtained. Given that each feedback value (ss or ve) satisfies −1 ≤ fb ≤ 1, it holds for every experience value that −1 ≤ exp ≤ 1. To see how the different feedback values (ss, ve) are integrated, see Equation 5.5 in §5.2.4.
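The recency-weighted update of Equation 5.3 can be sketched as follows, assuming experience is stored in a dictionary keyed by object type, action, and feedback kind; all names are illustrative, not the thesis implementation.

    ALPHA = 0.5  # step size: higher values emphasize the most recent feedback

    experience = {}  # (object_type, action, kind) -> value in [-1, 1]

    def update_experience(object_type, action, kind, feedback):
        """exp_{k+1} = exp_k + alpha * (fb_{k+1} - exp_k); the first feedback is stored directly."""
        key = (object_type, action, kind)
        if key not in experience:
            experience[key] = float(feedback)
        else:
            old = experience[key]
            experience[key] = old + ALPHA * (feedback - old)
        return experience[key]

    # Example: pushing an object of type x succeeds, but the human says "Don't!"
    update_experience("x", "push", "ss", f_ss("succeeded"))            # -> 1.0
    update_experience("x", "push", "ve", f_ve("evaluative.negative"))  # -> -1.0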


5.2.4 Dispositions

The experience is the key for deciding what to do next, given a state with a manipulable object. We will now describe the processes in leaves (a) and (b) of the decision tree in Figure 5.1. In each state (= leaf) the robot can choose between two actions: either to go for the perceived object (approach or push it) or to turn away from it and look for other objects. The robot should be more likely to show interest in the object (approaching or pushing) when it has no negative experience with it, i.e. if it was not blamed for or frustrated by trying to act on it.

So the disposition function δ should be defined such that good experience increases the probability to choose the push or approach action, whereas bad experience increases the probability to turn away. The experience from different kinds of feedback can be weighted in order to give one feedback source more importance than another. The weighting of different kinds of feedback follows the general principle expressed by (5.4):

weigh(exp) = [k_1 ∗ exp(fb_1) + k_2 ∗ exp(fb_2) + · · · + k_n ∗ exp(fb_n)] / (k_1 + k_2 + · · · + k_n)     (5.4)

with k_i ∈ N and k_i ≥ 0.

For the two kinds of feedback considered here, verbal evaluation ve and success state ss, we adopt Equation 5.5:

weigh(exp) = [k_1 ∗ exp(ve) + k_2 ∗ exp(ss)] / (k_1 + k_2)     (5.5)

We set k_1 = 4, k_2 = 1, in order to prioritize verbal evaluation over the success state of actions.
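A sketch of this weighting (Equation 5.5), building on the hypothetical experience store introduced above; the no-experience case is handled separately by the disposition in Equation 5.10.

    K_VE, K_SS = 4, 1  # verbal evaluation is weighted higher than success state

    def weigh(object_type, action):
        """Weighted combination of verbal-evaluation and success-state experience (Equation 5.5)."""
        exp_ve = experience.get((object_type, action, "ve"), 0.0)
        exp_ss = experience.get((object_type, action, "ss"), 0.0)
        return (K_VE * exp_ve + K_SS * exp_ss) / (K_VE + K_SS)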

Dispositions in general Before we define the disposition for the specific case that only regards two actions, we introduce a general definition for an arbitrary number of possible actions to choose from. This requires two steps: First we have to transform the experience values lying in the interval [−1, 1] to the probability interval [0, 1]. We achieve this by dividing the (possibly weighted) experience value by 2 and adding a half: exp′ = 0.5 ∗ weigh(exp) + 0.5. In the second step we have to determine the probability of an action according to its experience value. This is expressed by Equation 5.6:

δ_s(exp′, a) = exp′(s, a) / Σ_{a_i ∈ A(s)} exp′(s, a_i)     (5.6)

The better the experience with an action, the higher its probability to be executed. Note that the execution probabilities of all actions sum up to 1.0. The initial disposition, with no experience given for an action a, depends on the definition of exp′_0.

As we will see in our example, it might be the case that we cannot receive or unequivocally assign feedback for all of the actions. In this case we have to define the disposition for those non-feedback actions, Ā, depending on the dispositions for the feedback actions. If the average experience for the feedback actions is positive, the probability to execute one of them should be above 0.5, leaving a probability of less than 0.5 for the non-feedback actions. If the average experience for feedback actions is negative, this should yield an execution probability of less than


0.5, and more than 0.5 for non-feedback actions. This is mathematically expressed as follows. We define the average experience avex for all feedback actions, a ∈ A(s), in Equation 5.7:

avex(A(s)) = [Σ_{a_i ∈ A(s)} exp′(s, a_i)] / |A(s)|     (5.7)

Next, we redefine Equation 5.6 by multiplying it with avex. This decreases the summed probability of all feedback actions, a ∈ A, in order to leave some probability for the non-feedback actions.

δ_s(exp′, a) = [exp′(s, a) / Σ_{a_i ∈ A(s)} exp′(s, a_i)] ∗ avex(A(s))     (5.8)

Equation 5.9 then defines the disposition of a non-feedback action a ∈ Ā:

δ_s(a) = [1 − avex(A(s))] / |Ā(s)|     (5.9)

This ensures that the probabilities of all actions, a ∈ A(s) ∪ Ā(s), sum up to 1.0.
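The general case (Equations 5.7-5.9) can be sketched as follows; the dictionary of transformed exp′ values and the list of non-feedback actions are illustrative inputs, not part of the thesis implementation.

    def general_disposition(exp_prime, non_feedback_actions):
        """Execution probabilities for all actions; they sum to 1.

        exp_prime maps each feedback action to a value in [0, 1] (assumed not all zero);
        non_feedback_actions lists the actions without assignable feedback.
        """
        total = sum(exp_prime.values())
        avex = total / len(exp_prime)                                   # Equation 5.7
        probs = {a: (v / total) * avex for a, v in exp_prime.items()}   # Equation 5.8
        leftover = (1.0 - avex) / len(non_feedback_actions)             # Equation 5.9
        for a in non_feedback_actions:
            probs[a] = leftover
        return probs

    # Example: good experience with "push", mediocre with "approach", no feedback for "turn".
    print(general_disposition({"push": 0.9, "approach": 0.6}, ["turn"]))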

Dispositions in a specific case Now having defined the dispositions for the general case, we look at the specific case of the dispositions to push an object or to turn away from it (cf. leaf (a) in Figure 5.1). We have two possible actions here, but this case is special as feedback is only taken into account for those actions that address an object. The complement action, i.e. turning away, is neither checked for success nor for verbal feedback.1 Therefore we define the disposition for turning away depending solely on the disposition for the complement action.

Using the weighting from Equation 5.5, Equation 5.10 defines the probability of a push action (cf. leaf (a) in Figure 5.1).

δ(exp, push) = { 1 : exp_0;
                 0.5 ∗ weigh(exp) + 0.5 : weigh(exp) ≥ 0;
                 0.4 ∗ weigh(exp) + 0.5 : weigh(exp) < 0 }     (5.10)

Note that exp actually denotes exp(ot_i, push) and exp_0 indicates that no experience is available. Given that experience exp and weigh(exp) ∈ [−1, 1], Equation 5.10 yields a probability to push the object related to the quality of the experience. If the experience is neutral (exp = 0), the probability of pushing is equal to the probability of not pushing, namely 0.5. Positive experience (exp > 0) yields a probability to push in the interval [0.5, 1.0]. Negative experience (exp < 0) yields a probability in the interval [0.1, 0.5[. Given that the probabilities of all actions need to add up to 1, the probability of the second action, i.e. turning away instead of pushing, is given by Equation 5.11. Having only two actions, we do not need to apply the averaging equations 5.7–5.9.

δ(exp, turn) = 1 − δ(exp, push)     (5.11)
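A sketch of this two-action disposition (Equations 5.10 and 5.11), again building on the hypothetical helpers introduced above.

    def disposition_push(object_type):
        """Probability of pushing an object of the given type (Equation 5.10)."""
        if not any(k[0] == object_type and k[1] == "push" for k in experience):
            return 1.0                      # exp_0: no experience with pushing this type yet
        w = weigh(object_type, "push")
        if w >= 0:
            return 0.5 * w + 0.5            # positive experience -> [0.5, 1.0]
        return 0.4 * w + 0.5                # negative experience -> [0.1, 0.5)

    def disposition_turn(object_type):
        """Probability of turning away instead (Equation 5.11)."""
        return 1.0 - disposition_push(object_type)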

See Figure 5.2 for an illustration of Equations 5.10 and 5.11. Equation 5.10 can be applied to leaf (b) in Figure 5.1 when approach is substituted for push.

1 The problem is that turn-away actions cannot easily be recognized as referring to an object from the perspective of an evaluating human. But one could extend the scenario such that the human actively encourages the robot to do some kind of action with some kind of object.


Figure 5.2: Graphical representation of disposition δ. The probabilities p(a) of a = push and a = turn, respectively, are indicated on the y-axis; they depend on weigh(exp), indicated on the x-axis.

Equation 5.10 expresses that, when one was praised for pushing an object or succeeded in it, one would like to push it again. The more praise (relative to blame) and the more success (relative to failure), the more likely one is to push again. The more blame (relative to praise) and the more failure (relative to success), the less likely one is to push it. However, verbal feedback is considered more important than the success state.

If the experience is balanced between positive and negative feedback (either because blame/praise and failure/success are balanced, or verbal evaluation is counterbalanced by the success state), the probability to push the object equals the probability to not push it. In order to allow the robot to handle a non-static environment, the probability for actions that address objects will never be smaller than 0.1 (according to Equation 5.10), even when only negative feedback was collected. Otherwise, the robot would never have the possibility to acquire positive feedback again after a sequence of negative feedbacks.

By tuning Equation 5.10 one can control the robot's behavior either to be more intimidated by negative experience, or to be more adventurous or resistant and repeat actions despite negative experience. The robot always has to trade off between the risk of gaining negative experience and the risk of possibly missing positive experience.2

Note that the robot does not associate its experience for pushing a certain type of object with its experience of approaching it.

2 It could possibly also learn how probable a change in the environment is, i.e. how often humans change their mind in what they praise and blame, and with this knowledge it could estimate whether it is worthwhile to adapt its behavior to the verbal evaluation at all. But this is beyond the scope of this thesis, and it might be hard to learn, given the limited time for interaction.


One could argue that these two actions, aiming at the same object, relate to each other and thus that the dispositions should relate somehow as well. However, it is beyond the scope of this thesis to set up relations between actions and define dispositions according to similarity between actions.

5.3 Examples

This section illustrates the foregoing specification by giving examples of the robot's interaction and experience. The section begins with four examples of a simple sequence of choosing an action, perceiving the consequences, and again choosing an action. The examples show how the received feedback changes the probabilities to execute one of all possible actions in a situation. In the second part of this section, we will extend the single-event examples to sequences of 15 perceived feedbacks, and illustrate how those sequences have an impact on the probabilities of choosing an action.

Figure 5.3: Example 1: The robot perceives an object x and pushes it. It is blamed for doing so (b), and as a consequence turns away when it sees an object x the next time (d).

5.3.1 Single Events

Consider Figure 5.3,3 which illustrates the impact of a blame. In the beginning (a), the robot has no experience; it perceives an object of type x right in front of it, and possible actions are either to push the object or to turn around. As there is no previous experience with that type of object, Equation 5.10 yields a probability of 1 to push the object. So the robot tries to push it. It succeeds, but perceives a negative evaluation of a human agent uttering "Don't!". The utterance is parsed and analyzed as being of type evaluative.negative. This translates to the following feedback and experience values: fb(ss) = 1 and fb(ve) = −1, exp(push, x, ss) = 1 and exp(push, x, ve) = −1. The next time that the robot perceives object x within a proper distance to push it, the weighted experience according to Equation 5.5 is weigh(exp) = (4 ∗ −1 + 1 ∗ 1) / (4 + 1) = −0.6.

3 Here and in the following figures (5.4–5.6), the robot is depicted by a grey circle with a triangle whose base indicates the robot's heading. The white circle labeled with x indicates the object of type x.


Applying Equation 5.10 yields 0.4 ∗ weigh(exp(push, x)) + 0.5 = 0.4 ∗ −0.6 + 0.5 = 0.26 as the probability for pushing. The probability to turn away is 1 − 0.26 = 0.74. So the robot will more likely ignore the object than try to push it again. By defining Equation 5.10 such that it still leaves a small probability to push the object anyhow, we ensure that the robot can still gain further experience with that object. The human might change his mind, or a different human could give positive feedback for pushing the object, and so the disposition could change again.4
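For illustration, Example 1 can be replayed with the sketches above (all names are the hypothetical ones introduced earlier, not the thesis implementation).

    experience.clear()
    print(disposition_push("x"))                                        # 1.0 (no experience yet)
    update_experience("x", "push", "ss", f_ss("succeeded"))             # the push succeeded ...
    update_experience("x", "push", "ve", f_ve("evaluative.negative"))   # ... but was blamed
    print(round(disposition_push("x"), 2))                              # 0.26
    print(round(disposition_turn("x"), 2))                              # 0.74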

Figure 5.4: Example 2: The robot perceives an object x and pushes it. It is praised for doing so (b), and as a consequence pushes an object x again the next time it perceives one (d).

Figure 5.4 illustrates the impact of a positive verbal evaluation. Instead of a blame as in the foregoing example, the robot perceives the utterance "Good!", which is analyzed as being of type evaluative.positive. Again starting from no experience, this translates to the following feedback and experience values: fb(ss) = 1 and fb(ve) = 1, exp(push, x, ss) = 1 and exp(push, x, ve) = 1. The weighted experience according to Equation 5.5 is weigh(exp) = (4 ∗ 1 + 1 ∗ 1) / (4 + 1) = 1. The next time that the robot perceives object x within a proper distance to push it, applying Equation 5.10 yields 0.5 ∗ weigh(exp(push, x)) + 0.5 = 0.5 ∗ 1 + 0.5 = 1.0 as the probability for pushing. The probability of turning away is 1 − 1 = 0. So the robot is definitely going to push the object again.

Another way to acquire negative dispositions for an action aimed at a certain object is the failure of that action. The processes illustrated in Figure 5.5 are similar to those in Figure 5.3, but the negative experience with the object arises from the unsuccessful execution of the action: The object cannot be pushed, maybe because it is too heavy. Not given any comments from a human agent, we get the following feedback and experience values: fb(ss) = −1 and fb(ve) = 0, exp(push, x, ss) = −1 and exp(push, x, ve) = 0. The next time that the robot perceives object x within a proper distance to push it, the weighted experience according to Equation 5.5 is weigh(exp) = (4 ∗ 0 + 1 ∗ −1) / (4 + 1) = −0.2. Applying Equation 5.10 yields 0.4 ∗ weigh(exp(push, x)) + 0.5 = 0.4 ∗ −0.2 + 0.5 = 0.42 as the probability for pushing. The probability of turning away is 1 − 0.42 = 0.58. The robot will more likely ignore the object than try to push it again.

Figure 5.6 illustrates the impact of a succeeded action. Again with no verbal evaluations from a human, we get the following feedback and experience values: fb(ss) = 1 and fb(ve) = 0, exp(push, x, ss) = 1 and exp(push, x, ve) = 0.

4 As long as the robot cannot distinguish between different humans, dispositions for manipulating objects remain user-independent.


Figure 5.5: Example 3: The robot perceives an object x and tries to push it. It does not succeed (b), and as a consequence turns away when it sees an object x the next time (d).

Figure 5.6: Example 4: The robot perceives an object x and tries to push it. It succeeds, and as a consequence pushes an object x again the next time it perceives one (d).

The next time that the robot perceives object x within a proper distance to push it, the weighted experience according to Equation 5.5 is weigh(exp) = (4 ∗ 0 + 1 ∗ 1) / (4 + 1) = 0.2. Applying Equation 5.10 yields 0.5 ∗ weigh(exp(push, x)) + 0.5 = 0.5 ∗ 0.2 + 0.5 = 0.6 as the probability for pushing. The probability to turn away is 1 − 0.6 = 0.4. The robot will more likely try to push the object than turn away from it.

5.3.2 Sequences of Events

The preceding examples covered only one action and perception each, and illustrated how this single event had an effect on the probability to choose an action the next time. We close the chapter with four examples that illustrate a sequence of feedbacks and the corresponding dispositions. See Figure 5.7 for four different sequences of feedback.

All curves display the probabilities of executing an action a, given experience that is built up by up to 15 preceding feedbacks. The sequence of feedbacks can be separated into three sections: the first third comprises a subsequence of 5 negative feedbacks, the second third a subsequence of 5 positive feedbacks, and the last third a sequence of 5 negative feedbacks.


Figure 5.7: Plots for sequences of gathered experience: first 5 negatives, then 5 positives, and in the end 5 negatives again. (a) Feedback from success state, fb(ve) = 0; (b) feedback from verbal evaluation, fb(ss) = 1; (c) feedback from verbal evaluation/success state with corresponding values; (d) feedback from verbal evaluation/success state with contrary values.


The four different curves in Figures 5.7(a) to 5.7(d) give examples of the effect of different kinds of feedback. Figure 5.7(a) indicates the effect of feedback from the success state of the action, while verbal evaluation is always neutral. The first sequence of failures (fb(ss) = −1) yields a probability of executing the action of around 0.4, slightly below 0.5. As soon as the action succeeds again in the second third (fb(ss) = 1), the probability to execute the action increases and exceeds 0.5, approaching 0.6. In the last third, when the action fails again (fb(ss) = −1), the probability to execute it falls below 0.5 again.

The curve in Figure 5.7(b) shows the effect of verbal evaluation, with constant positive feedback from the success state. This curve displays a similar tendency, with probabilities below 0.5 in the first and last third, and above 0.5 in the second third. Compared to the curve in Figure 5.7(a), this curve is much steeper, with lowest probabilities of around 0.1 and highest probabilities approaching 1.0. This is due to the fact that (1) verbal evaluation has a greater weight (4 compared to 1), and (2) we assume constant positive feedback from the success state. Note that the feedback from the success state cannot be neutral; an action either succeeds or fails.

The two preceding examples clarified the effect of different values of one kind of feedback while the second kind of feedback was constant. The following two examples illustrate the effect of two kinds of feedback whose values either correspond or oppose. Figure 5.7(c) gives an example for feedback from verbal evaluation as well as success state with the same polarity, i.e. the polarities concord. One can see that the curve has the same tendency as in Figures 5.7(a) and 5.7(b), but is more extreme. While the probabilities above 0.5 in the second third of the curve are as high as in Figure 5.7(b), they are much smaller in the first and last third of the curve, where negative experience is built up by negative success state as well as negative verbal evaluation.

Figure 5.7(d) gives an example of contrary feedbacks. In the first and last third of the curve, verbal evaluation is positive while success state feedback is negative; in the second third of the curve it is the reverse. In comparison to Figure 5.7(b) the curve is flatter, with probabilities nearer to 0.5.

5.4 Conclusion

This chapter presented the approach to dispositions for acting in the physical environment. It gave an exemplary application of the more abstractly defined characterization of dispositions given in Chapter 3. The robot acquires tendencies to manipulate or avoid certain kinds of objects, based on the success of manipulation trials and on verbal evaluations of a human regarding the robot's actions. The model assumes that success increases the tendency to repeat an action whereas frustration decreases it. Likewise, praise reinforces the action while blame inhibits it.

Learning dispositions that primarily refer to a physical context may involve – but does not rely on – human-robot interaction. The human can be in the loop, but the robot remains fairly autonomous in this process of acquiring dispositions. In the scenario described above, the human is part of the acquisition process only by virtue of her/his verbal appraisal. This is unidirectional – the robot does not directly respond to the human's comments, but its behavior in the physical situation is determined by them. This has a social aspect in that the robot is sensitive to social evaluation; its behavior is influenced by judgments of other agents.


The next chapter applies dispositions in a context that is more explicitly social: the robot learns preferences for communicative actions in bidirectional interaction with the human.


Chapter 6

Dispositions in Communicational Context

Summary
This chapter describes the acquisition of dispositions for communicative actions, as opposed to the physical actions described in Chapter 5. Dispositions are employed in two scenarios: in the first the robot learns how to interpret indirect speech acts, in the second it learns how to engage another agent to help.

6.1 Indirect Speech Acts

This section presents the approach to dispositions for understanding and reacting to utterances of other agents. The scenario concerns the interpretation of potential requests to execute an action. The problem is that people do not always literally mean what they say: they say A, but indirectly really mean B – and B is what you ought to do. Such utterances express indirect speech acts (ISAs) (Searle, 1969). In order to be a good servant, the robot should not only be able to understand and fulfill direct requests like "Bring me a cup of coffee!", it should also appropriately react to indirect requests like "Can you bring me a coffee?" or "I am thirsty."

Indirect speech acts have intended meanings that are different from their literal meanings. Human hearers recognize their real meaning based on the context, and often they are not even aware of the discrepancy between literal and intended meaning. Only in case of misunderstandings because of ambiguous situations and conflicting interpretations between communication partners do we recognize the gap. What would we think about a service robot that took our polite question "Can you bring me a coffee?" literally and replied "yes", while not showing the slightest inclination to actually get the coffee? Communication would have failed due to the robot's ignorance of communicative conventions in human-human interaction. The following examples illustrate prototypical request-ISAs that the robot should be able to deal with.

(10) Questions
a. Can you bring me a coffee?
b. Could you bring me a cup of tea?


c. Will you bring me a coffee?
d. Would you bring me a coffee?
e. Can you bring me a coffee, please?

(11) Assertions
a. I would like to have a cup of tea.
b. I need tea.
c. I am thirsty.

All these utterances could be meant as requests to get something to drink. Whether the robot will actually understand them as requests is determined in an interpretation process which takes into account the linguistic form and the situation in which the utterance was made. Using experience to adapt dispositions becomes relevant in ambiguous cases, that is, when the intended meaning cannot be unequivocally derived. The robot will then actively ask for the intended meaning and adapt its future interpretation accordingly.

6.1.1 Theoretical Background

Traditional logic has attempted to analyze the meaning of an utterance expressed in a natural language solely in terms of its truth values, assuming that sentences "can only be to 'describe' some state of affairs, or to 'state some fact', which it must do either truly or falsely" (Austin, 1962). However, for utterances like questions or commands it is not easy to determine a truth value. Austin was the first to consider utterances as actions or, as he called them, speech acts. He distinguished three kinds of speech acts: (1) the locutionary act, which is the communicative act of saying something, (2) the illocutionary act, which is the act that is performed in saying something, and (3) the perlocutionary act, which is the act that is performed by saying something. The illocutionary act can be considered as the speaker's intention and is named by performative verbs like inform, warn, promise, request, etc. The perlocutionary act is the effect that the speech act has on the context participants' world, for instance convincing or scaring someone. This is typically beyond the control of the speaker.

According to Searle, an indirect speech act is an utterance in which one speech act is performed indirectly by performing another (Searle, 1969). The default mapping between utterances and their illocutionary force – the intended meaning of the speaker – is determined by linguistic conventions. An important indication is the sentence mood: interrogatives usually express questions, declaratives encode assertions, and imperatives map to requests. The mapping is also encoded in the lexicon: sentences containing a performative verb usually have the illocutionary force associated with that verb – "I warn/promise you that X". However, in many cases the alignment rules between linguistic form and illocutionary act do not apply. For example, using the interrogative mood, we can express a question, a request, an information, or a greeting:

(12) a. question: "Where is the kitchen?"
b. request: "Can you pass the salt?"
c. inform: "Did you know that the shop closes at 3 o'clock?"
d. greet: "How do you do?"


An indirect speech act occurs when the illocutionary act predicted by the established linguistic theory of alignment is distinct from the act actually performed.

6.1.2 Approach

Figure 6.1: Flow diagram for the decision process

The approach to handling request-ISAs starts from a proper analysis of the meaning of an utterance (cf. §4.2). From this meaning the robot can determine whether it makes sense to interpret an utterance as a request in the given situation (Levinson, 1983). Should the robot be in doubt, the approach enables it to clarify with the user whether an action was requested, and to adapt its interpretation mechanisms to handle similar utterances as requests on future occasions. Otherwise, the robot establishes the actions necessary for fulfilling the request and carries them out.

The interpretation of an utterance as a potential request is based on its linguistic meaning, its classification as a particular type of question or assertion, and the possibility to address the request in the current context. Figure 6.1 illustrates the steps in the interpretation process. First, we analyze and classify the utterance for its linguistic properties (A); this allows us to decide whether we are dealing with a potential request (step 1). If not, we treat the utterance as a direct speech act (either a direct request or a direct act of some other sort) (B); otherwise, we have to check whether it is feasible given the situational context (step 2). If the robot would not be able to act according to the request, it will apologize and reject (F). Otherwise, we have to decide whether the utterance is ambiguous (step 3). That is, we can either handle its indirect meaning right away (C), or we have to clarify the intended meaning before proceeding (D). The decisions in steps 2 and 3 are determined by the linguistic meaning of the utterance (§4.2) and the situational context. The situation basically


comprises the robot's ability, readiness and willingness to fulfill the request. For instance, if the robot had established its readiness to serve the human by uttering "How can I help you?", a human's utterances are more likely meant as requests than in a situation where the robot is obviously busy with the execution of some other actions. If the utterance in the situation requires clarification, the robot asks the human about the intended meaning. The answer is used to interpret the utterance (either as a direct (B) or indirect (C) speech act) and to build up experience which is used later to resolve ambiguous acts without explicitly asking. Thus the robot can adapt its dispositions for interpreting ambiguous speech acts to the human's characteristic language use.

Now we look a bit closer at the steps in the decision process. For our decision in step 1 we rely on the linguistic analysis and classification as introduced in §4.2. Utterances classified as commands are handled as direct speech acts (B), while questions with modality {prediction, volition} or {permission, possibility, ability} and content type transfer.object, as well as assertions of content type desire.object, need.object, or state.physical, are passed to the next decision steps.

Decision steps 2 and 3 take the situational context into account. This comprises beliefs, intentions, and abilities of the robot. To fulfill the request "Go to the kitchen!" the robot has to know the location of the kitchen and how to get there. The situation refers to these requirements and the current intentions of the robot. The robot can be motivated to serve, or it can be occupied with other actions preventing it from fulfilling a new request. We capture this aspect of the situation using a variable that refers to the mode of the robot. We distinguish three modes that the robot can be in: the servant mode that arises after the robot explicitly offered its services, the non-servant mode for when the robot is not ready to serve, and the default mode for all other cases (confer §4.5 to see how these modes arise from different desires). In step 2 we check if the action is feasible given the robot's skills and knowledge and its readiness to act. Given the general feasibility of the action, we check in step 3 whether it is ambiguous, depending on the robot being in servant or default mode.

If the robot is occupied with other actions that do not allow it to fulfill a request (non-servant mode), or if its knowledge or skills are not sufficient, it will not even try to decide if an utterance is to be interpreted as an indirect request. How the robot will react depends on the type of utterance it needs to reply to. For example, to questions as illustrated in (10) it will respond with an apology. Depending on the reason for refusal, it will either add an explanation of what it is doing at the moment or a justification that points to missing knowledge or skills ("Sorry, I do not know where the kitchen is."). This is an appropriate reaction to requests as well as literal questions (Levinson, 1983). (Confer §4.3 and (Kruijff et al., 2006) for the mechanisms that enable the action planner to clarify a missing piece of knowledge by using the BDI mediator and the communication subsystem to issue a clarification question and keep record of potential answers.) Assertions (11) will be answered with a simple acknowledgment ("I see."). This would not be an acceptable answer to a request, but as the robot is not ready or able to fulfill a request anyway, this reaction serves as a good reflection of its state of mind.

The servant mode promotes interpreting utterances as requests: All the utterances in (10) and (11) will be taken as request-ISAs and are handled without the need to clarify.

In default mode, the reactions are less biased towards the request interpretation. The robot interprets the assertions in (11) as ambiguous and clarifies their intended meaning. Questions that contain a please (10e) are interpreted as requests, because the modifier please inhibits the interpretation as


literal questions (Levinson, 1983). For the questions (10a)-(10d) we distinguish the present tense and past tense forms. This tense information is part of the linguistic meaning of an utterance, which we obtain from linguistic analysis. We do this because past tense forms are less likely to be used for direct questions than present tense forms (Levinson, 1983). For asking if someone is able or allowed to do X, can would be more appropriate to use than could. The same holds for direct questions about future plans: for asking if someone is planning to do X, will is better than would. So as could and would are associated with a greater degree of tentativeness and politeness than can and will, we assume them to be more likely used in expressing requests. Thus we will interpret the questions built with the past tense forms (10b) and (10d) as indirect requests, whereas we regard the meaning of the present tense questions (10a) and (10c) as ambiguous and pass them to the clarification process.

Adapting by Clarifying

For clarification the robot produces a question like "Do you want me to take this as a request?" or "Do you want me to help you?". Depending on the human's answer, the robot adopts the indirect or direct reading and acts accordingly. §4.3 describes how the BDI mediator involves the communication subsystem to raise the question, and to analyze and bind potential answers to the question identifier in order to return the result to the BDI mediator; see also §6.1.3 for an example run through these processes. The BDI mediator can resolve the ambiguity, and the robot uses the result of the clarification to adapt its interpretation strategy for this pair of utterance type and situation. Each pair of utterance type and situation presents a state s which allows three possible actions for the robot: It can (1) ask for clarification, (2) adopt the indirect interpretation, or (3) adopt the direct interpretation. The set of possible actions A(s) is thus defined as A = {clar, isa, dsa}. Disposition δ draws on experience and determines which action to take. In the beginning, when no experience is given (exp_0), the robot will always ask for clarification. Thus the disposition function δ : Exp × A → R is defined as

δ(exp_0, a) = { 1.0 : a = clar; 0.0 : a ≠ clar }     (6.1)

The human’s answer to the clarification question (as attested by the communication subsys-tem) constitutes the feedback, which is used to build up the experience. For this scenario weadopt a simple update function that only takes into account the last feedback. See Equation 3.1,which is repeated here.

exp_k = fb_k     (6.2)

This means that the robot only remembers the last feedback and forgets all preceding answers to its clarification questions in a particular state. The human's answer to such a clarification question is expected to be either yes or no. These strings are analyzed as having the ontological types polar.positive and polar.negative respectively (see §4.2.3). These types are then mapped to a numerical value:

f(u) = { 1 : u = positive; −1 : u = negative }     (6.3)


Disposition δ for non-empty experience is defined in Equation 6.4 and illustrated in Table 6.1, where columns indicate the value of exp and rows indicate the action:

δ = {(1, isa) → 0.9, (1, dsa) → 0.0, (1, clar) → 0.1,
     (−1, isa) → 0.0, (−1, dsa) → 0.9, (−1, clar) → 0.1}     (6.4)

exp     1     -1    0
clar    0.1   0.1   1.0
isa     0.9   0.0   0.0
dsa     0.0   0.9   0.0

Table 6.1: δ represented as a table

Equation 6.4 and Table 6.1 indicate that with a high probability of 0.9 the robot interprets an ambiguous utterance according to the response to the last clarification request. To ensure flexibility in possibly changing environments, we let the robot issue a new clarification request with a probability of 0.1. If it never asked again, its interpretation would be fixed and it could not deal with changes in the human's intended interpretations. Note that in the current scenario there is no way for the human to correct a misinterpretation. If the dialogue system were able to process such repairs, they could contribute to the feedback and be incorporated into the experience.
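A sketch of this clarification strategy (Equations 6.1-6.4). The state key, helper names, and the way answers are passed in are illustrative; the actual system goes through the BDI mediator and the communication subsystem described in §4.3.

    import random

    isa_experience = {}  # (utterance_type, situation) -> last feedback (+1 or -1)

    def choose_interpretation(utterance_type, situation):
        """Return 'clar', 'isa', or 'dsa' for an ambiguous utterance in a given situation."""
        state = (utterance_type, situation)
        if state not in isa_experience:
            return "clar"                          # Equation 6.1: always clarify first
        preferred = "isa" if isa_experience[state] == 1 else "dsa"
        return random.choices(["clar", preferred], weights=[0.1, 0.9])[0]   # Equation 6.4

    def record_clarification_answer(utterance_type, situation, answer_type):
        """Keep only the last answer to a clarification question (Equations 6.2 and 6.3)."""
        isa_experience[(utterance_type, situation)] = 1 if answer_type == "polar.positive" else -1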

6.1.3 Examples

The following examples illustrate how the robot interprets utterances as requests based on their linguistic characterization and the situational context in which they occurred.1 The information flows between three conceptual layers of the robot architecture – those for communication, belief mediation, and action planning. The communication subsystem is responsible for analysis and production of natural language. The BDI mediator mediates between the communication subsystem and the systems for other modalities (e.g. vision, navigation) of the architecture. It mediates based on the beliefs originating in the different modalities. The action planner tries to derive action plans based on current goals and beliefs. (See §4.1 for more details about the architecture.)

We start with an example of a direct speech act (a command), to illustrate some basic mechanisms of the architecture:

(13) “Bring me a tea!”

Figure 6.2 illustrates how the command in (13) is processed. When the communication subsystem receives the speech signal for "Bring me a tea!" (step 1), it parses the string to obtain a representation of the linguistic meaning it expresses:

1This section is based on parts of (Wilske and Kruijff, 2006)


Figure 6.2: Process flow diagram for “Bring me a tea!”

(14) @b1:action(bring ^ <Mood>imp ^
         <Actor>(r1:hearer ^ you) ^
         <Patient>(t1:thing ^ tea) ^
         <Recipient>(i1:person ^ I))

(14) shows that the utterance expresses a bring-action, using imp(erative) mood. The hearer, in this case the robot, is the one to perform the action (ACTOR), getting the tea (PATIENT) for "me" (RECIPIENT). We classify the utterance as a command to transfer an object (transfer.object) on the basis of the action and the mood. When the BDI mediator receives this information (step 2), we can immediately interpret the utterance as a direct speech act. We now need to check whether we can build an action plan. If we can, then we execute the plan and tell the BDI mediator that we were successful. The BDI mediator in turn triggers the communication subsystem to produce positive feedback to indicate that we have understood the command and are carrying it out (step 3). If we cannot form a plan in the current situation, then a negative indication is sent back to the BDI mediator, possibly with a reason why we cannot form a plan. In this case we produce negative feedback (step 3').

Figure 6.3 shows how we process the request-ISA in (15):

(15) “Can you bring me a tea?”


Figure 6.3: Process flow diagram for “Can you bring me a tea?”

The utterance in (15) also expresses a bring-action, but now in interrogative mood with the modal "can". Based on the types in Figure 4.5 we classify the utterance (15) as a question with the underspecified modality {permission, possibility, ability} and the content type transfer.object (step 2). This type qualifies it as a potential request.

We first check whether it is sensible to interpret the question as a possible request, based on the ability to execute the requested action and the robot's mode. This illustrates the situation-dependent interpretation of requests. If the robot is able to form an action plan and it is in servant mode, the request is deemed unambiguous and the BDI mediator handles it (step 3) by triggering the execution of the formed action plan (step 5) and the communication of positive feedback (step 6). If we cannot form an action plan in the current situation, we proceed as in step 3' in Figure 6.2. If the robot is in default mode (cf. §6.1.2), the request is ambiguous and needs further clarification. (See also the following example.)

Finally, we illustrate not only how we deal with situation-dependent interpretation, but also how we adapt interpretation strategies should an ambiguity arise in how we should interpret a (potential) request-ISA. Consider for example (16):

(16) “I need a cup of tea.”

Figure 6.4 gives the process flow for (16). The communication subsystem classifies the utterance as an assertion of type need.object, on the basis of its indicative mood and need-predicate. This presents a potential request to the BDI mediator (step 2). Now assume that the robot knows how to fulfill this need (i.e. it can form an action plan), but that it is in default mode (cf. §6.1.2).


Figure 6.4: Process flow diagram for “I need a tea.”

Thus, it could in principle handle the request, but it is not sure whether to do so (as no explicit serving mode has been triggered earlier). The robot considers the request as ambiguous (step 3). As it has no previous experience (exp_0) for this state, it will utter a clarification request (cf. Equation 6.1). The BDI mediator sends a request to the communication subsystem to raise the issue with the human user. The communication subsystem produces a clarification question, "Do you want me to get you a cup of tea?", and stores this question in its model of the dialogue context, while returning the identifier of the question to the BDI mediator. When the human answers the question (step 5), the rhetorical analysis in the communication subsystem can tie the answer to the previously posed clarification question.

The communication subsystem informs the BDI mediator of the answer, and of the identifier of the question it was an answer to (step 6). Through the question-id the BDI mediator can use the answer to resolve the ambiguity issue of the potential request-ISA: the human's positive answer confirms the utterance as a request.2 We use the confirmation for two purposes: to handle the request, and to update the experience and, by virtue of disposition δ, the interpretation strategies for utterances like (16).

2 Alternatively, if the human denies the indirect interpretation, the robot interprets the question literally and reacts verbally, e.g. by answering: "I could if you want me to."


To handle the request, we execute the formed action plan (step 8) and produce positive feedback (step 9).

For adaptation of the interpretation strategy, a positive answer to the clarification request is translated into fb_1 = 1 according to Equation 6.3, and by applying Equation 6.2 we obtain exp_1 = 1. The next time that the robot finds itself in the same state, i.e. has to interpret an utterance of the same type in the same situation, it considers its experience for that state: Equation 6.4 yields a probability of 0.9 for interpreting the utterance as an indirect request and a probability of 0.1 to repeat the clarification request. If the human had given a negative response, fb_1 = −1, thus exp_1 = −1, and Equation 6.4 would yield a probability of 0.9 for interpreting the utterance as a direct speech act (leaving 0.1 to repeat the clarification request). Note that the robot can only perceive feedback to the clarification request; it cannot handle any feedback to its interpretation actions. Thus the experience exp_k cannot change through interpretation actions. This problem should be addressed in future extensions of this work.

6.2 Requesting Help

6.2.1 Introduction

This section describes the employment of dispositions for engaging other agents to help. As the robot only has limited capabilities, it might come in need of a helping agent while acting in the world. For instance, if it wants to go to another room, a closed door might be in between; as it cannot open the door on its own, it would have to ask someone for help. Or if people ask the robot to fetch them drinks like coffee or tea, it can move to the kitchen, but it is not able to use the coffee maker on its own. In this case, it would need to ask a human to assist with fixing the coffee. Natural language provides several ways to realize a request for help, ranging from very direct forms to more indirect formulations. Consider (17):

(17) a. Help!
b. Help me!
c. Can you help me?
d. I cannot open the door.

(17a) is an exclamation reduced to the minimum, (17b) is a direct command, whereas (17c) has the surface form of a question but the intended meaning of a request. (17d) can only be understood by reasoning about what the robot wants to achieve and how the human could help in a collaborative setting. For the scope of this thesis, we restrict the robot's possible communicative actions to direct commands as in (17b) and questions as in (17c). Its disposition to use them is determined by the success they yield.

6.2.2 Approach

In the following sections we will describe how the robot acquires dispositions for formulating help requests based on the feedback that it gets. We use a graph representation to capture feedback, experience and the entailed dispositions. When the robot needs help, it tries one of the


Figure 6.5: Graph indicating possible courses of dialogue for requesting help. At each circle the robot can either utter a question (?) or a command (!). The human will either accept or refuse the request, which leads to success (+) or failure (-) of the request. Acceptance ends the dialogue, refusal leads to another trial. If the human has not accepted after the second trial, the robot gives up.

possible help requests. If the request is accepted and fulfilled, the robot expresses its gratitude. If the request is refused or ignored, the robot retries. If it still does not succeed, it gives up. Acceptance and refusal will be stored and used to determine the engagement strategy on future occasions.

Figure 6.5 depicts possible courses of a request dialogue. Circles indicate states in which the robot has to decide which action to take. Solid lines indicate the execution of an action, either uttering a question (?) or a command (!). Dashed lines indicate possible answers of the human, leading either to success (+) of the request or to failure (-). Success terminates the request dialogue, failure triggers a second trial. If the robot did not succeed after the second trial, it gives up and utters "I give up.".

For this scenario, the relevant feedback of the environment is the answer of the human. The set of possible answers consists of ok, yes and no; ok and yes indicate acceptance and thus success of the action, whereas no means refusal of the request and thus failure of the action.3 The linguistic analysis and classification of the utterance no yields the ontological type polar.negative; the utterances ok and yes are analyzed as polar.positive, cf. §4.2.

The feedback builds up the experience and is stored in the graph structure. For each experienced course of dialogue the counter for the corresponding edges is incremented. These counts are used to calculate probabilities for expected answers of the human.

3 The robot should not only consider the verbal answer to a help request; it should also be able to recognize the actual actions of the human. If a human responds to a help request by executing an appropriate action but does not give an explicit answer, the robot could evaluate this as a success of its request. If a human verbally accepts a help request but does not actually help, the robot could recognize and account for this inconsistency. However, this functionality is not within the scope of this thesis; it might be addressed in future work.


Figure 6.6: Dialogue graph with numbered nodes, indicating cost of nodes at the right.

In order to gain solid experience, the robot will go through a short exploration phase in the beginning, in which it tries each action 3 times regardless of its expected utility. This is the process of acquiring a disposition – after this learning phase, the robot estimates the expected utility of its actions and chooses the best one. The disposition is expressed as choosing the best action according to the goal of convincing a human to help. The next section introduces more formal definitions for this intuitive characterization.

6.2.3 Formal Representation

This section gives a formal characterization of the acquisition of dispositions based on feedback and experience in the context of requesting help. We will start with the description of the graph, then show how we use it to store experience, and finally define dispositions in terms of this experience.

The Graph

The following descriptions are illustrated partly in Figure 6.6, which depicts the dialogue graph from Figure 6.5 in more detail. Given a graph G = 〈V, E〉, with node set V and edges E,

V = {s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12}

E = {〈s0, s1〉, 〈s0, s2〉, 〈s0, s3〉, 〈s0, s4〉, 〈s2, s5〉, 〈s2, s6〉, 〈s2, s7〉, 〈s2, s8〉, 〈s4, s9〉, 〈s4, s10〉, 〈s4, s11〉, 〈s4, s12〉}

Nodes in that graph refer to dialogue states; edges indicate the robot's speech acts, which are executed in one state and lead to a next state. The states are characterized by the reaction of the other dialogue participant, which can be either acceptance or refusal. Thus we distinguish three


different types of nodes in the graph: those referring to acceptance, those referring to refusal, and the root node which contains no previous reaction. To capture the different values of acceptance and refusal, we define a function costv that maps from nodes to integer values, costv : V → N:

costv(s_i) = { 0 : i mod 2 = 0; 1 : i mod 2 = 1 }

Confer Figure 6.6, where the costs of nodes are indicated at the right side of the node. Refusal nodes have cost 1; acceptance nodes and the root node have a cost of 0. The cost of a path (= sequence of nodes connected by edges) is defined by the function costp that maps from a path path ∈ P in the tree to integer values, costp : P → N:

costp(path) = { 0 : length(path) = 0; costv(first(path)) + costp(rest(path)) + 1 : else }

The mapping between edges and the actions they correspond to is expressed by the function act : E → A with A = {question, command}:

act = {e ∈ {〈s0, s1〉, 〈s0, s2〉, 〈s2, s5〉, 〈s2, s6〉, 〈s4, s9〉, 〈s4, s10〉} → question,
       e ∈ {〈s0, s3〉, 〈s0, s4〉, 〈s2, s7〉, 〈s2, s8〉, 〈s4, s11〉, 〈s4, s12〉} → command}

Questions are indicated as question marks, commands as exclamation marks in Figure 6.6.

Storing experience

The foregoing definitions constitute the fixed parts of the dialogue tree, which provide the structure for storing the feedback and the built-up experience about past dialogue courses. Feedback is the reaction of the other agent to the request, i.e. a transition from one state sx to another state sy. The robot gains experience by collecting counts for each edge 〈sx, sy〉 that indicate how often the action a = act(〈sx, sy〉) was taken in state sx and led to state sy. These counts can be expressed by a function count : E → N. Given these counts, we can define a function prob that assigns an estimated probability to each edge, prob : E → R:

prob(〈sx, sy〉) = count(〈sx, sy〉) / Σ_{s_{i,a} ∈ {s | 〈sx, s〉 ∈ E ∧ act(〈sx, s〉) = a}} count(〈sx, s_{i,a}〉)     (6.5)

Note that the probabilities of edges with the same start node sx and the same action a sum up to 1:

Σ_{s_{i,a} ∈ {s | 〈sx, s〉 ∈ E ∧ act(〈sx, s〉) = a}} prob(〈sx, s_{i,a}〉) = 1     (6.6)

The definition of prob using count values from all of the feedback collected so far does not apply any weighting depending on the recency of the feedback, but takes the average as described by Equation 3.2. A simple way to put higher emphasis on recent feedback than on past feedback is to limit the counts to only the last m feedbacks. No feedback from more than m previous steps is taken into consideration.4 The smaller we choose m, the more forgetful the agent, but the faster it adapts to possible changes in the environment.

The function prob contains the collected experience of the robot and thereby addresses the learning problem.

4 More sophisticated recency weightings might be possible.
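The following Java sketch illustrates one way to realize count and prob with a bounded memory of the last m feedbacks. The class, its method names, and the string encoding of edges are hypothetical and only meant to make the definitions concrete.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sketch of edge counts with a bounded memory of the last m feedbacks (hypothetical names).
public class EdgeExperienceSketch {
    private final int m;                                      // memory depth
    private final Deque<String> recent = new ArrayDeque<>();  // last m observed edges "sx->sy"
    private final Map<String, Integer> count = new HashMap<>();

    EdgeExperienceSketch(int m) { this.m = m; }

    // Record feedback: the transition sx -> sy was observed once more.
    void update(String sx, String sy) {
        String edge = sx + "->" + sy;
        recent.addLast(edge);
        count.merge(edge, 1, Integer::sum);
        if (recent.size() > m) {                               // forget the oldest feedback
            String old = recent.removeFirst();
            count.merge(old, -1, Integer::sum);
        }
    }

    // prob(<sx,sy>): count of the edge divided by the counts of all edges from sx reached
    // by the same action; the caller passes the sibling successor states of that action.
    double prob(String sx, String sy, String... siblingsOfSameAction) {
        int total = 0;
        for (String s : siblingsOfSameAction) total += count.getOrDefault(sx + "->" + s, 0);
        return total == 0 ? 0.0 : (double) count.getOrDefault(sx + "->" + sy, 0) / total;
    }

    public static void main(String[] args) {
        EdgeExperienceSketch exp = new EdgeExperienceSketch(10);
        exp.update("s0", "s1");   // question accepted
        exp.update("s0", "s2");   // question refused
        exp.update("s0", "s1");   // question accepted again
        System.out.println(exp.prob("s0", "s1", "s1", "s2"));  // ~0.67
    }
}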


Dispositions

For handling the control problem, we have to define the disposition function δ. For the definition of the dispositions δ we distinguish two cases: in the beginning the robot is learning and trying to gain experience for each of the possible actions. After the learning phase, it will choose the action which is optimal according to its experience.

Phase 1: Initial learning In the first case, the disposition δ is defined with reference to a function lta that chooses one of the least tried actions. Given a length of the learning phase n, δ is defined for expk, k ≤ n:

δs(expk, a) = lta(s)   (6.7)

The function lta : V ↦ A, which returns the least tried action originating in state s, i.e. the action that has been executed the fewest times, is defined as follows:

lta(s) = act(argmin〈s,s′〉∈E count(〈s, s′〉))   (6.8)

Note that we diverge from the traditional definition of argmin for cases in which there is no unique minimum; argmin will return one of the minima randomly if there is more than one least tried action. This means that the robot chooses one of the least tried actions in the initial learning phase. Note that the unconventional definition of argmin causes no harm as long as the length of the learning phase n is a multiple of the cardinality of the set of possible actions in s: n mod |A(s)| = 0. This ensures that all actions are tried the same number of times. We set n = 6, which seems an appropriate compromise between gaining solid experience and making use of that experience relatively soon.
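A randomized argmin of this kind can be sketched in a few lines of Java. The names are hypothetical and do not refer to the implementation from Appendix A.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Sketch of lta(s): pick one of the least tried actions uniformly at random (hypothetical names).
public class LeastTriedActionSketch {
    private static final Random RNG = new Random();

    // counts maps each action available in the current state to how often it was tried.
    static String lta(Map<String, Integer> counts) {
        int min = counts.values().stream().min(Integer::compare).orElseThrow();
        List<String> leastTried = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet())
            if (e.getValue() == min) leastTried.add(e.getKey());
        return leastTried.get(RNG.nextInt(leastTried.size()));   // randomized argmin
    }

    public static void main(String[] args) {
        System.out.println(lta(Map.of("question", 2, "command", 1)));  // "command"
        System.out.println(lta(Map.of("question", 1, "command", 1)));  // either, at random
    }
}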

Phase 2: Acting according to experience After the learning phase, the robot will choose the action that promises the best results according to its experience. With reference to the graph, the most promising action for a state s is the first action on the cheapest path among the most probable paths (mpp) starting in s. This is expressed by Equation 6.9:

δs(exp, a) = { 0.8 : a = firstAction(cheapest(mpp(s))); 0.2 : otherwise }   (6.9)

(We will define firstAction, cheapest, and mpp shortly, in Equations 6.10, 6.11, and 6.12.) Note that suboptimal actions will still be chosen with a small probability of 0.2 in order to maintain flexibility in changing environments. By choosing a suboptimal action, the robot could succeed anyway if the environment changes such that this action yields a better result; had it not tried, it would not have gained experience for that action. Note also that the robot still gains experience after the initial learning phase by keeping track of the feedback, thus the count and prob functions are subject to constant change.

We will now define the three functions firstAction, cheapest, and mpp that we used for the definition of δ. The function firstAction : P ↦ A selects the first action in a given path:

firstAction(path) = act(〈first(path), first(rest(path))〉)   (6.10)


Note that firstAction is only defined for paths that have a length of at least 1, length(path) ≥ 1 (given that the length of a path refers to the number of edges it uses).

Second, we define a function cheapest : P(P) ↦ P that chooses the cheapest path from a set of paths:

cheapest(P) = argminpi∈P costp(pi)   (6.11)

Again, as in lta, we use a randomized version of argmin that returns one of the minima randomly if there is more than one path with minimal cost.

Finally, we define a function that maps a node to the set of most probable paths that start in that node, mpp : V ↦ P(P):

mpp(sx) = ⋃a∈A [ argmax sy ∈ {s | 〈sx, s〉 ∈ E ∧ act(〈sx, s〉) = a} prob(〈sx, sy〉) ]   (6.12)

Note that in this case, argmax returns the set of all solutions in case there is more than one maximum.
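Putting Equation 6.9 into code, the core of phase 2 reduces to an exploit-or-explore choice around the best action. The following Java sketch assumes that firstAction(cheapest(mpp(s))) has already been evaluated and is passed in as bestAction; all names are hypothetical.

import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

// Sketch of the phase-2 disposition (Equation 6.9): exploit the most promising action with
// probability 0.8, otherwise pick one of the remaining actions at random (hypothetical names).
public class Phase2ChoiceSketch {
    private static final Random RNG = new Random();

    static String choose(String bestAction, List<String> allActions) {
        if (RNG.nextDouble() < 0.8) return bestAction;             // act according to experience
        List<String> others = allActions.stream()
                .filter(a -> !a.equals(bestAction))
                .collect(Collectors.toList());                     // explore an alternative
        return others.isEmpty() ? bestAction : others.get(RNG.nextInt(others.size()));
    }

    public static void main(String[] args) {
        // Suppose firstAction(cheapest(mpp(s0))) evaluated to "question":
        System.out.println(choose("question", List.of("question", "command")));
    }
}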

Dispositions for terminal states Finally, we define the actions for the terminal states in the graph. These are independent of experience. For dialogue states that refer to a succeeded request (s1, s3, s5, s7, s9, s11), the disposition is fixed to expressing thanks. For dialogue states that indicate a failed request after two trials (s6, s8, s10, s12), the disposition is fixed to a give-up action, i.e. uttering "I give up."

6.2.4 The System

This section describes how the mechanism of requesting help is embedded in the overall robot architecture. Figure 6.7 illustrates the parts of the robot architecture (presented in more detail in §4.1) that are relevant for the issue of help requests. As described more explicitly in §4.3, requesting help involves the subsystems for BDI and for communication. Apart from the BDI mediator, the BDI subsystem contains the action planner and the disposition management that is used to adapt dialogue strategies.

Given a goal, the action planner generates a sequence of actions and triggers the modules responsible for those actions. The need for help arises in two cases: either the action planner determines a human agent as the agent for the next action (instead of the robot itself), or an action was attempted by the robot but failed. In those cases the module for dialogue strategy decides on a form of help request (command or question) and, with the help of the BDI mediator, triggers the communication subsystem to generate and realize a communicative goal to produce this request. The communication subsystem checks whether subsequently perceived utterances are valid answers to the request, using its referential and rhetorical resolution mechanisms. If an utterance presents a valid answer, the BDI mediator passes it from the communication subsystem to the dialogue strategy module. In case of a positive response, the action planner can continue to follow its plan. If the answer is negative, the strategy module decides whether to retry or to give up. In the first case, it will invoke the communication subsystem again; in the second case it will notify the action planner of the failure.
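Stripped of the agent architecture, the retry logic itself is small. The following Java sketch condenses it into a single method; the interface and names are hypothetical, and in the actual system these steps are distributed over the action planner, the dialogue strategy module, and the communication subsystem.

// Much-simplified sketch of the help-request control flow described above (hypothetical names).
public class HelpRequestLoopSketch {
    interface Dialogue { boolean request(String form); }    // true if the human accepts

    // Try at most two forms of request before giving up, as in the dialogue tree of Figure 6.6.
    static boolean obtainHelp(Dialogue d, String firstForm, String secondForm) {
        if (d.request(firstForm)) return true;     // first trial accepted
        if (d.request(secondForm)) return true;    // retry, possibly with another form
        System.out.println("I give up.");          // failed after two trials
        return false;
    }

    public static void main(String[] args) {
        Dialogue alwaysRefuses = form -> false;
        obtainHelp(alwaysRefuses, "question", "command");    // prints "I give up."
    }
}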


Figure 6.7: Parts of the robot architecture with focus on requesting help


6.3 Conclusion

This chapter presented the employment of dispositions for communicative actions, regarding production as well as interpretation. In the first part we presented an approach to dealing with indirect speech acts: the robot learns how to interpret utterances that may be meant as indirect requests. It takes into account the linguistic meaning of those utterances, their classification as a certain type of assertion or question, and the possibility of and willingness to address the request in the current context. By clarifying ambiguous utterances, the robot can adapt its interpretation to the specific language use of an individual agent, which contributes to its social skills.

The second part of the chapter presented the acquisition of dispositions for producing help requests: the robot learns how to engage a human agent to help it in pursuing its goals. For this purpose it tries different forms of help requests and considers the success they yield for future requests.


Chapter 7

Conclusion

Summary

This chapter summarizes the thesis, presents advantages and shortcomings of the presented approach, and suggests possible extensions.

7.1 Recapitulation

The goal of this thesis was to present an adaptational mechanism to support social behavior of a robot in interaction with humans. This mechanism, the acquisition of dispositions, is applied and exemplified in two different domains: actions in the physical environment and communicative actions in the social environment. In the physical environment, the robot's actions are influenced by their previous success or failure and by verbal evaluations of a human. Communicative actions regard the production of help requests and the understanding of possibly indirectly formulated requests.

The thesis started with a presentation of relevant issues for designing sociable robots, with a perspective on how the key mechanism of this work, dispositions, could address those issues (Chapter 2). One prerequisite of intelligent behavior is the ability of an agent to change itself as a result of its experience; this allows an agent to adapt to its environment or to acquire some kind of knowledge. Dispositions provide an adaptational mechanism by relating experience to behavioral tendencies.

Chapter 3 continued with the characterization of dispositions as an adaptational process by which the robot gains experience based on its actions and its perceptions of the environment's feedback to those actions. This experience is used to select future actions; dispositions, defined depending on experience, express the probability of choosing one out of a set of possible actions. In order to direct the acquisition of dispositions, we evaluate the feedback and derive dispositions that maximize the expected value of future feedback.

In Chapter 4 we introduced the embedding robot architecture and described theoretical and implementational foundations for communicating in the social environment and acting in the physical environment, e.g. linguistic analysis and the raising of clarifying subdialogues. The chapter ended with the presentation of a BDI model that integrates the effect of dispositions for the different scenarios.


Chapter 5 presented the application of dispositions for acting in the physical environment: the robot acquires the tendency to play with or avoid certain kinds of objects, based on the success of those actions and on verbal evaluations of other agents. Finally, Chapter 6 described the employment of dispositions for communicative actions regarding production as well as interpretation. The robot deals with indirect speech acts by learning how to interpret utterances that may be meant as indirect requests, and it learns how to efficiently produce requests to obtain help from another agent.

7.2 Discussion and Extensions

The central aspect of the approach is adaptivity: the behavior of the robot is not predefined and fixed, but flexible and influenced by experience. An important feature of the approach is its online nature: the interaction between agent and environment is continual. The robot does not go through a long, possibly simulated, learning phase in preparation for later acting in the real world; it has to act in the real world from the very beginning and is constantly learning while acting.

Using preferences or dispositions for deciding what to do is not a common approach. Although there are adaptive interactive systems that decide what to do based on the observed or stated preferences of a human user (Jameson, 2003; Mitsunaga et al., 2005), those systems relate their behavior solely to the user's preferences and do not hold preferences of their own.1 The characteristic feature of our approach is that the robot itself maintains preferences. Those internal preferences might partly arise from externally observed preferences of a user, but those external preferences are interpreted and valued according to internal goals and motivations. Thus, they only serve as one possible input for determining the disposition of the robot.

The following three sections characterize the contributions, shortcomings, and possible extensions for each of the three scenarios that build on dispositions.

7.2.1 Interpretation of Indirect Speech Acts

Online adaptivity is a distinguishing feature of the handling of indirect speech acts. Other computational approaches to deriving the meaning of indirect speech acts encode the required knowledge statically and cannot deal with ambiguous cases (Gerlach and Sprenger, 1988; Hinkelman and Allen, 1989; Allen et al., 2001). The second key feature of the ISA approach is its situation-dependent flexibility. Except for the practical interactive planning system TRIPS described in (Allen et al., 2001), which interprets speech acts with regard to the current problem solving context, no other approach accounts for the situational context. However, the notion of situatedness in (Allen et al., 2001) is different from the situatedness relevant in an autonomous robot scenario. We have to deal with situations that comprise the current beliefs, intentions, and abilities of the robot in its current physical and social environment.

1 (Shapiro and Langley, 2002) present an approach to acquiring preferences by providing agents with different reward functions, but in this setting the differences between preferences of agents are more determined by their different reward functions than by their differing experience.


Our approach to interpreting utterances concentrates on the linguistic analysis and the situational context. We do not yet have an explicit model for intentions and collaborative aspects (Sidner et al., 2004), which we need in order to integrate the problem solving context (Allen et al., 2001) into our notion of situation. Furthermore, the way we handle ISAs can be improved in other ways. So far we only deal with request ISAs, and the approach could be extended to handle other types of ISAs that are relevant in service robotics scenarios. Furthermore, although the current system can adapt its interpretation of certain types of utterances in certain situations, it still lacks a general adaptability regarding all possible interpretations. Utterances that are not classified as ambiguous will always be interpreted according to the supposed model. If this interpretation clashes with the intention of the user, it is impossible to resolve this misunderstanding and to teach the system the more appropriate interpretation. To extend the adaptational mechanism to those cases, the dialogue management needs to be refined such that it enables the resolution of those misunderstandings. Finally, if we could distinguish between humans (e.g. through face recognition), we could make the adaptation user-specific, thus learning user-specific preferences.

7.2.2 Requesting Help

The distinction between human individuals is also important in order to fully exploit the ability to acquire a dialogue strategy for engaging other agents to help. It would allow the robot to maintain a model for each individual.

Another limitation regards the number of alternatives for realizing a request. At the moment the set of possible actions to request help is rather small: the robot can only choose between formulating a simple command or a question. One could think of more alternatives and possibly employ a politeness modifier such as "please". Another possible extension is to make the help requests more explicit, such that they contain a hint on how to help: "Can you help me and open the door?" The problem with a greater range of possible actions is that it complicates and lengthens the learning process. But the advantage of a wider range of utterances is that it can diversify the communicative behavior of the robot and make it more agreeable or even appear more intelligent.

An important practical issue is the recognition of helping actions and their integration into the dialogue strategy. The robot should recognize that a human was helping, even if the human did not explicitly accept the help request. And it should recognize when the human accepted verbally but did not actually help.

7.2.3 Actions in the Physical Environment

A shortcoming of the approach to acquiring dispositions in the physical environment is that the robot can only recognize feedback for actions referring to objects. It ignores any feedback for the other actions. One could extend the setting such that each of the possible actions in a state can receive feedback. This would entail a more complex calculation of the disposition function δ.

Another issue is the dependency between actions. At the moment the dispositions for approaching and pushing an object are derived independently and do not interact. However, these two actions are not independent, because the robot always has to approach an object before being able to push it, unless the object moves, or is moved, to a position directly in front of the robot. One could derive a more complex disposition function δ that accounts for such interdependencies between actions.

7.2.4 Experimental Evaluation with Naive Users

In order to evaluate and judge the usefulness of the implemented approaches, we would need to conduct a series of experiments with naive users with no prior knowledge. This would allow us to see how the implementation of dispositions contributes to the perception of sociability and acceptance of the robot. This could also involve a comparison between different degrees of social competences that arise from varying parameters in the adaptational processes.


Bibliography

Allen, J., Byron, D., Dzikovska, M., Ferguson, G., Galescu, L., Stent, A., 2001. Towards conversational human-computer interaction. AI Magazine 22 (4), 27–37.

Areces, C., 2000. Logic engineering. The case of description and hybrid logics. Ph.D. thesis, University of Amsterdam, Amsterdam, The Netherlands.

Arkin, R. C., 1998. Behavior-Based Robotics. Intelligent Robots and Autonomous Agents. MIT Press, Cambridge, MA.

Arkin, R. C., Fujita, M., Takagi, T., Hasegawa, R., 2003. An ethological and emotional basis for human-robot interaction. Robotics and Autonomous Systems 42 (3-4), 191–201.

Asher, N., Lascarides, A., 2003. Logics of Conversation. Studies in Natural Language Processing. Cambridge University Press, Cambridge, United Kingdom.

Austin, J. L., 1962. How to do things with words. Harvard University Press, Cambridge, MA.

Baldridge, J., 2002. Lexically specified derivational control in Combinatory Categorial Grammar. Ph.D. thesis, University of Edinburgh.

Baldridge, J., Kruijff, G.-J. M., 2002. Coupling CCG and hybrid logic dependency semantics. In: Proceedings of ACL 2002. Philadelphia, Pennsylvania.

Baldridge, J., Kruijff, G.-J. M., 2003. Multi-modal combinatory categorial grammar. In: Proceedings of EACL 2003. Budapest, Hungary.

Bar-Hillel, Y., 1953. A quasi-arithmetical notation for syntactic description. Language 29, 47–58, as cited by Steedman and Baldridge (2003).

Baron-Cohen, S., 1995. Mindblindness. MIT Press/AAAI Press, Cambridge, MA, USA.

Biber, D., Johansson, S., Conrad, S., Finnegan, E., 1999. Longman Grammar of Spoken and Written English. Pearson Education, Ltd., Harlow, England.

Blackburn, P., 1990. Nominal tense logic and other sorted intensional frameworks. Ph.D. thesis, University of Edinburgh, Edinburgh, Scotland.

Blackburn, P., 1994. Tense, temporal reference and tense logic. Journal of Semantics 11, 83–101.


Blackburn, P., 2000. Representation, reasoning, and relational structures: a hybrid logic manifesto. Journal of the Interest Group in Pure Logic 8 (3), 339–365.

Bos, J., Klein, E., Oka, T., 2003. Meaningful conversation with a mobile robot. In: Proceedings of the Research Note Sessions of the 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL'03). Budapest, Hungary.

Bratman, M. E., 1987. Intentions, Plans, and Practical Reason. Harvard University Press, Cambridge, MA.

Breazeal, C., 2004. Function meets style: Insights from emotion theory applied to HRI. IEEE Transactions on Systems, Man, and Cybernetics—Part C: Applications and Reviews.

Breazeal, C., Brooks, A., Gray, J., Hoffman, G., Kidd, C., Lee, H., Lieberman, J., Lockerd, A., Mulanda, D., 2004. Humanoid robots as cooperative partners for people. Submitted to International Journal of Humanoid Robots, in review.

Canamero, L., Fredslund, J., 2000. How does it feel? Emotional interaction with a humanoid LEGO robot. In: Dautenhahn, K. (Ed.), Socially Intelligent Agents: The Human in the Loop. Papers from the AAAI 2000 Fall Symposium in Cape Cod, Massachusetts, USA. AAAI Press, Menlo Park, California, USA, pp. 23–28.

Cheyer, A., Martin, D., March 2001. The open agent architecture. Journal of Autonomous Agents and Multi-Agent Systems 4 (1), 143–148.

Curry, H. B., Feys, R., 1958. Combinatory Logic. Vol. 1. North Holland, as cited by Steedman and Baldridge (2003).

Damasio, A. R., 1994. Descartes' Error: Emotion, Reason and the Human Brain. Grosset/Putnam, New York.

Dautenhahn, K., 1995. Getting to know each other—artificial social intelligence for autonomous robots. Robotics and Autonomous Systems 16 (2-4), 333–356.

Dautenhahn, K., 1997. I could be you - the phenomenological dimension of social understanding. Cybernetics and Systems Journal 28 (5), 417–453.

Dautenhahn, K., 1998. Grounding agent sociality: The social world is its own best model. In: Proceedings of the 14th European Meeting on Cybernetics and Systems Research, EMCSR'98. pp. 779–784.

Davis, A. R., 1996. Linking and the hierarchical lexicon. Ph.D. thesis, Department of Linguistics, Stanford University, Stanford, CA.

Dennett, D., 1987. The intentional stance. Bradford Books/MIT Press, Cambridge, MA.

DiSalvo, C., Forlizzi, J., 2006. Service robots in the domestic environment: A study of the Roomba vacuum in the home. In: Human-Robot Interaction 2006. Salt Lake City, Utah, USA.


Dowty, D. R., 1989. On the semantic content of the notion "thematic role". In: Chierchia, G., Partee, B. H., Turner, R. (Eds.), Properties, Types, and Meaning: Volume II, Semantic Issues. Vol. 39 of Studies in Linguistics and Philosophy. Kluwer Academic Publishers, Dordrecht, The Netherlands, pp. 69–129.

Duffy, B. R., 2003. Anthropomorphism and the social robot. Special Issue on Socially Interactive Robots, Robotics and Autonomous Systems 42, 170–190.

Duffy, B. R., Joue, G., September 2005. Why humanoids. In: 4th Chapter Conference on Applied Cybernetics 2005. City University, London, United Kingdom.

Fillmore, C. J., 1968. The case for case. In: Bach, E., Harms, R. T. (Eds.), Universals in Linguistic Theory. Holt, Rinehart and Winston, New York, pp. 1–90.

Fong, T., Nourbakhsh, I., Dautenhahn, K., 2003. A survey of socially interactive robots. Robotics and Autonomous Systems 42 (3-4), 143–166.

Gerlach, M., Sprenger, M., 1988. Semantic interpretation of pragmatic clues: connectives, modal verbs, and indirect speech acts. In: Proceedings of the 12th Conference on Computational Linguistics. Association for Computational Linguistics, Morristown, NJ, USA, pp. 191–195.

Halliday, M. A. K., Christian, M., 1994. An introduction to functional grammar, 3rd Edition. Hodder & Stoughton, London.

Heylen, D., Nijholt, A., op den Akker, R., 2005. Affect in tutoring dialogues. Journal of Applied Artificial Intelligence (special issue on Educational Agents - Beyond Virtual Tutors) 19 (3-4), 287–310.

Hinkelman, E., Allen, J. F., 1989. Two constraints on speech act ambiguity. In: Proceedings of ACL 1989.

Jameson, A., 2003. Adaptive interfaces and agents. Lawrence Erlbaum Associates, Inc., Mahwah, NJ, USA, pp. 305–330.

Joshi, A. K., Vijay-Shanker, K., Weir, D., 1991. The convergence of mildly context-sensitive grammar formalisms. In: Sells, P., Shieber, S. M., Wasow, T. (Eds.), Foundational Issues in Natural Language Processing. MIT Press, Cambridge, MA, USA, pp. 31–81.

Kozima, H., Nakagawa, C., Yano, H., 2003. Can a robot empathize with people? In: International Symposium on Artificial Life and Robotics, AROB-2003. Beppu, Japan, pp. 518–519.

Kruijff, G.-J., 2005. Context-sensitive utterance planning for CCG. In: Proceedings of the European Workshop on Natural Language Generation. Aberdeen, Scotland.

Kruijff, G.-J., Zender, H., Jensfelt, P., Christensen, H. I., March 2006. Clarification dialogues in human-augmented mapping. In: Human-Robot Interaction. IEEE/ACM, Salt Lake City, UT.


Kruijff, G.-J. M., 2001. A categorial-modal logical architecture of informativity: Dependency grammar logic & information structure. Ph.D. thesis, Charles University, Prague, Czech Republic.

Larsson, S., 2002. Issue-based dialogue management. Ph.D. thesis, Department of Linguistics, Göteborg University, Göteborg, Sweden.

Levinson, S. C., 1983. Pragmatics. Cambridge University Press.

Mitsunaga, N., Smith, C., Kanda, T., Ishiguro, H., Hagita, N., August 2005. Robot behavior adaptation for human-robot interaction based on policy gradient reinforcement learning. In: IROS-05. IEEE/RSJ, Alberta, Canada, pp. 1594–1601.

Neisser, U., 1976. Cognition and Reality: Principles and implications of cognitive psychology. Freeman, San Francisco, CA.

Niles, I., Pease, A., 2001. Towards a standard upper ontology. In: Proceedings of the 2nd International Conference on Formal Ontology in Information Systems (FOIS-2001).

Nilsson, N. J., 1965. Learning machines: Foundations of trainable pattern classifying systems. McGraw-Hill, New York, as cited by Arkin (1998).

Peirce, C. S., 1992. Reasoning and the Logic of Things: The Cambridge Conference Lectures of 1898. Harvard University Press, Cambridge, MA.

Rao, A. S., Georgeff, M. P., 1991. Modeling rational agents within a BDI-architecture. In: Allen, J., Fikes, R., Sandewall, E. (Eds.), Proceedings of the 2nd International Conference on Principles of Knowledge Representation and Reasoning (KR'91). Morgan Kaufmann, San Mateo, CA, pp. 473–484.

Reeves, B., Nass, C., 1996. The Media Equation: How People Treat Computers, Television, and New Media Like Real People and Places. Cambridge University Press, Cambridge.

Scassellati, B., 2002. Theory of mind for a humanoid robot. Autonomous Robots 12 (1), 13–24.

Schank, R. C., Winter 1987. What is AI, anyway? AI Magazine 8 (4), 59–65, as cited by Arkin (1998).

Searle, J. R., 1969. Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press, Cambridge, United Kingdom.

Severinson-Eklundh, K., Green, A., Hüttenrauch, H., 2003. Social and collaborative aspects of interaction with a service robot. Robotics and Autonomous Systems 42 (3-4), 223–234.

Sgall, P., Hajicova, E., Panevova, J., 1986. The Meaning of the Sentence in Its Semantic and Pragmatic Aspects. D. Reidel, Dordrecht, The Netherlands.

Shapiro, D. G., Langley, P., 2002. Separating skills from preference: Using learning to program by reward. In: ICML '02: Proceedings of the Nineteenth International Conference on Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp. 570–577.


Sidner, C. L., Dzikovska, M., December 2005. A first experiment in engagement for human-robot interaction in hosting activities. Advances in Natural Multimodal Dialogue Systems.

Sidner, C. L., Lee, C., Kidd, C. D., Lesh, N., November 2004. Explorations in engagement for humans and robots. In: IEEE RAS/RSJ International Conference on Humanoid Robots.

Siegwart, R., Nourbakhsh, I. R., 2004. Introduction to Autonomous Mobile Robots. Intelligent Robotics and Autonomous Agents. MIT Press.

Steedman, M., 2000. The Syntactic Process. The MIT Press, Cambridge, MA.

Steedman, M., Baldridge, J., 2003. Combinatory categorial grammar. Tutorial paper, available from http://www.iccs.informatics.ed.ac.uk/~jmb/ccg.pdf.

Sutton, R. S., Barto, A. G., 1998. Reinforcement Learning: An Introduction. The MIT Press, Cambridge, MA.

Theobalt, C., Bos, J., Chapman, T., Espinosa-Romero, A., Fraser, M., Hayes, G., Klein, E., Oka, T., Reeve, R., 2002. Talking to Godot: Dialogue with a mobile robot. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2002). IEEE, Lausanne, Switzerland, pp. 1338–1343.

Velasquez, J. D., 1998. When robots weep: emotional memories and decision-making. In: AAAI '98/IAAI '98: Proceedings of the Fifteenth National/Tenth Conference on Artificial Intelligence/Innovative Applications of Artificial Intelligence. American Association for Artificial Intelligence, Menlo Park, CA, USA, pp. 70–75.

Velasquez, J. D., Maes, P., 1997. Cathexis: a computational model of emotions. In: AGENTS '97: Proceedings of the First International Conference on Autonomous Agents. ACM Press, New York, NY, USA, pp. 518–519.

Ventura, R., 2000. Emotion-based agents. Master's thesis, Instituto Superior Técnico, Technical University of Lisbon, Lisbon, Portugal.

Webster (Ed.), 1984. Webster's Ninth New Collegiate Dictionary. Merriam-Webster, Springfield, MA, as cited by Arkin (1998).

Wilske, S., Kruijff, G.-J., 2006. Service robots dealing with indirect speech acts. In: Submitted to IROS-2006. Beijing, China, in review.

Ziemke, T., 1996. Towards autonomous robot control via self-adapting recurrent networks. In: ICANN. pp. 611–616.


Appendix A

Source Code Documentation

This appendix contains the documentation for the Java implementation of the presented approaches. The presented classes partly rely on the Open Agent Architecture (OAA)1 (Cheyer and Martin, 2001) and the Player/Stage2 robot control interface. Some of the described classes act as agents in OAA: they register their services (called solvables) with a facilitator and provide them for other agents requesting them. In turn they can request other agents' services as well. Figure A.1 shows all the relevant agents and indicates the solvables they are offering and who is requesting them.

A.1 Acting in the Physical Environment

This section describes the important classes for the implementation of dispositions in the physical environment as described in Chapter 5.

A.1.1 Skills
de.dfki.cosy.robot.skills.lib

The skills described below extend the abstract class SkillThread, which implements the interface Skill. They implement basic actions of the robot.

DriveTo driving to a given coordinate, which can be absolute or relative and given in polar or Cartesian coordinates. Execution is aborted if an obstacle is perceived or the front bumpers are triggered (referred to as go in Chapter 5).

TurnSkill turning for a given angle; in case there is an obstacle right in front of the robot, it moves a little bit backward in order to avoid any collision when turning. Execution is aborted if any bumper is triggered.

GoBack going back without turning for a certain distance; execution is interrupted if the rear bumpers are triggered.

1 http://www.ai.sri.com/oaa/
2 http://playerstage.sourceforge.net/


Figure A.1: This figure indicates the relevant agents and the communication between them via their solvables.

PushObject pushing an object immediately in front by driving against it; optional parameters are the distance to push, the force (i.e. the velocity) to apply, and the type of object. Execution is interrupted if bumpers are triggered, or if the object leaves a given angle in front of the robot.

ApproachObjectCentered approaching an object up to a given distance, such that it is right in front of the robot. Optional parameters are the target distance between object and robot, the allowed angle of the nearest point of the object, and the type of object.

A.1.2 Recognition
de.dfki.cosy.robot.recognition

Most of the skills above rely on the laser sensor. This section introduces the classes that interpret the readings from the SICK laser range finder installed on the robot.

RecObject is a representation of an object recognizable by the robot's sensors (currently only the laser range finder). Important fields: centerPoint and nearestPoint, two short[] indicating the polar coordinates of the points; size indicates the distance between the leftmost and rightmost point of the object in mm; shape indicates the shape of the object (straight line, semicircle, undefined).

EnvObject an interface for the recognition of objects in the environment (legs, buckets, etc.) based on laser readings. Important accessor methods are getObjects, getNearestObject, and getAllAndNearest, which return a representation of the recognized object(s) that match(es) the constraints on the environmental object description.

DefaultObj implements EnvObject; an environmental object with no constraints regarding its size or shape, it serves as a wild card.

Bucket implements EnvObject, representing trash bins, i.e. semicircles with a diameter between 150 and 310 mm.

Cylinder implements EnvObject, representing cylinder-shaped objects, i.e. semicircles with a diameter between 80 and 120 mm (referred to as playmobile in Chapter 5).

OfflineEnvObjectRecognizer implements a recognizer for objects in the environment based on laser readings. The laser readings are interpreted and, with the support of a ShapeRecognizer, different objects can be recognized. They can be accessed by several methods. As opposed to the EnvObjectRecognizer, the OfflineEnvObjectRecognizer has no direct access to a laser device but instead is given a laser reading. This laser reading (of type int[]) can be given at the very beginning as an argument of the constructor, or it can be set after that, and it can be given as a parameter for several querying methods.

A.1.3 Conditions
de.dfki.cosy.robot.condition

Conditions summarize certain properties of the perceived environment and serve as an abstraction over sensor readings required by the above-mentioned skills.

SimpleCondition An interface for a condition. A condition works on any data available from the robot, e.g. sensor readings (laser, sonar, etc.) or actuators (motor). It has no direct access to the devices but only to the data. An implementing class will interpret the data and return a boolean or an array of booleans indicating whether the condition(s) are matched (fulfill, fulfillMany), and possibly any additional parameters that might be useful.

RecObjDistanceAreaCond implements the condition of having recognized a certain object (a) within a certain distance, and (b) within a certain angle. The necessary data is given as a HashMap, either when initializing the class or by the method feed. The HashMap needs four entries (a sketch of such a map follows after this list): under the key object (a String), the kind of the object, coded as an actual instance of EnvObject; under the key laserdata, the laser reading (ideally this should be an average over at least 3 readings); under the key distance, an int[] of length 2 which gives the interval in which the distance of the object should lie; under the key area, an int[] of length 2 which gives the smallest and greatest angle in half degrees. All these values can also be changed individually by the matching feedX methods. If no constraints for the area and the distance are provided, the area defaults to 150 to 210 (in half degrees, i.e. an area of 30 degrees in front of the laser), and the distance is not constrained.

RecObstacleCond This condition is fulfilled when a laser reading within a certain range and within a certain distance was perceived, i.e. if there seems to be an obstacle of whatever kind in a certain direction. Laser readings, distance, and area can be given as parameters at initialization or changed later by the feedX methods. If no constraints for the area and the distance are provided, the area defaults to 150 to 210 (in half degrees, i.e. an area of 30 degrees in front of the laser) and the distance defaults to 210 mm.
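The sketch announced above follows here. The key names are the ones described for RecObjDistanceAreaCond; the dummy object, the length of the laser reading array, and the commented-out calls to feed/fulfill are assumptions for illustration only.

import java.util.HashMap;

// Hedged sketch of assembling the parameter map for a distance/area condition.
public class ConditionFeedSketch {
    public static void main(String[] args) {
        HashMap<String, Object> data = new HashMap<>();
        data.put("object", new Object());            // would be an EnvObject such as Bucket
        data.put("laserdata", new int[361]);         // one (ideally averaged) laser reading
        data.put("distance", new int[]{200, 600});   // object between 200 and 600 mm away
        data.put("area", new int[]{150, 210});       // 30 degrees in front, in half degrees
        // cond.feed(data); cond.fulfill();          // as described for the condition classes
    }
}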

A.1.4 Other Data Structures

State
de.dfki.cosy.robot.learn.State

The class State represents the perceived state of the environment and the success state of the last action, if any. The relevant aspects of a state are accessible by the methods getLaserObject, which returns a RecObject, getVerbEval, returning a short indicating the verbal evaluation, and getLastActionSuccess, also returning a short, indicating the success state of the previous action.

To be able to send state representations between OAA agents, we introduce a mapping to a string representation; the relevant translation methods are toRedString and reconstructFromRedStr.

Action
de.dfki.cosy.robot.learn.Action

The class Action is a wrapper around a Skill that the robot can execute (for example skills, see §A.1.1 above). Action is augmented with pre- and postconditions for that Skill in order to check if the Skill is applicable at all, and if the Skill was successfully executed. The Action class also stores the information whether its last execution call was 'successful' in terms of having been executed with fulfilled postconditions. The relevant methods are:

execute: → void
starts the execution of the skill

getSuccessState: → short
returns the success state of the action

getStatus: → int
indicates the execution state of the action

DispoObj
de.dfki.cosy.robot.bdi.DispoObj

The class DispoObj represents a pair of a state s and an action a for which the robot can collect experience. Since in our context the state is identified by the type of object ot that the robot perceives (see §5.2.3), the components of a DispoObj are named object and action and are encoded as strings. A mapping from unique object and action types, as used in this thesis, is part of Constants in de.dfki.cosy.robot.util. DispoObj provides accessor methods for action and object, and can be represented as a string and reconstructed from a string.


ExperienceOA
de.dfki.cosy.robot.bdi.ExperienceOA

The class ExperienceOA stores the experience for any number of DispoObjs (i.e. pairs of a state s and an action a). It provides various accessor methods for storing new feedback and for retrieving the merged experience. Important methods are:

updateObjVE: DispoObj × float → void
stores new verbal feedback for a DispoObj

updateObjSS: DispoObj × float → void
stores new success state feedback for a DispoObj

getWeightedExp: DispoObj → float
returns the weighted experience for a DispoObj
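A hedged usage sketch of these two classes follows. The methods are the ones listed above, but the constructors (a DispoObj built from an object type and an action, and a no-argument ExperienceOA) are assumptions and may differ from the actual implementation.

import de.dfki.cosy.robot.bdi.DispoObj;
import de.dfki.cosy.robot.bdi.ExperienceOA;

public class ExperienceUsageSketch {
    public static void main(String[] args) {
        DispoObj pushBucket = new DispoObj("bucket", "push");   // assumed constructor
        ExperienceOA experience = new ExperienceOA();           // assumed no-arg constructor

        experience.updateObjSS(pushBucket, 1.0f);    // the push action succeeded
        experience.updateObjVE(pushBucket, -0.5f);   // but the human disapproved verbally

        float merged = experience.getWeightedExp(pushBucket);   // merged experience value
        System.out.println("weighted experience: " + merged);
    }
}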

A.1.5 Agents and Central Processing Classes

Perceive
de.dfki.cosy.robot.learn.Perceive

The class PERCEIVE collects all relevant aspects of the environment to compose a State representation out of them. It does so either by listening to relevant messages sent out by OAA agents or by querying and interpreting the data given by the corresponding devices (sensors) or by querying processes that access these devices. PERCEIVE is dependent on the specification of the State, as it has to obtain and feed in all the relevant information.

The method for getting a State representation is perceive. It queries all relevant processes and returns the updated state comprising all relevant information. Currently this comprises objects perceived by the laser range finder and the verbal evaluation of other agents. perceive with an argument of type Action additionally returns a state representation including the success state of that action. As an agent, PERCEIVE offers the following solvables:

getObjects: returns the currently perceived objects (called e.g. by the INTENTIONMODELAGENT)

getPerception: returns a State representing current perceptions (called e.g. by the PLAYAGENT)

recordVerbEval: stores the verbal evaluations (called by the BDIMEDIATOR, which interprets the evaluative utterances and only passes on their numerical interpretation, cf. Table 5.1 and Equation 5.2 in §5.2.2)

PlayAgent
de.dfki.cosy.robot.learn.PlayAgent

The class PLAYAGENT implements the general strategy for playful interaction given in §5.2.1. It contains references to the actions, triggers the execution of these actions, and causes the responses of the environment (the feedback) to be stored in an instance of ExperienceOA held by the DISPOAGENT. According to this experience it chooses actions in the perceived states. The central method of the class is act, which triggers the acting and experiencing of the robot in the real world. In the act method, the method chooseAction is called, which decides which action to execute; then a reference to PERCEIVE provides the consequent states, from which the feedback is extracted and stored in the ExperienceOA held by the DISPOAGENT (see below).3 Playing behavior is started and stopped externally via the solvables startPlaying and stopPlaying.

A.2 Communicative Actions

This section describes the relevant data structures and classes for dispositions for communicative actions.

DispoAgent
de.dfki.cosy.robot.bdi.DispoAgent

The DISPOAGENT is the central class for dealing with communicative actions. It has three important functions. First, it maintains an instance of ExperienceOA and provides access to it by offering solvables to query (by solvable getXYDispo) and update (by solvable setXYDispo) the experience values for given objects. Second, it deals with the clarification of ambiguous requests (by solvable interpretISA, see §A.3 below). Third, it controls the issuing of help requests (by solvable engageOtherAgent, issued e.g. by the ACTIONPLANNERAGENT)4 and the acquisition of a proper dialogue strategy. For the issuing of help requests it employs the following data structures.

DialPolicy
org.cognitivesystems.learn.dial.DialPolicy

The class DialPolicy provides a data structure for representing possible courses of a dialogue and for storing the actually experienced courses. Dialogue courses are represented as a tree, with nodes indicating states of the dialogue (DialState) and edges representing communicative actions (DialAction) that lead from one state to the next (see Figure 6.6 and §6.2.3). At each state the agent can choose one out of a set of actions. Experience (i.e. counts of observed state-action-state triples) allows it to estimate which state most probably follows after an action. This is actually done by an instance of ProbGraph in org.cognitivesystems.learn.dial, which stores triples and provides a set of count and probability methods on those triples. According to the costs of DialStates and DialActions, one can estimate the best action given a state.

3 Note that the class ACTINPHYSENV provides the same functionality as PLAYAGENT, with the difference that it holds an ExperienceOA of its own and does not communicate with a DISPOAGENT. It is a stand-alone testing version of PLAYAGENT.

4 The need for help can arise if the ACTIONPLANNERAGENT (which is invoked either by the BDI MEDIATOR or as a consequence of adopting a goal in the INTENTIONMODELAGENT) either notices that some pre- or postconditions of an action to be executed do not match, or that the action requires a human agent. The ACTIONPLANNERAGENT has access to an external planner (de.dfki.cosy.robot.planning), which requires an appropriate domain description. At the moment this is only implemented for simple manipulation commands like "Push the bucket to the left." For the integration of the external Fast Forward planning system (http://www.mpi-sb.mpg.de/∼hoffmann/ff.html) we are indebted to Michael Brenner from Albert-Ludwigs-Universität Freiburg.


The parameter memodepth indicates how many state-action-state triples should be recorded. If the (memodepth+1)-st triple comes in, the oldest one is deleted. Important accessor methods are:

update: DialState × DialAction × DialState → void
records a transition from one state to the next state via a dialogue action

init: DialState × List<DialAction> → void
introduces a DialState and the possible DialActions in it

getBestNextAction: DialState → DialAction
returns the best action

getRandomAction: DialState → DialAction
returns one possible action randomly

getBestNextOrRandomAction: DialState × double → DialAction
returns a random action with the probability given by the double argument, else the best action

getLeastTriedAction: DialState → DialAction
returns one of the actions that were executed the least

getNextState: DialState × DialAction → DialState
returns the most probable state that follows a preceding state and an action

setStateCost: DialState × int → void
sets the cost of a DialState

setActionCost: DialState × DialAction × int → void
sets the cost of a DialAction rooted in the given DialState
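A hedged usage sketch of this interface follows. The methods are those listed above, whereas the constructors of DialPolicy, DialState, and DialAction are assumptions and may differ from the actual classes.

import java.util.List;
import org.cognitivesystems.learn.dial.DialAction;
import org.cognitivesystems.learn.dial.DialPolicy;
import org.cognitivesystems.learn.dial.DialState;

public class DialPolicyUsageSketch {
    public static void main(String[] args) {
        DialState root = new DialState("s0");               // assumed constructor
        DialState refused = new DialState("s2");
        DialAction question = new DialAction("question");   // assumed constructor
        DialAction command = new DialAction("command");

        DialPolicy policy = new DialPolicy();                // assumed constructor
        policy.init(root, List.of(question, command));       // actions available in s0
        policy.setStateCost(refused, 1);                      // refusal states are costly
        policy.update(root, question, refused);               // one observed transition

        // Initial learning phase: try the least tried action.
        DialAction first = policy.getLeastTriedAction(root);
        // Later: exploit the best action, exploring with probability 0.2 (cf. Equation 6.9).
        DialAction later = policy.getBestNextOrRandomAction(root, 0.2);
        System.out.println(first + " / " + later);
    }
}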

DialGraphElement, DialState, DialAction
org.cognitivesystems.learn.dial.*

DialPolicy is built up from DialStates and DialActions, simple objects implementing the interface DialGraphElement. Their important fields are identifier, description, and cost.

A.3 BDI Model

This section describes the implementation of the BDI model described in §4.5.

IntentionModelAgent
de.dfki.cosy.robot.bdi

This agent contains the relevant processes for deriving goals, given perceptions, dispositions, and desires. The value of the desires, currently only play and serve, is set when the agent is initialized, but they can be changed and queried with setDesireVal and getDesireVal. The agent offers the following solvables:


processPotentialRequest: This solvable is requested by the BDIMEDIATOR in case it receives an utterance from the communication system which could be interpreted as a request (see §6.1). The agent stores the utterance in a local variable and triggers the interpretation process.

processDispoAgMsg: This solvable deals with answers of the DISPOAGENT that are expected after initiating clarification and help requests.

The central processing method is calcGoals, which derives the goals according to the model in §4.5. The method compares the values of the play- and the serve-desire; in case the value of the play-desire is higher, it returns a play-goal.

If the two values are equal, it queries the PERCEIVE agent to get the currently perceivable objects (by sending the solvable getObjects), and in a second step it queries the DISPOAGENT to find out about the experience regarding those objects. If the robot perceives any object with which it has positive, neutral, or no experience at all, the method returns a play-goal. If the robot cannot perceive any objects, or if all of the perceived objects are associated with negative experience, a serve-goal will be calculated. The same happens if the value of the serve-desire is higher than that of the play-desire.

The method triggers the calculation of a serve-goal based on a human's utterance (a call of getUGoal). This invokes a call of the method interpretPotentialRequest with the interpreted and classified last utterance of the human as the only argument. According to the decision steps described in §6.1.2 and §6.1.3 (i.e. taking into account the current situation and the type of the utterance), the utterance is either directly interpreted as a request or clarification is needed. In the second case, a clarification request is triggered (calling clarifyAmbISA) by sending the solvable interpretISA, which is solved by the DISPOAGENT.

The DISPOAGENT maintains the experience with this type of utterance in its context, and according to the disposition (cf. §6.1.2) it either returns the appropriate interpretation or invokes the BDI MEDIATOR agent to clarify the intended meaning (with the help of the communication subsystem). Any answer given back by the BDI MEDIATOR is stored as feedback, and the interpretation is sent back to the INTENTIONMODELAGENT by issuing the solvable processDispoAgMsg.5 According to this result, the method translateUttToGoal is called and translates the analyzed form of the utterance into a goal to adopt. In case a play-goal is adopted, the solvable startPlaying is issued, which causes the PLAYAGENT to start playful interaction with the environment.

5 The reason why it cannot directly return the result as an answer to the original solvable interpretISA is that the DISPOAGENT would be blocked until it receives the answer, which would undesirably block the BDI MEDIATOR in case it requests services from the DISPOAGENT.


Appendix B

Grammar specification

This appendix describes some exemplary extensions to the grammar necessary for covering the utterances described in the thesis.

B.1 Coverage

This section gives examples for the coverage of the grammar, relevant to the content of the thesis.

(18) Commands
a. Help me!
b. Push the bucket!
c. Push the bucket to the left!
d. Move the bucket to the left!
e. Bring me tea!
f. Bring me the bucket to the lab!

(19) Questions
a. Can you help me?
b. Would you bring me tea?
c. Could you bring me tea please?
d. Can you push the bucket?

(20) Assertions
a. I want tea.
b. I want tea please.
c. I please want tea.
d. I would like to have a tea.
e. I push the bucket.

(21) Noun phrases
a. tea
b. a tea

(22) Infinitives and Control Phenomena

a. I want to play.

b. I want you to bring me a coffee.

c. Do you want me to help you?

B.2 Categories

This section describes how the examples above are analyzed by giving the categories of the lexical items and some sample derivations.

B.2.1 Mass Nouns

Mass nouns like tea, coffee, or water do not need a determiner to build a noun phrase, as opposed to other nouns like e.g. ball.

(23) a. I want tea.

b. I want a tea.

c. *I want ball.

d. I want a ball.

In order to extend the grammar to correctly deal with mass nouns, we introduced an additional word family np.mass for the lexical entries tea, coffee, and water. This family is specified as having the simple category np. The default category of all nouns is n, with determiners like a or the having the category np/n. This specification gives a correct analysis of the examples in (23).

B.2.2 The Modifier Please

An important use of the modifier please is to identify requests. Interrogative and indicative sentences containing a please mostly realize requests (confer §6.1.2). Please can occupy different positions in a sentence; therefore it needs more than one category. Below, we will give categories for please at the end of a sentence (24a) and in front of a verb (24b), (24c).1

(24) a. I want tea please.

b. I please want tea.

c. Can you please bring me a tea?

In the first case please modifies the sentence; its category is s\s, see (25). In the second case it modifies the verbal phrase, hence we give it the category (s\pper)/(s\pper), see (26).

1I omit an analysis of please in sentence initial position as this was already part of the grammar.


(25) I want tea please
I := pper, want := s\pper/np, tea := np, please := s\s
want tea ⇒ s\pper (>); I want tea ⇒ s (<); I want tea please ⇒ s (<)

(26) I please want tea
I := pper, please := s\pper/(s\pper), want := s\pper/np, tea := np
want tea ⇒ s\pper (>); please want tea ⇒ s\pper (>); I please want tea ⇒ s (<)

The semantics of please is expressed as a feature MODIFIER of the root nominal: @w1(want ˆ 〈Modifier〉please).

B.2.3 Transitive Verbs with Different Arguments

In addition to the transitive verbs already in the grammar, I introduced a range of further transitive verbs with additional argument structures. The different word order of sentences in imperative, interrogative, and indicative mood requires introducing different categories.

Imperatives

This section gives an analysis of (18a) and (18f), repeated here in (27).

(27) a. Help me!
b. Bring me the bucket to the lab!

The category of help in (27a) is s/pper. (28) gives the derivation:

(28) help me
help := s/pper, me := pper
help me ⇒ s (>)

The semantics is given by the following logical formula, repeated from Logical Form 4.2 in §4.2.2 for convenience:

Logical Form B.1 Help me!
@h1:action(help ˆ <Mood>imp ˆ <Actor>(y1:hearer ˆ you) ˆ <Recipient>(i1:person ˆ I))

As in most imperative sentences, the ACTOR is not realized on the surface, but is implicitly given in the semantics of an imperative verb. The personal pronoun me is the RECIPIENT.


sentence                              category of verb
Push the bucket!                      s / np
Push the bucket to the left!          s / pp / np
Move the bucket to the left!          s / pp / np
Bring me tea!                         s / np / pper
Bring me the bucket to the lab!       s / pp / np / pper

Table B.1: Exemplary mapping from verbs in imperative mood to category

(29) gives a derivation of (27b). The category of bring in this example is s/pp/np/pper, taking three arguments from the right.

(29) bring me the bucket to the lab
bring := s/pp/np/pper, me := pper, the := np/n, bucket := n, to := pp/np, the := np/n, lab := n
bring me ⇒ s/pp/np (>); the bucket ⇒ np (>); bring me the bucket ⇒ s/pp (>);
the lab ⇒ np (>); to the lab ⇒ pp (>); bring me the bucket to the lab ⇒ s (>)

Logical Form B.2 indicates the semantic roles of the different arguments. The RECIPIENT of the bring-action is me, the PATIENT is the bucket, and the prepositional phrase to the lab gives the where-to direction.

Logical Form B.2 Bring me the bucket to the lab!
@b1:action(bring ˆ <Mood>imp ˆ <Actor>(r1:hearer ˆ you) ˆ <Dir:WhereTo>(l1:location ˆ laboratory) ˆ <Patient>(b2:thing ˆ bucket) ˆ <Recipient>(i1:person ˆ I))

Other instances of bring and other verbs like push or move as in (18) differ in the number of arguments they take, see Table B.1.

Interrogatives

This section gives an analysis of exemplary verbs in interrogative mood. We only consider polar questions, built with the auxiliaries can, do, could, or would, leaving out factual questions. The category of those auxiliaries is s/s; we type them as a state and give them a relation SCOPE with the predicate of the sentence as the dependent.


sentence                            category of verb
Can you help me?                    s / s
Can you help me?                    s \ pper / pper
Would you bring me tea?             s \ pper / np / pper
Could you bring me tea please?      s \ pper / np / pper
Can you push the bucket?            s \ pper / np

Table B.2: Exemplary mapping from verbs in interrogative sentences to category

(30) can you help me

     can : s/s    you : pper    help : s\pper/pper    me : pper

     help me           ⇒  s\pper   (>)
     you help me       ⇒  s        (<)
     can you help me   ⇒  s        (>)

(30) gives the derivation of (19a). Logical Form B.3 gives the semantic analysis.

Logical Form B.3 Can you help me?

@c1:state(can ˆ
  <Mood>int ˆ
  <Scope>(h1:action ˆ help ˆ
    <Mood>ind ˆ
    <Actor>(y1:person ˆ you) ˆ
    <Recipient>(i1:person ˆ I)))

Table B.2 presents the categories of other example verbs in interrogative sentences. Note that they all take a personal pronoun (pper) as argument to the left and additional arguments to the right.
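
As Logical Form B.3 shows, the auxiliary contributes a state nominal whose SCOPE dependent is the logical form of the embedded predicate. The following hedged sketch uses plain Python dictionaries to mirror that structure; the helper name wrap_with_auxiliary and the key names are illustrative, not part of the thesis code.

def wrap_with_auxiliary(aux_nom, aux_prop, mood, scoped_lf):
    """Build the state nominal for an auxiliary (category s/s) whose SCOPE
    dependent is the logical form of the embedded predicate."""
    return {"nom": aux_nom, "sort": "state", "prop": aux_prop,
            "feats": {"Mood": mood, "Scope": scoped_lf}}

# Logical form of the embedded "you help me" (cf. Logical Form B.3)
help_lf = {"nom": "h1", "sort": "action", "prop": "help",
           "feats": {"Mood": "ind",
                     "Actor": {"nom": "y1", "sort": "person", "prop": "you"},
                     "Recipient": {"nom": "i1", "sort": "person", "prop": "I"}}}

can_you_help_me = wrap_with_auxiliary("c1", "can", "int", help_lf)
print(can_you_help_me["feats"]["Scope"]["prop"])   # -> help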

Assertions

This section gives an exemplary analysis of verbs in indicative sentences. As with verbs in interrogative sentences, the categories have in common that they take the subject argument to the left and any object arguments to the right, reflecting the S-V-O word order of English indicative sentences. Table B.3 gives the categories of verbs in exemplary indicative sentences.

Verbs like push and bring are semantically analyzed as having the type action; their subject (I) is analyzed as the ACTOR, any direct objects (like bucket or tea) as the PATIENT, indirect objects (like me) as the RECIPIENT, and a prepositional phrase (pp) indicating the direction as DIR:WHERETO (cf. Logical Form B.2).


sentence                            category of verb
I want tea.                         s \ pper / np
I push the bucket.                  s \ pper / np
I bring you tea.                    s \ pper / np / pper
I like to have a tea.               s \ pper / (sto \ pper)
I would like to have a tea.         s \ pper / (sinf \ pper)
I would like to have a tea.         sinf \ pper / (sto \ pper)
I would like to have a tea.         sto \ pper / (sinf \ pper)
I would like to have a tea.         sinf \ pper / np

Table B.3: Exemplary mapping from verbs in indicative mood to category
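
Read together, Tables B.1–B.3 behave like a small mood-sensitive lexicon: the same verb receives different categories depending on the mood, and hence the word order, of the sentence it occurs in. The following dictionary is an illustrative rendering of that idea in Python, not the actual lexicon format of the grammar.

# Candidate categories per (verb, mood); entries taken from Tables B.1-B.3.
LEXICON = {
    ("push",  "imp"): ["s/np", "s/pp/np"],
    ("bring", "imp"): ["s/np/pper", "s/pp/np/pper"],
    ("help",  "int"): ["s\\pper/pper"],
    ("bring", "int"): ["s\\pper/np/pper"],
    ("want",  "ind"): ["s\\pper/np"],
    ("push",  "ind"): ["s\\pper/np"],
    ("bring", "ind"): ["s\\pper/np/pper"],
}

def categories(verb: str, mood: str):
    """Look up the candidate categories of a verb for a given mood."""
    return LEXICON.get((verb, mood), [])

print(categories("bring", "imp"))   # -> ['s/np/pper', 's/pp/np/pper']
print(categories("bring", "ind"))   # -> ['s\\pper/np/pper']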

I would like to have a tea

     I : pper    would : s\pper/(sinf\pper)    like : sinf\pper/(sto\pper)    to : sto\pper/(sinf\pper)    have : sinf\pper/np    a : np/n    tea : n

     a tea                        ⇒  np          (>)
     have a tea                   ⇒  sinf\pper   (>)
     to have a tea                ⇒  sto\pper    (>)
     like to have a tea           ⇒  sinf\pper   (>)
     would like to have a tea     ⇒  s\pper      (>)
     I would like to have a tea   ⇒  s           (<)

Figure B.1: Derivation for "I would like to have a tea"

To-Infinitives

This section describes the analysis of sentences containing a to-infinitive. We follow Steedman (2000) in analyzing the infinitival particle to as taking a bare infinitive as its argument and yielding a to-infinitive: (sto\pper)/(sinf\pper). Bare infinitives are analyzed as sinf\SUBJ(/OBJ)∗, with SUBJ indicating a subject np or pper to the left and (/OBJ)∗ indicating none, one, or more object np or pper arguments to the right.

The verb taking a to-infinitive as a complement has the category s\pper/(sto\pper) or sinf\pper/(sto\pper), depending on whether it is the predicate of the sentence or just an infinitival complement to another predicate. Compare "I like to X" versus "I would like to X" in Table B.3, which also gives the category of would. Figure B.1 shows the derivation of (20d). The semantics of the verbs like and have is illustrated in the logical form for (20d), Logical Form B.4.
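
The central step of this analysis is that to, with category (sto\pper)/(sinf\pper), turns a bare-infinitive phrase into a to-infinitive phrase by forward application. The small self-contained Python check below illustrates just this step, with categories represented as plain strings; it is an illustration, not the grammar implementation.

def forward_apply(result: str, expected_arg: str, arg: str):
    """X/Y applied to Y yields X; categories are compared as plain strings."""
    return result if expected_arg == arg else None

# to := (sto\pper)/(sinf\pper); "have a tea" has already combined to sinf\pper
to_result, to_arg = "sto\\pper", "sinf\\pper"
have_a_tea = "sinf\\pper"

print(forward_apply(to_result, to_arg, have_a_tea))   # -> sto\pper
# "like" (sinf\pper/(sto\pper)) can now consume this sto\pper, and so on,
# as in the derivation of Figure B.1.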

Object Control

Finally, we introduce an example analysis for the object control verb want. Object control is the phenomenon where the referent of the object argument of the control verb (want) is identical to the referent of the subject of the embedded verb predicate. The category of want as in (22c), "Do you want me to help you?", is s\pper/(sto\pper)/pper.


Logical Form B.4 I would like to have a tea.

@w1:state(would ˆ
  <Mood>ind ˆ
  <Scope>(l1:emotive-mental-process ˆ like ˆ
    <Phenomenon>(h1:state ˆ have ˆ
      <Owner>i1:person ˆ
      <Possession>(t1:thing ˆ tea)) ˆ
    <Senser>(i1:person ˆ I)))

do you want me to help you

     do : s/s    you : pper    want : s\pper/(sto\pper)/pper    me : pper    to : sto\pper/(sinf\pper)    help : sinf\pper/pper    you : pper

     help you                     ⇒  sinf\pper            (>)
     to help you                  ⇒  sto\pper             (>)
     want me                      ⇒  s\pper/(sto\pper)    (>)
     want me to help you          ⇒  s\pper               (>)
     you want me to help you      ⇒  s                    (<)
     do you want me to help you   ⇒  s                    (>)

Figure B.2: Derivation for "Do you want me to help you?"

It takes a personal noun phrase pper (the subject) to its left, another pper (the object) to its right, as well as a to-infinitive sto\pper. Figure B.2 gives the complete derivation. In order to yield an appropriate semantics, we need to ensure that the referent of the ACTOR of the help-action is co-referent with the PATIENT of the want-predicate. Following Halliday and Christian (1994), we type the want-predicate as a desiderate-mental-process and analyze its subject as the SENSER and the help-clause complement as the PHENOMENON. See Logical Form B.5 for the meaning of the whole sentence.

Logical Form B.5 Do you want me to help you?

@d1:state(do ˆ
  <Mood>int ˆ
  <Scope>(w1:desiderate-mental-process ˆ want ˆ
    <Patient>(i1:person ˆ I) ˆ
    <Phenomenon>(h1:action ˆ help ˆ
      <Actor>i1:person ˆ
      <Recipient>(y1:person ˆ you)) ˆ
    <Senser>(y2:person ˆ you)))
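
The co-indexation required by object control can be made explicit by checking that the PATIENT of want and the ACTOR of the embedded help-action carry the same nominal. The sketch below encodes Logical Form B.5 as nested Python dictionaries; the encoding and the key names are illustrative and not taken from the thesis implementation.

lf_b5 = {
    "nom": "d1", "sort": "state", "prop": "do",
    "feats": {
        "Mood": "int",
        "Scope": {
            "nom": "w1", "sort": "desiderate-mental-process", "prop": "want",
            "feats": {
                "Patient": {"nom": "i1", "sort": "person", "prop": "I"},
                "Phenomenon": {
                    "nom": "h1", "sort": "action", "prop": "help",
                    "feats": {
                        "Actor": {"nom": "i1", "sort": "person"},
                        "Recipient": {"nom": "y1", "sort": "person", "prop": "you"},
                    },
                },
                "Senser": {"nom": "y2", "sort": "person", "prop": "you"},
            },
        },
    },
}

def object_control_holds(lf) -> bool:
    """True iff the object (PATIENT) of the control verb and the subject (ACTOR)
    of its embedded PHENOMENON are co-referent, i.e. share the same nominal."""
    want = lf["feats"]["Scope"]
    embedded = want["feats"]["Phenomenon"]
    return want["feats"]["Patient"]["nom"] == embedded["feats"]["Actor"]["nom"]

print(object_control_holds(lf_b5))   # -> True: both point to i1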
