Conversational role assignment problem in multi-party dialogues

Natasa Jovanovic, Dennis Reidsma, Rutger Rienks

TKI group

University of Twente

Outline

Research tasks at TKI
Interpretation of multimodal human-human communication in meetings
Conversational Role Assignment Problem (CRAP)
Towards automatic addressee detection

A framework for multimodal interaction research

Layered annotation

Unannotated corpus (video and audio recordings in a certain domain)

Annotation of events in different modalities (e.g. gaze, posture, gesture, speech)

Multimodal interpretation of events in terms of semantic models

Tools (e.g. for retrieval, simulation, remote presence, generation of minutes for meetings)

Research on human behaviour

Models and theories of interaction; semantics of annotation schemes

[Diagram labels: I, II, IIA, IIB, III, IV, V]

Multimodal annotation tool

Who is talking to whom?

CRAP as one of the main issues in multi-party conversation (Traum 2003)

Taxonomy of conversational roles (Herbert H. Clark)

all participants: speaker, addressee, side participant
all listeners: the participants, plus bystander and eavesdropper
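The role taxonomy above can be written down as a small data structure. The following is a minimal sketch, not part of the original slides; the names `Role`, `PARTICIPANT_ROLES` and `assign_roles` are hypothetical, and the function only covers the three participant roles.

```python
from enum import Enum


class Role(Enum):
    """Conversational roles named in the taxonomy."""
    SPEAKER = "speaker"
    ADDRESSEE = "addressee"
    SIDE_PARTICIPANT = "side participant"
    BYSTANDER = "bystander"
    EAVESDROPPER = "eavesdropper"


# Roles grouped under "all participants"; the remaining two are listeners only.
PARTICIPANT_ROLES = frozenset(
    {Role.SPEAKER, Role.ADDRESSEE, Role.SIDE_PARTICIPANT}
)


def assign_roles(speaker, addressees, participants):
    """Assign a participant role to everyone taking part in the conversation.

    Anyone who is neither the speaker nor an addressee is a side participant.
    """
    roles = {}
    for person in participants:
        if person == speaker:
            roles[person] = Role.SPEAKER
        elif person in addressees:
            roles[person] = Role.ADDRESSEE
        else:
            roles[person] = Role.SIDE_PARTICIPANT
    return roles
```

For one utterance, `assign_roles("A", {"B"}, ["A", "B", "C"])` marks A as speaker, B as addressee and C as side participant. Addressee detection amounts to recovering the second argument automatically.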

Our goal: automatic addressee identification in small group discussions

Addressees in meeting conversations: single participant, group of people, whole audience

Importance of the issue of addressing in multi-party dialogues

Addressing mechanisms

What are the relevant sources of information for addressee identification in face-to-face meeting conversations?

How does the speaker express who the addressee of the utterance is?

How can we combine all this information in order to determine the addressee of the utterance?

Sources of information

Speech
  Linguistic markers
    word classes: personal pronouns, determiners in combination with personal pronouns, possessive pronouns and adjectives, indefinite pronouns, etc.
  Name detection (vocatives)
  Dialogue acts
Gaze direction
Pointing gestures
Context categories (features)
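As an illustration of the speech-based cues, the sketch below extracts a few lexical markers from a transcribed utterance. It is not taken from the slides: the word lists are small illustrative assumptions, `linguistic_cues` is a hypothetical name, and a matched participant name is only a candidate vocative, since the name may also be mentioned in the third person.

```python
import re

# Illustrative, deliberately incomplete word lists (assumptions).
SECOND_PERSON = {"you", "your", "yours", "yourself", "yourselves"}
GROUP_MARKERS = {"we", "us", "our", "everyone", "everybody", "all"}


def linguistic_cues(utterance, participant_names):
    """Collect simple lexical cues that may hint at the addressee."""
    tokens = re.findall(r"[a-z']+", utterance.lower())
    names = {name.lower(): name for name in participant_names}
    return {
        "second_person": [t for t in tokens if t in SECOND_PERSON],
        "group_markers": [t for t in tokens if t in GROUP_MARKERS],
        # Candidate vocatives: participant names occurring in the utterance.
        "names": [names[t] for t in tokens if t in names],
    }
```

Calling `linguistic_cues("Dennis, could you send us your notes?", ["Dennis", "Rutger"])` yields the second-person forms `you` and `your`, the group marker `us`, and the name `Dennis`.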

Dialogue Acts and Addressee detection (I)

How many addressees may an utterance have?

According to dialogue act theory, an utterance or an utterance segment may have more than one conversational function.

Each DA has an addressee ==> an utterance may have several addressees
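The consequence stated above, that addressees attach to dialogue acts rather than to whole utterances, can be made concrete with a small data model. This is a sketch under assumptions: the class and field names are hypothetical, and the function labels in the usage note are placeholders rather than tags from any particular tag set.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DialogueAct:
    text: str               # the utterance segment carrying this act
    function: str           # conversational function (placeholder label)
    addressees: frozenset   # who this particular act is addressed to


@dataclass
class Utterance:
    speaker: str
    acts: list = field(default_factory=list)

    def addressees(self):
        """All addressees of the utterance: the union over its dialogue acts."""
        result = set()
        for act in self.acts:
            result |= act.addressees
        return result
```

An utterance such as "Yes, I agree. What do you think?" could then carry one act addressed to B (the answer) and another addressed to C (the question), so `addressees()` returns both.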

Dialogue Acts and Addressee detection (II)

MRDA (Meeting Recorder Dialogue Acts): tag set for labeling multiparty face-to-face meetings (ICSI)

We use a large subset of the MRDA set, which is organized on two levels:
  Forward-looking functions (FLF)
  Backward-looking functions (BLF)

Non-verbal features

Gaze
  The contribution of gaze to addressee detection depends on: participants' location (visible area), utterance length, current meeting action
  Turn-taking behavior and addressing behavior
Gesture (pointing at a person)
  TALK_TO(X, Y) AND POINT_TO(X, Y)
  TALK_TO(X, Y) AND POINT_TO(X, Z): X talks to Y about Z
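The two gesture patterns can be read as a small case distinction. The sketch below is an interpretation, not the authors' implementation: it assumes that in the first pattern the pointing gesture simply coincides with the addressee, and the function name and the returned dictionary keys are hypothetical.

```python
def interpret_pointing(talk_to, point_to):
    """Combine TALK_TO(X, Y) with POINT_TO(X, Z).

    talk_to  -- tuple (speaker, addressee)
    point_to -- tuple (pointer, target)
    Returns None when the gesture was not made by the speaker.
    """
    speaker, addressee = talk_to
    pointer, target = point_to
    if pointer != speaker:
        return None  # someone else's gesture says nothing about this utterance
    if target == addressee:
        # TALK_TO(X, Y) AND POINT_TO(X, Y): the gesture picks out the addressee.
        return {"speaker": speaker, "addressee": addressee, "about": None}
    # TALK_TO(X, Y) AND POINT_TO(X, Z): X talks to Y about Z.
    return {"speaker": speaker, "addressee": addressee, "about": target}
```

The second case is why pointing alone cannot settle the addressee: the person pointed at may be the topic of the utterance rather than its target.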

Context categories

Bunt: “totality of conditions that may influence understanding and generation of communicative behavior”
Local context is an aspect of context that can be changed through communication
Context categories:
  Interaction history (verbal and non-verbal)
  Meeting action history
  Spatial context (participants’ location, distance, visible area, etc.)
  User context (name, gender, roles, etc.)

Towards automatic addressee detection

Manual or automatic feature annotation?
An automatic target interpreter has to deal with uncertainty
Methods:
  Rule-based method
  Statistical method (Bayesian networks)
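To give a feel for the statistical option, the sketch below computes a posterior over addressee candidates with a naive Bayes model, which is the simplest Bayesian network: one hidden addressee node with conditionally independent observed features. The network structure, the feature names and all probability values are illustrative assumptions, not figures from this work.

```python
def addressee_posterior(prior, likelihoods, observed):
    """Posterior P(addressee | observed features) under a naive Bayes model.

    prior       -- dict: candidate -> P(addressee = candidate)
    likelihoods -- dict: feature -> dict: candidate -> P(feature | candidate)
    observed    -- iterable of feature names that were observed
    """
    scores = {}
    for candidate, p in prior.items():
        for feature in observed:
            p *= likelihoods[feature][candidate]
        scores[candidate] = p
    total = sum(scores.values())
    return {candidate: s / total for candidate, s in scores.items()}


# Made-up numbers: the speaker looks at B and says "you".
prior = {"B": 1 / 3, "C": 1 / 3, "GROUP": 1 / 3}
likelihoods = {
    "gaze_at_B": {"B": 0.7, "C": 0.1, "GROUP": 0.4},
    "says_you": {"B": 0.6, "C": 0.6, "GROUP": 0.3},
}
posterior = addressee_posterior(prior, likelihoods, ["gaze_at_B", "says_you"])
```

With these numbers the posterior is roughly 0.7 for B, 0.1 for C and 0.2 for the group. In practice the conditional probabilities would be estimated from annotated meeting data.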

Rule-based method

1. Process the information obtained from the utterance (linguistic markers, vocatives, DA). The result is a list of possible addressees with corresponding probabilities.
   1. Eliminate cases where the target is completely determined (for instance, a name in vocative form)
   2. Set of rules for BLF
   3. Set of rules for FLF
2. Process gaze and gesture information, adding additional probability values to the candidates
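The steps above can be sketched as a small scoring pipeline. Everything specific in it is an assumption made for illustration: the function name, the weights, and the single BLF rule (a backward-looking act is likely addressed to the previous speaker). Rules for FLF are left out because the slides do not spell them out.

```python
def detect_addressee(speaker, participants, vocative=None,
                     is_backward_looking=False, previous_speaker=None,
                     gaze_target=None, pointing_target=None):
    """Return a dict mapping candidate addressees to probabilities."""
    candidates = [p for p in participants if p != speaker]
    if not candidates:
        return {}

    # Step 1.1: a vocative determines the target completely.
    if vocative in candidates:
        return {vocative: 1.0}

    # Step 1: start from equal scores for all remaining candidates.
    scores = {p: 1.0 for p in candidates}

    # Step 1.2: example BLF rule (assumed weight).
    if is_backward_looking and previous_speaker in scores:
        scores[previous_speaker] += 1.0

    # Step 2: gaze and gesture add further evidence (assumed weights).
    if gaze_target in scores:
        scores[gaze_target] += 0.5
    if pointing_target in scores:
        scores[pointing_target] += 0.5

    # Normalize so the scores can be read as probabilities.
    total = sum(scores.values())
    return {p: s / total for p, s in scores.items()}
```

For example, `detect_addressee("A", ["A", "B", "C"], vocative="B")` returns `{"B": 1.0}`, while a backward-looking act following C's turn, combined with gaze at C, ranks C above B.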

Meeting actions and addressee detection

The automatic addressee detection method can be applied to the whole meeting

Knowledge about the current meeting action, as well as about the meeting action history, may help to better recognize the addressee of a dialogue act.

Future work

Development of the multimodal annotation tool
Data annotation for
  training and evaluating statistical models
  obtaining inputs for rule-based methods
New meeting scenarios for research in addressing
