eNTERFACE’08 Multimodal Communication with Robots and Virtual Agents
Mid-term presentation
Overview
Context:
• Exploitation of multi-modal signals for the development of an active robot/agent listener
• Storytelling experience:
– Speakers told the story of an animated cartoon they had just seen
1. See the cartoon
2. Tell the story to a robot or an agent
Overview
Active listening:
– During natural interaction, speakers can see whether their statements have been correctly understood (or at least heard).
– Robots/agents should also have active listening skills…
• Characterization of multi-modal signals as inputs of the feedback model:
– Speech analysis: prosody, keyword recognition, pauses
– Partner analysis: face tracking, smile detection
• Robot/agent feedback (outputs):
– Lexical and non-verbal behaviors
• Feedback model:
– Exploitation of both input and output signals
• Evaluation:
– Storytelling experiences are usually evaluated by annotation
• Audio-visual recordings of a storytelling interaction between a speaker and a listener.
• 22 storytelling sessions telling the story of the “Tweety and Sylvester - Canary Row” cartoon.
• Several conditions (speaker and listener): same or different languages.
• Languages: Arabic, French, Turkish and Slovak.
• Annotation oriented toward interaction analysis (one annotated interval is sketched below):
– Smile, head nod, head shake, eyebrow movement, acoustic prominence
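To make the annotation scheme concrete, here is a minimal C sketch of how one annotated interval could be represented. The struct layout and tier names are illustrative assumptions, not the actual corpus file format.

/* Illustrative representation of one annotation interval from the corpus.
 * The layout is an assumption, not the actual STEAD file format. */
typedef enum {
    TIER_SMILE,
    TIER_HEAD_NOD,
    TIER_HEAD_SHAKE,
    TIER_EYEBROW,
    TIER_ACOUSTIC_PROMINENCE
} tier_t;

typedef struct {
    tier_t tier;       /* which annotated behaviour this interval marks     */
    double t_start;    /* interval start, in seconds into the session       */
    double t_end;      /* interval end, in seconds into the session         */
    int    speaker;    /* 0 = speaker (storyteller), 1 = listener           */
} annotation_t;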
Architecture of an interaction feedback model
Multi-modal feature extraction → Feedback strategy → Multi-modal feedback
Multi-modal feature extraction
Key idea: extraction of the features annotated from the STEAD corpus (a per-frame record is sketched below):
• Face processing: head nod, head shake, smile, activity.
• Keyword spotting: keywords have been defined in order to switch the agent’s state.
• Speech processing: acoustic prominence detection
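The presentation gives no code; purely as an illustration, here is a minimal C sketch of a per-frame record combining the three feature streams listed above. The struct and field names are assumptions, not the project’s actual interface.

#include <stdbool.h>

/* One frame of multi-modal features, as could be emitted by the
 * extraction module. Names are illustrative, not the project's API. */
typedef struct {
    /* face processing */
    bool   head_nod;
    bool   head_shake;
    bool   smile;
    double activity;      /* overall movement level, e.g. in [0, 1]   */

    /* keyword spotting */
    int    keyword_id;    /* index of the spotted keyword, -1 if none */

    /* speech processing */
    bool   voiced;        /* voice activity detection                 */
    bool   prominent;     /* acoustic prominence flag                 */
} mm_features_t;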
Multi-modal feature extraction
Keyword spotting: keywords have been defined in order to switch the agent’s state (a sketch of this hand-off follows below).
ASR → Agent’s state manager (ASM)
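As an illustration of the ASR → ASM hand-off, a minimal C sketch of a table-driven state switch. The keywords and agent states are invented for the example (loosely themed on the “Canary Row” story); the actual keyword list and state set are not given in the presentation.

#include <stddef.h>
#include <string.h>

/* Hypothetical agent states; the ASM's real state set is not shown. */
typedef enum { STATE_NEUTRAL, STATE_AMUSED, STATE_SURPRISED } agent_state_t;

/* Hypothetical keyword table for the "Canary Row" storytelling scenario. */
static const struct { const char *keyword; agent_state_t state; } kw_table[] = {
    { "cat",    STATE_AMUSED    },
    { "canary", STATE_AMUSED    },
    { "falls",  STATE_SURPRISED },
};

/* Called by the agent's state manager on every word hypothesis from the ASR:
 * a spotted keyword switches the state, anything else leaves it unchanged. */
agent_state_t asm_on_word(const char *word, agent_state_t current)
{
    for (size_t i = 0; i < sizeof kw_table / sizeof kw_table[0]; i++)
        if (strcmp(word, kw_table[i].keyword) == 0)
            return kw_table[i].state;
    return current;
}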
Multi-modal feature extraction
Acoustic prominence detection:
• Prosody analysis in real time using Pure Data:
– Development of different Pure Data objects (written in C):
• Voice activity detection
• Pitch and energy extraction
• Detection:
– Statistical model (Gaussian assumption):
• Kullback-Leibler similarity (see the sketch after this list)
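The slides do not spell the detector out. Under the stated Gaussian assumption, one plausible reading is to fit a short-term Gaussian to the current pitch/energy values and a long-term Gaussian to the background, then flag prominence when the Kullback-Leibler divergence between them exceeds a threshold. A minimal C sketch under that assumption; the windowing and threshold value are illustrative choices, not taken from the presentation.

#include <math.h>
#include <stdbool.h>

/* Univariate Gaussian summarised by its mean and (non-zero) variance. */
typedef struct { double mean, var; } gauss_t;

/* Closed-form KL divergence KL(p || q) between two univariate Gaussians:
 * 0.5 * ( ln(var_q / var_p) + (var_p + (mean_p - mean_q)^2) / var_q - 1 ). */
static double kl_gauss(gauss_t p, gauss_t q)
{
    double d = p.mean - q.mean;
    return 0.5 * (log(q.var / p.var) + (p.var + d * d) / q.var - 1.0);
}

/* Flag a frame as prominent when the short-term statistics of a prosodic
 * feature (pitch or energy) diverge enough from the long-term background. */
bool is_prominent(gauss_t short_term, gauss_t background)
{
    const double threshold = 2.0;   /* hypothetical; to be tuned on the corpus */
    return kl_gauss(short_term, background) > threshold;
}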
Feedback model
• Extraction of rules from the annotations (STEAD corpus):
– Rules are defined in the literature
– Application to our specific task
• When is a feedback triggered?
• Feedback behaviours:
– ECA: several behaviours (head movements, facial expressions) are already defined for GRETA with BML (Behaviour Markup Language).
– ROBOT: we defined several basic behaviours for our AIBO robot (inspired by dog reactions): mapping from BML to robot movements (see the sketch below).
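The rule set itself is not listed in the slides. As an illustration of the trigger-and-map idea, here is a minimal C sketch in which one literature-style backchannel rule selects a behaviour that is then rendered either as a BML-style signal for GRETA or as an AIBO motion. Every condition, tag and motion below is an assumption, not the project’s actual mapping.

#include <stdbool.h>
#include <stdio.h>

/* Behaviours shared by both embodiments. */
typedef enum { FB_NONE, FB_HEAD_NOD, FB_SMILE } behaviour_t;

/* One literature-style trigger rule (illustrative): back-channel with a
 * nod on an acoustic prominence, mirror the speaker's smile. */
static behaviour_t trigger(bool prominence, bool speaker_smile)
{
    if (speaker_smile) return FB_SMILE;
    if (prominence)    return FB_HEAD_NOD;
    return FB_NONE;
}

/* Render the same behaviour on either embodiment: a BML-style tag for the
 * GRETA agent, or a dog-inspired motion for the AIBO robot. */
static void render(behaviour_t b, bool on_robot)
{
    switch (b) {
    case FB_HEAD_NOD:
        if (on_robot) puts("aibo: nod head");
        else          puts("<head type=\"NOD\"/>");
        break;
    case FB_SMILE:
        if (on_robot) puts("aibo: wag tail");      /* dog-inspired analogue */
        else          puts("<face type=\"SMILE\"/>");
        break;
    default:
        break;
    }
}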
Future work
• Integration:
– Real-time multi-modal feature extraction:
• Prominence detection object (Pure Data)
• Communication between the modules via PsyClone
– Already done for video processing.
– Tests of feedback behaviours for AIBO
– Agent’s state modifications
• Recordings and annotations of storytelling experiences with both GRETA and AIBO.
Thank you for your attention…