eNTERFACE’08 Multimodal Communication with Robots and Virtual Agents
Mid-term presentation
Overview
Context:
• Exploitation of multi-modal signals for the development of an active robot/agent listener
• Storytelling experience:
– Speakers told the story of an animated cartoon they had just seen
1. See the cartoon
2. Tell the story to a robot or an agent
Overview
Active listening:
– During natural interaction, speakers can see whether their statements have been correctly understood (or at least heard).
– Robots/agents should also have active listening skills…
• Characterization of multi-modal signals as inputs of the feedback model:
– Speech analysis: prosody, keyword recognition, pauses
– Partner analysis: face tracking, smile detection
• Robot/agent feedback (outputs):
– Lexical and non-verbal behaviors
• Feedback model:
– Exploitation of both input and output signals
• Evaluation:
– Storytelling experiences are usually evaluated by annotation
• Audio-visual recordings of a storytelling interaction between a speaker and a listener.
• 22 storytelling sessions telling the story of the “Tweety and Sylvester - Canary Row” cartoon.
• Several conditions (speaker and listener): same or different languages.
• Languages: Arabic, French, Turkish and Slovak.
• Annotation oriented toward interaction analysis (one annotated interval is sketched below):
– Smile, head nod, head shake, eyebrow movement, acoustic prominence
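To make the annotation scheme concrete, here is a minimal C sketch of how one annotated interval could be represented. The struct layout and tier names are illustrative assumptions, not the actual corpus file format.

/* Illustrative representation of one annotation interval from the corpus.
 * The layout is an assumption, not the actual STEAD file format. */
typedef enum {
    TIER_SMILE,
    TIER_HEAD_NOD,
    TIER_HEAD_SHAKE,
    TIER_EYEBROW,
    TIER_ACOUSTIC_PROMINENCE
} tier_t;

typedef struct {
    tier_t tier;       /* which annotated behaviour this interval marks     */
    double t_start;    /* interval start, in seconds into the session       */
    double t_end;      /* interval end, in seconds into the session         */
    int    speaker;    /* 0 = speaker (storyteller), 1 = listener           */
} annotation_t;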
Architecture of an interaction feedback model
Multi-modal feature extraction → Feedback strategy → Multi-modal feedback
Multi-modal feature extraction
Key idea: extraction of the features annotated from the STEAD corpus (a per-frame record is sketched below):
• Face processing: head nod, head shake, smile, activity.
• Keyword spotting: keywords have been defined in order to switch the agent’s state.
• Speech processing: acoustic prominence detection
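The presentation gives no code; purely as an illustration, here is a minimal C sketch of a per-frame record combining the three feature streams listed above. The struct and field names are assumptions, not the project’s actual interface.

#include <stdbool.h>

/* One frame of multi-modal features, as could be emitted by the
 * extraction module. Names are illustrative, not the project's API. */
typedef struct {
    /* face processing */
    bool   head_nod;
    bool   head_shake;
    bool   smile;
    double activity;      /* overall movement level, e.g. in [0, 1]   */

    /* keyword spotting */
    int    keyword_id;    /* index of the spotted keyword, -1 if none */

    /* speech processing */
    bool   voiced;        /* voice activity detection                 */
    bool   prominent;     /* acoustic prominence flag                 */
} mm_features_t;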
Multi-modal feature extraction
Keyword spotting: keywords have been defined in order to switch the agent’s state (a sketch of this hand-off follows below).
ASR → Agent’s state manager (ASM)
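As an illustration of the ASR → ASM hand-off, a minimal C sketch of a table-driven state switch. The keywords and agent states are invented for the example (loosely themed on the “Canary Row” story); the actual keyword list and state set are not given in the presentation.

#include <stddef.h>
#include <string.h>

/* Hypothetical agent states; the ASM's real state set is not shown. */
typedef enum { STATE_NEUTRAL, STATE_AMUSED, STATE_SURPRISED } agent_state_t;

/* Hypothetical keyword table for the "Canary Row" storytelling scenario. */
static const struct { const char *keyword; agent_state_t state; } kw_table[] = {
    { "cat",    STATE_AMUSED    },
    { "canary", STATE_AMUSED    },
    { "falls",  STATE_SURPRISED },
};

/* Called by the agent's state manager on every word hypothesis from the ASR:
 * a spotted keyword switches the state, anything else leaves it unchanged. */
agent_state_t asm_on_word(const char *word, agent_state_t current)
{
    for (size_t i = 0; i < sizeof kw_table / sizeof kw_table[0]; i++)
        if (strcmp(word, kw_table[i].keyword) == 0)
            return kw_table[i].state;
    return current;
}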
Multi-modal feature extraction
Acoustic prominence detection:
• Prosody analysis in real time using Pure Data:
– Development of different Pure Data objects (written in C):
• Voice activity detection
• Pitch and energy extraction
• Detection:
– Statistical model (Gaussian assumption):
• Kullback-Leibler similarity (see the sketch after this list)
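The slides do not spell the detector out. Under the stated Gaussian assumption, one plausible reading is to fit a short-term Gaussian to the current pitch/energy values and a long-term Gaussian to the background, then flag prominence when the Kullback-Leibler divergence between them exceeds a threshold. A minimal C sketch under that assumption; the windowing and threshold value are illustrative choices, not taken from the presentation.

#include <math.h>
#include <stdbool.h>

/* Univariate Gaussian summarised by its mean and (non-zero) variance. */
typedef struct { double mean, var; } gauss_t;

/* Closed-form KL divergence KL(p || q) between two univariate Gaussians:
 * 0.5 * ( ln(var_q / var_p) + (var_p + (mean_p - mean_q)^2) / var_q - 1 ). */
static double kl_gauss(gauss_t p, gauss_t q)
{
    double d = p.mean - q.mean;
    return 0.5 * (log(q.var / p.var) + (p.var + d * d) / q.var - 1.0);
}

/* Flag a frame as prominent when the short-term statistics of a prosodic
 * feature (pitch or energy) diverge enough from the long-term background. */
bool is_prominent(gauss_t short_term, gauss_t background)
{
    const double threshold = 2.0;   /* hypothetical; to be tuned on the corpus */
    return kl_gauss(short_term, background) > threshold;
}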
Feedback model
• Extraction of rules from the annotations (STEAD corpus):
– Rules are defined in the literature
– Application to our specific task
• When is a feedback triggered?
• Feedback behaviours:
– ECA: several behaviours (head movements, facial expressions) are already defined for GRETA with BML (Behaviour Markup Language).
– ROBOT: we defined several basic behaviours for our AIBO robot (inspired by dog reactions): mapping from BML to robot movements (see the sketch below).
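The rule set itself is not listed in the slides. As an illustration of the trigger-and-map idea, here is a minimal C sketch in which one literature-style backchannel rule selects a behaviour that is then rendered either as a BML-style signal for GRETA or as an AIBO motion. Every condition, tag and motion below is an assumption, not the project’s actual mapping.

#include <stdbool.h>
#include <stdio.h>

/* Behaviours shared by both embodiments. */
typedef enum { FB_NONE, FB_HEAD_NOD, FB_SMILE } behaviour_t;

/* One literature-style trigger rule (illustrative): back-channel with a
 * nod on an acoustic prominence, mirror the speaker's smile. */
static behaviour_t trigger(bool prominence, bool speaker_smile)
{
    if (speaker_smile) return FB_SMILE;
    if (prominence)    return FB_HEAD_NOD;
    return FB_NONE;
}

/* Render the same behaviour on either embodiment: a BML-style tag for the
 * GRETA agent, or a dog-inspired motion for the AIBO robot. */
static void render(behaviour_t b, bool on_robot)
{
    switch (b) {
    case FB_HEAD_NOD:
        if (on_robot) puts("aibo: nod head");
        else          puts("<head type=\"NOD\"/>");
        break;
    case FB_SMILE:
        if (on_robot) puts("aibo: wag tail");      /* dog-inspired analogue */
        else          puts("<face type=\"SMILE\"/>");
        break;
    default:
        break;
    }
}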
Future work
• Integration:
– Real-time multi-modal feature extraction:
• Prominence detection object (Pure Data)
• Communication between the modules via PsyClone
– Already done for video processing.
– Tests of feedback behaviours for AIBO
– Agent’s state modifications
• Recordings and annotations of storytelling experiences with both GRETA and AIBO.
Thank you for your attention…