Description: In these slides, we present a common gesture and speech production framework for both virtual agents (ECAs, IVAs, VHs) and physical agents such as humanoid robots. The framework is designed for different embodiments, so that its processes are independent of any specific agent.
A Common Gesture and Speech Production Framework for Virtual and Physical Agents
Workshop on Speech and Gesture Production, ICMI 2012, Santa Monica, CA, USA
Quoc Anh Le, Jing Huang, Catherine Pelachaud
CNRS, LTCI, Telecom-ParisTech, France
Introduction
Motivations
• Similar approaches between virtual agents and humanoid robots
• Limits of existing systems: agent-dependent

Objectives
• A common co-verbal gesture generation framework for both virtual and physical agents

Methodology
• Based on the GRETA system
• Use:
 - the same representation languages
 - the same algorithm for selecting and planning gestures
 - different algorithms for creating the animation
Architecture Overview

[Architecture diagram: Input data (text, audio, video, etc.) feeds the Intent Planner (common module, using an Intent Lexicon), which outputs FML-APML. The Behavior Planner (common module, using a Behavior Lexicon and agent-specific baselines for Nao and Greta) turns FML-APML into BML. The Behavior Realizer (common module, using agent-specific gestuaries for Nao and Greta) turns BML into keyframes. Agent-specific Animation Realizers (with Greta and Nao animation lexicons) turn keyframes into FAP-BAP values for the Greta FAP-BAP Player, or joint values for Nao's built-in proprietary procedures. All modules communicate through the ActiveMQ central messaging system.]
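A minimal, runnable sketch of this pipeline as plain function composition (Python; the function names and placeholder string payloads are illustrative assumptions, since in the real system the modules are separate processes exchanging FML-APML, BML, and keyframe messages over ActiveMQ):

def intent_planner(input_data):
    """Common module: derive communicative intents (FML-APML)."""
    return "<fml-apml>" + input_data + "</fml-apml>"      # placeholder payload

def behavior_planner(fml_apml, baseline):
    """Common module: select and plan behaviors, output BML."""
    return "<bml>" + fml_apml + ":" + baseline + "</bml>"  # placeholder payload

def behavior_realizer(bml, gestuary):
    """Common module: schedule gesture phases, output symbolic keyframes."""
    return [(0.0, "prep"), (1.2, "stroke:" + gestuary), (2.0, "retract")]

def animation_realizer(keyframes, agent):
    """Agent-specific module: joint values (Nao) or BAP values (Greta)."""
    return [(t, agent + ":" + desc) for t, desc in keyframes]

frames = animation_realizer(
    behavior_realizer(
        behavior_planner(intent_planner("It is over there!"), "nao-baseline"),
        "nao-pointing"),
    agent="nao")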
Behavior Realizer: Outline
Processes common to all agents:
1. Create the gesture from the agent's gestuary
2. Schedule the timing of the gesture phases
3. Generate keyframes: pairs of (absolute time, symbolic description of the hand configuration at that time); see the sketch below

Agent-specific databases:
• For Nao:
 - Gestuary (for instance, pointing with a fully stretched arm)
 - Velocity profile (empirically determined on Nao)
• For Greta:
 - Gestuary (for instance, pointing with one finger)
 - Velocity profile (empirically determined from real humans)
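A minimal sketch of such a (time, symbolic description) pair as a record (the field names mirror the gestuary entries shown on the next slide; the class and the time value are illustrative, not GRETA's actual data type):

from dataclasses import dataclass

@dataclass
class Keyframe:
    time: float      # absolute time (s)
    vertical: str    # e.g. "YUpperP"
    horizontal: str  # e.g. "XEP"
    distance: str    # e.g. "XFar"
    h_shape: str     # e.g. "OPEN"

# Stroke keyframe of a Nao pointing gesture (symbolic values from the
# Nao gestuary excerpt below; the time is invented for illustration).
stroke = Keyframe(time=1.25, vertical="YUpperP", horizontal="XEP",
                  distance="XFar", h_shape="OPEN")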
Example: Different pointing gestures
BML input (with GRETA expressivity extensions):

<bml id="bml1">
  <speech xmlns="" id="s1" start="0">
    <text>It is <sync id="tm1"/> over there! <sync id="tm2"/></text>
  </speech>
  <gesture id="g1" lexeme="pointing" start="s1:tm1" end="s1:tm2">
    <description priority="1" type="GRETA">
      <GRETA:SPC>0.80</GRETA:SPC>
      <GRETA:TMP>0.50</GRETA:TMP>
      <GRETA:FLD>-0.62</GRETA:FLD>
      <GRETA:PWR>0.30</GRETA:PWR>
      <GRETA:REP>0.00</GRETA:REP>
      <GRETA:OPE>1.00</GRETA:OPE>
      <GRETA:TEN>0.20</GRETA:TEN>
    </description>
  </gesture>
</bml>

Nao gestuary (excerpt: stroke phase of the pointing gesture, fully stretched arm):

<gesture id="pointing">
  <phase type="stroke">
    <vertical>YUpperP</vertical>
    <horizontal>XEP</horizontal>
    <distance>XFar</distance>
    <hShape>OPEN</hShape>
  </phase>
</gesture>

Greta gestuary (excerpt: stroke phase of the pointing gesture, index-finger pointing):

<gesture id="pointing">
  <phase type="stroke">
    <vertical>YP</vertical>
    <horizontal>XP</horizontal>
    <distance>XMiddle</distance>
    <hShape>INDEX</hShape>
  </phase>
</gesture>

From the BML and each agent's gestuary, the Behavior Realizer produces a list of keyframes, <keyframe 1 (time, description)> ... <keyframe N (time, description)>, which the agent-specific Animation Realizer turns into joint values (Nao) or BAP values (Greta).
BR: Synchronization with speech
Algorithm:
• Compute the preparation phase
• Do not perform a gesture if there is not enough time (strokeEnd(i-1) > strokeStart(i) + duration)
• Add a hold phase to fit the planned gesture duration
• Co-articulation between several gestures (see the sketch below):
 - If there is enough time, insert a retraction phase (i.e., go back to the rest position)
 - Otherwise, go directly from the end of the stroke to the preparation phase of the next gesture

[Timing diagram: two consecutive gestures on a timeline, each marked with its start, stroke start (S-start), stroke end (S-end), and end times.]
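A minimal sketch of this scheduling logic (the Gesture record, the RETRACTION_TIME constant, and the final-retraction rule are illustrative assumptions; the skip test is taken literally from the slide):

from dataclasses import dataclass

@dataclass
class Gesture:
    stroke_start: float    # absolute stroke onset (s)
    stroke_end: float      # absolute stroke end (s)
    duration: float        # planned duration of the whole gesture (s)
    retract: bool = False  # insert a retraction phase after the stroke?

RETRACTION_TIME = 0.4  # assumed time needed to return to rest (s)

def schedule(gestures):
    """Keep only gestures that can be performed in time, then decide
    between retraction and direct co-articulation for each pair."""
    planned = []
    for g in gestures:
        # Skip test from the slide: strokeEnd(i-1) > strokeStart(i) + duration.
        if planned and planned[-1].stroke_end > g.stroke_start + g.duration:
            continue
        planned.append(g)
    for prev, nxt in zip(planned, planned[1:]):
        # Retract to rest only if the gap before the next stroke allows it;
        # otherwise chain the end of the stroke into the next preparation.
        prev.retract = (nxt.stroke_start - prev.stroke_end) >= RETRACTION_TIME
    if planned:
        planned[-1].retract = True  # always retract after the last gesture
    return planned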
BR: Velocity profiles
Gesture velocity:
• Predict a movement duration using Fitts' law: Movement Time = a + b * log2(Distance + 1)
• Maximal speeds are thresholded (empirically determined)
• The stroke phase differs from the other phases in velocity and acceleration (Quek, 1995)

Expressivity:
• Temporal extent (TMP): modulate the duration of the whole gesture by changing the coefficients of Fitts' law (see the sketch below)
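A minimal sketch of this duration model (the linear TMP scaling and the MAX_SPEED constant are assumptions; the slides only state that TMP changes the coefficients and that speeds are thresholded):

import math

MAX_SPEED = 1.0  # assumed maximal hand speed threshold (m/s)

def movement_time(distance, a, b, tmp=0.0):
    """Fitts' law prediction MT = a + b*log2(distance + 1), modulated by
    the TMP expressivity parameter (tmp in [-1, 1]; > 0 means faster)."""
    scale = 1.0 - 0.5 * tmp                       # assumed linear modulation
    mt = scale * (a + b * math.log2(distance + 1.0))
    return max(mt, distance / MAX_SPEED)          # respect the speed threshold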
BR: Build coefficients of Fitts’ law
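This slide is a figure; as an illustration, the coefficients a and b can be fitted by least squares to measured (distance, duration) pairs. A sketch with placeholder data (the numbers are invented for illustration):

import numpy as np

distances = np.array([0.10, 0.20, 0.40, 0.80])  # placeholder measurements (m)
durations = np.array([0.25, 0.33, 0.45, 0.60])  # placeholder times (s)

# Linear model MT = a + b*log2(D + 1), solved by least squares.
X = np.column_stack([np.ones_like(distances), np.log2(distances + 1.0)])
(a, b), *_ = np.linalg.lstsq(X, durations, rcond=None)
print("MT = %.3f + %.3f * log2(D + 1)" % (a, b))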
Animation Realizer
Implemented expressivity parameters

EXP | Definition                      | Nao                                 | Greta
----|---------------------------------|-------------------------------------|----------------------------------
TMP | Velocity of movement            | Change coefficients of Fitts' law   | Change coefficients of Fitts' law
SPC | Amplitude of movement           | Limited to predefined key positions | Change gesture space scales
PWR | Acceleration of movement        | Modulate stroke duration            | Modulate stroke acceleration
REP | Number of stroke repetitions    | Yes                                 | Yes
FLD | Smoothness and continuity       | No                                  | No
OPN | Relative spatial extent to body | No                                  | Elbow swivel angle
TEN | Muscular tension                | No                                  | No
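For concreteness, a sketch of these parameters as a record, using the attribute names of the GRETA BML extension shown earlier (the [-1, 1] value convention is an assumption; the class is illustrative, not GRETA's actual API):

from dataclasses import dataclass

@dataclass
class Expressivity:
    spc: float = 0.0  # spatial extent: amplitude of movement
    tmp: float = 0.0  # temporal extent: velocity (scales Fitts coefficients)
    pwr: float = 0.0  # power: stroke acceleration (Greta) / duration (Nao)
    rep: float = 0.0  # repetition: number of stroke repetitions
    fld: float = 0.0  # fluidity: smoothness and continuity (not implemented)
    ope: float = 0.0  # openness: spatial extent relative to the body
    ten: float = 0.0  # tension: muscular tension (not implemented)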
Create animation parameters:
• Joint values for Nao
• BAP values for Greta
Create animation parameters
Discretization of the gestural space of McNeill (1992): each symbolic position is translated into concrete values of the agent's joints (for instance, the 6 joints of Nao's left arm, as in the table below).

Symbolic keyframes are translated into joint values; the animation is then obtained by interpolating between these values:
• for Nao, with the robot's built-in proprietary procedures
• for Greta, with Slerp (spherical linear interpolation) with time warping (ease-in/ease-out functions); see the sketch after the table

Code | ArmX | ArmY     | ArmZ    | Joint values (LShoulderPitch, LShoulderRoll, LElbowYaw, LElbowRoll, LWristYaw, Hand)
-----|------|----------|---------|--------------------------------------------------------------------------------------
000  | XEP  | YUpperEP | ZNear   | (-54.4953, 22.4979, -79.0171, -5.53477, -0.00240423, 1.0)
001  | XEP  | YUpperEP | ZMiddle | (-65.5696, 22.0584, -78.7534, -8.52309, -0.178188, 1.0)
002  | XEP  | YUpperEP | ZFar    | (-79.2807, 22.0584, -78.6655, -8.4352, -0.178188, 1.0)
010  | XEP  | YUpperP  | ZNear   | (-21.0964, 24.2557, -79.4565, -26.8046, 0.261271, 1.0)
...  | ...  | ...      | ...     | ...
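A minimal sketch of the Greta-side interpolation, assuming unit quaternions for the keyframe orientations (the lookup table reuses values from the table above; everything else is illustrative, and Nao instead passes joint values to its built-in procedures):

import numpy as np

# Symbolic code -> Nao joint values, taken from the table above.
NAO_ARM_TABLE = {
    "000": (-54.4953, 22.4979, -79.0171, -5.53477, -0.00240423, 1.0),
    "001": (-65.5696, 22.0584, -78.7534, -8.52309, -0.178188, 1.0),
}

def ease_in_out(t):
    """Smoothstep time warping: slow start and end, faster middle."""
    return t * t * (3.0 - 2.0 * t)

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions q0, q1."""
    q0, q1 = np.asarray(q0, float), np.asarray(q1, float)
    dot = float(np.dot(q0, q1))
    if dot < 0.0:             # take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:          # nearly parallel: fall back to lerp
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)
    return (np.sin((1.0 - t) * theta) * q0
            + np.sin(t * theta) * q1) / np.sin(theta)

def greta_frame(q0, q1, t):
    """One in-between frame with an eased time parameter t in [0, 1]."""
    return slerp(q0, q1, ease_in_out(t))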
Greta: Full Body IK
• Torso target depending on hand position
• Torso IK
• Analytic method: arm to torso
Demo: Greta
Demo: Nao
Perceptual Evaluation

Objective:
• Evaluate how the robot's gestures are perceived by human users

Procedure:
• Participants (63 French speakers) rated videos of Nao telling a story
• Versions were displayed to the participants in random order:
 - gestures with expressivity vs. gestures without expressivity
 - gesture-speech synchronization vs. gesture-speech asynchronization

Results (ANOVA):
• Synchronization: F(1, 124) = 4.94, p < .05; 76% agreed that gestures were synchronized with speech in the synchronized version
• Expressivity: F(1, 124) = 4.43, p < .05; 70% agreed that gestures were expressive in the expressive version
State of the art
Most similar work: Salem et al. (2012)
• Same idea, based on the existing Max virtual agent system

Main differences:
• Our system: GRETA re-designed as a common framework
• Salem et al.'s system: Max's ACE adjusted to the ASIMO robot
Features           | Our model                                                | Salem et al.'s system
-------------------|----------------------------------------------------------|------------------------------------------------------------------
Gesture production | Online, from templates, regardless of the specific domain | Automatically generated from a data corpus trained on a specific domain
Gesture shapes     | Agent-specific parameters                                | Original for Max, mapped to ASIMO configurations
Gesture timing     | Agent-specific parameters                                | Original for Max, adapted to ASIMO by feedback
Expressivity       | Yes                                                      | No
Synchronization    | Adapt gesture to speech                                  | Cross-modal adjustment
Future Work

Short-term plan:
• Human-like gestures: enhance velocity profiles
• Expressivity: implement fluidity and tension

Long-term plan:
• Feedback mechanism
• Study of the coherence between consecutive gestures in a G-Unit (Kendon, 2004)