IST project COMIC
Vision and Approach
Results of the first 1.5 years
http://www.hcrc.ed.ac.uk/comic/
Vision of COMIC
• Multimodal interaction will only be accepted by non-expert users if the fundamental cognitive interaction capabilities of human beings are properly taken into account
Approach of COMIC
• Obtain fundamental knowledge on multimodal interaction – use of speech, pen, and facial expressions
Approach (2)
• Develop new approaches for component technologies that are guided by human factors experiments
Approach (3)
• Obtain hands-on experience by building an integrated multimodal demonstrator for bathroom design that combines new approaches for:
– Automatic speech recognition
– Automatic pen gesture recognition
– Fusion
– Dialogue and action management
– Fission
– Output generation combining text, speech, and facial expression
– System integration
– Cognitive knowledge
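The module chain above can be pictured as a processing pipeline running from input recognition to multimodal output. The following is a minimal sketch under assumed interfaces; all function names and data shapes are illustrative, not the actual COMIC implementation.

```python
# Illustrative sketch of a COMIC-style multimodal pipeline.
# Each stage is a plain function; the real modules would run asynchronously.

def speech_recognition(audio):
    # Placeholder ASR: pretend the audio has already been transcribed.
    return {"modality": "speech", "content": audio}

def pen_gesture_recognition(strokes):
    # Placeholder AGR: classify pen strokes as a gesture.
    return {"modality": "pen", "content": strokes}

def fusion(speech_hyp, pen_hyp):
    # Merge the two unimodal hypotheses into one interpretation.
    return {"speech": speech_hyp["content"], "pen": pen_hyp["content"]}

def dialogue_and_action(interpretation):
    # Decide on an abstract dialogue act in response to the input.
    return {"act": "confirm", "slots": interpretation}

def fission(dialogue_act):
    # Split the abstract act over the available output channels.
    return {
        "text": f"Confirming: {dialogue_act['slots']['speech']}",
        "speech": True,
        "facial_expression": "agreement",
    }

def run_pipeline(audio, strokes):
    merged = fusion(speech_recognition(audio), pen_gesture_recognition(strokes))
    return fission(dialogue_and_action(merged))

out = run_pipeline("wall of 3 metres", ["line_stroke"])
print(out["text"])                # Confirming: wall of 3 metres
print(out["facial_expression"])   # agreement
```

The point of the sketch is the division of labour: fusion combines modalities on the input side, while fission distributes one dialogue act over text, speech, and the face on the output side.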
The partners of COMIC
• Max Planck Institute for Psycholinguistics – Fundamental Cognitive Research
• Max Planck Institute for Biological Cybernetics – Fundamental Cognitive Research
• University of Nijmegen – ASR and AGR
• University of Sheffield – Dialogue and Action
• University of Edinburgh – Fission and Output
• DFKI – Fusion and System Integration
• ViSoft – Graphical part of the demonstrator
This presentation
• Explanation of the demonstrator
• Results of fundamental cognitive research
– Multimodal interaction
– Facial expressions
• Results of human factors experiments
The COMIC demonstrator
• Bathroom design for non-expert users
• Final goal is to implement 4 phases:
– 1: Input shape and dimensions of own bathroom (pen and speech input)
– 2: Choose position of sanitary ware (based on templates)
– 3: Conversational dialogue about types of sanitary ware and tiles
– 4: 3D view of negotiated bathroom
• The result is taken to an expert salesman, who will proceed from there.
The COMIC demonstrator
• Three versions
– T12: Proof of technical integration of all modules
– T24: Limited functionality – fixed bathroom, only tiles
– T36: Full functionality – own bathroom, sanitary ware, tiles
The SLOT Research Platform
• Recording dyadic, natural interactions
• Route negotiation task with road maps
• Use of electronic pen/ink for drawing routes
• Elaborate and theory-free coding of data
• Systematically manipulating available modalities (drawing, visual contact)
Results: Quantitative analysis of turn-taking behaviour
• 4x4 dyads; 6 hours of annotated interaction
• Normally, there is no delay between people’s turns
• With a one-way mirror, the “blind” person is slower to take up her turn
• This leads to longer silent periods (pauses) …
• … which leads to significantly slower communication
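The measurement behind these findings can be illustrated with a toy computation: given annotated turn start and end times, the inter-turn gap is the next speaker's start time minus the previous speaker's end time. The annotations below are invented for illustration; the actual study used 4x4 dyads and 6 hours of annotated interaction.

```python
# Compute inter-turn gaps (floor-transfer offsets) from annotated turns.
# A turn is a (start_s, end_s) pair; the lists are invented example data.

def turn_gaps(turns):
    # Gap between consecutive turns: next start minus previous end.
    return [nxt[0] - prev[1] for prev, nxt in zip(turns, turns[1:])]

def mean(xs):
    return sum(xs) / len(xs)

# Invented annotations (seconds): normal condition vs one-way mirror.
normal = [(0.0, 2.1), (2.1, 4.0), (4.1, 6.5)]
one_way_mirror = [(0.0, 2.1), (3.0, 4.0), (5.2, 6.5)]

print(round(mean(turn_gaps(normal)), 2))          # 0.05: near-zero gaps
print(round(mean(turn_gaps(one_way_mirror)), 2))  # 1.05: longer pauses
```

Longer mean gaps directly translate into more total silence per exchange, which is the mechanism behind the slower communication reported above.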
Possible relevance for HCI
In conversational HCI with a talking head:
• The user sees the computer’s “face”
• The user might assume that the computer sees his face
• Speech recognition has a hard time reliably detecting end-of-speech acoustically
Therefore we hypothesize that:
• The user will notice (even more) that the computer responds very slowly
Fundamental Research on Facial Expressions
• Faces do a lot in a conversation:
– Lip motion for speaking
– Emotional expression (pleasure, surprise, fear)
– Dialogue flow (back-channeling: confusion, comprehension, agreement)
– Co-expression (emphasis and word/topic stress)
• Most work on Avatars focuses exclusively on lip motion for speech.
• We aim to broaden the capabilities of Avatars, allowing for more sophisticated self-expression and more subtle dialogue control.
• To this end, we use psychophysical knowledge and procedures as a basis for synthesizing human conversational expressions.
First step: Real expressions
• We recorded a variety of conversational expressions from several individuals. We then experimentally determined how identifiable and believable those expressions were.
• In general, we found that:
– The expressions were easily recognized, even in the complete absence of conversational context (and thus can be useful for back-channeling).
– The pattern of confusions between expressions indicates potential trouble areas (e.g., thinking was often mistaken for confusion).
– These (“enacted”) expressions were not always recognized or found to be completely sincere (speech might help here).
Next step: What moves when?
We are now performing a fine-grained analysis of the necessary and sufficient components of conversational facial motion. What must move when to produce an identifiable and believable expression?
Relevance for HCI and eCommerce
• Psychophysical studies of real expressions offer strong insights into how one can produce identifiable, realistic, and believable conversational expressions.
• The expansion of Avatars’ expressive capabilities promises to improve the ease of use of HCI systems.
Human Factors Experiments guiding the technology
• University of Nijmegen investigated input issues (ASR, AGR, fusion)
• University of Edinburgh investigated output issues (text, graphics, face, fission)
Human Factors Experiments
• Exploratory pilot studies
– Can users combine pen and speech for entering data about the layout of a room?
– Do they like it? What do they prefer?
– System-driven vs mixed-initiative dialogues
– Pen+speech data acquisition and analysis
HF Experiments: input
• Task: study a blueprint and specify it using speech and/or pen
• Subjects had to specify the positions and lengths of walls, doors, windows, and sanitary ware
• The experiment is directly related to phase 1 of the demonstrator
HF Experiments: main results
• Subjects prefer gestures and speech, or gestures only; speech only is not preferred
• Subjects show a large variation in behaviour even when restricted to narrowly defined tasks
• Subjects prefer mixed-initiative dialogue
• System-driven dialogue results in fewer errors, but requires more time
HF Experiments: speech
• Subjects use three types of speech comments
– Within task: “here is a wall with width 3 meter 40 …”
– Out-of-task, within dialogue: “now I am going to draw the next wall”
– Out-of-dialogue: “I hope I'm drawing this in the right way …”
Human Factors Experiments: Output
• Fission module: translates abstract dialogue acts into specifications for output channels
• Goal: model the choices made in the COMIC fission module after naturally-occurring interactions.
• Question: What are important natural actions in multimodal dialogue?
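One way to picture the fission step is as a function from an abstract dialogue act to per-channel output specifications for text, speech, and the talking head. The sketch below is an assumed illustration; the act fields, channel names, and expression labels are not the actual COMIC interface.

```python
# Illustrative fission: map an abstract dialogue act to output-channel specs.
# The act and channel formats here are assumptions for the sketch.

def fission(act):
    # act: {"type": ..., "object": ..., "polarity": ...}
    text = f"This is {act['object']}."
    return {
        "text": text,
        "speech": {"utterance": text, "emphasis": act["object"]},
        "face": {
            # Pick a conversational expression matching the act's polarity.
            "expression": "agreement" if act["polarity"] == "positive"
                          else "confusion",
            "gaze": "user",
        },
    }

spec = fission({"type": "describe",
                "object": "a blue tile series",
                "polarity": "positive"})
print(spec["text"])                  # This is a blue tile series.
print(spec["face"]["expression"])    # agreement
```

Modelling these channel choices on naturally occurring interactions, as described above, is what the annotation study is for: the observed correlations decide which expression, emphasis, and gaze values the module should emit.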
Human Factors Experiments: Output
• “Wizard of Oz” recordings
• Set-up of the recordings:
– Subjects (native English speakers, not bathroom design experts) played the role of a bathroom sales consultant presenting a range of options to the client.
– Total recordings: 7 interactions; approximately 2.5 hours of video.
Human Factors Experiments: Output
• Making use of the recordings: annotation
• Focus on scenes where the “consultant” says things similar to the planned system output
– In particular, descriptions and comparisons of options
• Mark up surface features like those under the control of the fission module, and factors predicted to have an effect on those features
Making use of the recordings: using the results
• Examine:
– Range of surface features (deictic gestures, prosody, facial expressions, and gaze): both occurrence and timing
– Correlation between features and factors such as description vs comparison, first vs repeated mention, and positive vs negative context
• Use these results in the development of the fission module
Sample comparison
“So they give you a degree of colour, they’re slight– they’re obviously slightly busier than looking at something like this, but, umm, they’re not quite as intense as having a whole block of colour, such as those two.”