1st Follow-up Meeting - BRIDGE1
Evaluation of a multimodal Virtual Personal Assistant
Glória Branco
Sophia Antipolis, March 23, 2006
20th International Symposium 2006
Agenda
• Introduction
  – FASiL project and consortium
  – The Virtual Personal Assistant (VPA)
    • Architecture
    • Functionalities
    • Interface
  – Global Evaluation Methodology
    • Heuristic Evaluation
    • User Trials
• The Portuguese trials
  – Method
  – Results
  – Users' comments
• Conclusions
FASiL Project
• FASiL ("Flexible and Adaptive Multi-Modal Spoken Interface Language") is an EU IST-funded, multimodal, multilingual, conversational application for e-mail management.
• Objectives
  – "...to pilot a full multi-modal voice portal application that is 3G mobile network ready, along with tools for rapid development of new applications. FASiL targets the languages of UK English, Portuguese and Swedish… [with] intelligent, friendly adaptive multi-modal interaction."
FASiL Consortium
[Figure: consortium partner logos, including PT Inovação]
VPA Architecture
[Figure: VPA architecture diagram. Components: multilingual ASR, TTS, Vox Generator Services, Middleware, PIM, Administration, Multi-Modal Gateway, Fusion, Fission, Dialogue Manager, GUI Gateway]
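The diagram names Fusion and Fission components without describing how they work. Purely as an illustration, the Python sketch below shows one common way to think about them: fusion merges voice and GUI events that arrive close together into a single command frame, and fission splits a response into a short spoken prompt plus a fuller on-screen list. The event fields, the 2-second window and the function names `fuse` and `fission` are hypothetical and are not taken from the FASiL implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class InputEvent:
    """One input event from either channel (hypothetical structure)."""
    modality: str                  # "voice" or "gui"
    timestamp: float               # seconds since the start of the turn
    intent: Optional[str] = None   # e.g. "reply", usually from speech
    target: Optional[str] = None   # e.g. a message id, usually from a tap or click


def fuse(events: List[InputEvent], window: float = 2.0) -> Dict[str, Optional[str]]:
    """Merge events arriving within `window` seconds of the first one.

    The first intent and the first target seen inside the window win;
    later events are ignored (an assumed, deliberately simple policy).
    """
    frame: Dict[str, Optional[str]] = {"intent": None, "target": None}
    if not events:
        return frame
    ordered = sorted(events, key=lambda e: e.timestamp)
    start = ordered[0].timestamp
    for event in ordered:
        if event.timestamp - start > window:
            break  # outside the fusion window: belongs to a later turn
        if frame["intent"] is None and event.intent is not None:
            frame["intent"] = event.intent
        if frame["target"] is None and event.target is not None:
            frame["target"] = event.target
    return frame


def fission(subjects: List[str]) -> Dict[str, object]:
    """Split one response across channels: short spoken summary, full list on screen."""
    count = len(subjects)
    noun = "e-mail" if count == 1 else "e-mails"
    return {"voice": f"You have {count} {noun}.", "screen": list(subjects)}


if __name__ == "__main__":
    spoken = InputEvent("voice", 0.0, intent="reply")
    tapped = InputEvent("gui", 0.8, target="msg-3")
    print(fuse([spoken, tapped]))  # {'intent': 'reply', 'target': 'msg-3'}
    print(fission(["Agenda", "Minutes", "Report", "Invoice"])["voice"])  # You have 4 e-mails.
```

A fixed time window is the simplest possible fusion policy; a fuller system might also use dialogue context to resolve references such as "this one".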
VPA Functionalities
• Hear a summary of the Inbox.
• Navigation: next, previous.
• Select specific e-mails: search by state (new, old), sender, date, priority and category.
• Read, compose, reply to, forward and delete e-mails.
• Recipient list management.
• Summarisation.
• Categorisation.
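The search criteria listed above (state, sender, date, priority, category) map naturally onto a conjunctive filter: an e-mail is returned only if it matches every criterion the user actually gave. The Python sketch below illustrates that idea; the `Email` fields, the assumed priority values and the `search` function are hypothetical and do not describe the VPA's real data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass
class Email:
    """Hypothetical mailbox entry; field names are illustrative only."""
    sender: str
    received: date
    priority: str   # assumed values: "high", "normal", "low"
    category: str
    is_new: bool


def search(mailbox: Iterable[Email], state: Optional[str] = None,
           sender: Optional[str] = None, on_date: Optional[date] = None,
           priority: Optional[str] = None, category: Optional[str] = None) -> List[Email]:
    """Return e-mails matching every criterion that is not None (logical AND)."""
    if state not in (None, "new", "old"):
        raise ValueError("state must be 'new', 'old' or None")

    def matches(mail: Email) -> bool:
        # Each criterion left as None is simply not checked.
        if state is not None and mail.is_new != (state == "new"):
            return False
        if sender is not None and mail.sender.lower() != sender.lower():
            return False
        if on_date is not None and mail.received != on_date:
            return False
        if priority is not None and mail.priority != priority:
            return False
        if category is not None and mail.category != category:
            return False
        return True

    return [mail for mail in mailbox if matches(mail)]


if __name__ == "__main__":
    inbox = [
        Email("Ana", date(2006, 3, 20), "high", "work", True),
        Email("Ana", date(2006, 3, 21), "normal", "work", False),
        Email("Rui", date(2006, 3, 22), "high", "personal", True),
    ]
    # Roughly "find new high-priority messages from Ana" (hypothetical contacts)
    hits = search(inbox, state="new", sender="ana", priority="high")
    print(len(hits))  # 1
```

Treating every spoken criterion as an additional constraint lets a single "complex command" narrow the mailbox in one turn instead of several navigation steps.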
VPA Interface
• Output: voice, avatar, screen, PDA
• Input: voice, keyboard, mouse, touch, stylus
[Figure: the multimodal interface combines a voice user interface (VUI) and a graphical user interface (GUI)]
Available in English, Swedish and Portuguese.
Global Evaluation
• Set-up of the test environment
  – Task design covering the VPA functionalities.
  – Test mailbox populated with a restricted set of contacts and e-mails.
• Heuristic evaluation
  – 5 expert assessments per language.
  – Experts in accessibility, usability and voice interaction.
• User tests
  – 20 users for accessibility testing, English version only (RNIB and RNID).
  – 20 Swedish and English users, and 12 Portuguese users.
  – Participants were experienced e-mail users.
Goal: "to iteratively gather information about the usability and accessibility of the system"
The Portuguese Trial
• Laboratory environment
  – The graphical interface was a web page simulating a mobile phone. Users interacted with the GUI on a desktop PC with Internet access and spoke to the system through a fixed-line phone.
• 12 native Portuguese speakers
  – 8 males and 4 females
  – Aged 19 to 46 years (mean 30.6 years)
  – 75% of the participants (9) had higher education and 16.7% (2) had mid-level education
  – ICT professionals and experienced e-mail users
• 5 typical e-mail tasks
  – Log in and browse the mailbox
  – Search for and reply to an e-mail
  – Search for and forward an e-mail
  – Administer and manage the recipient list
  – Find, reply to and delete an e-mail
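Task results summary
Values in parentheses are ranges. VUI and GUI give the number of interactions per channel; the last four columns break down spoken interactions by system response.
Task | Time | Completion (%) | VUI interactions | GUI interactions | Correct resp. (%) | No resp. (%) | Misunderstood (%) | Incorrect resp. (%)
T1 | 10 (7-18) | 83.3 | 14.0 (9-49) | 6.5 (0-22) | 54.3 | 13.7 | 6.8 | 25.2
T2 | 7 (5-12) | 75 | 11.5 (5-33) | 4.5 (0-16) | 60.2 | 11.4 | 8.5 | 19.9
T3 | 5 (3-16) | 100 | 14.5 (1-26) | 1.5 (0-20) | 67.0 | 9.8 | 9.8 | 13.3
T4 | 6 (2-10) | 50 | 16.5 (4-41) | 2 (0-6) | 55.84 | 7.4 | 18.6 | 18.2
T5 | 10 (4-18) | 75 | 32 (9-66) | 4 (0-11) | 59.4 | 9.4 | 9.4 | 21.8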
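Because the Portuguese trial had 12 participants, the completion percentages can be read as head counts, assuming each rate was computed over all 12 users. The short Python check below does that conversion and also confirms that the four spoken-interaction categories in each row add up to 100% within rounding. The figures are copied from the table; the variable and function names are illustrative only.

```python
N_USERS = 12  # participants in the Portuguese trial

# Task completion (%) copied from the table.
completion_pct = {"T1": 83.3, "T2": 75.0, "T3": 100.0, "T4": 50.0, "T5": 75.0}

# Spoken-interaction breakdown (%) copied from the table:
# (correct response, no response, misunderstood, incorrect response)
spoken_pct = {
    "T1": (54.3, 13.7, 6.8, 25.2),
    "T2": (60.2, 11.4, 8.5, 19.9),
    "T3": (67.0, 9.8, 9.8, 13.3),
    "T4": (55.84, 7.4, 18.6, 18.2),
    "T5": (59.4, 9.4, 9.4, 21.8),
}


def users_completed(pct: float, n: int = N_USERS) -> int:
    """Convert a completion rate to a head count (assumes the rate is over all n users)."""
    return round(pct / 100 * n)


completed = {task: users_completed(pct) for task, pct in completion_pct.items()}

# Each row's response categories should sum to about 100%.
row_totals = {task: sum(parts) for task, parts in spoken_pct.items()}

if __name__ == "__main__":
    for task in completion_pct:
        print(f"{task}: {completed[task]}/{N_USERS} users, spoken total {row_totals[task]:.2f}%")
```

Under that assumption the completion rates correspond to 10, 9, 12, 6 and 9 users for T1 to T5, so T4 is the only task that half of the participants did not complete.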
Post-test satisfaction questionnaire
[Figure: response frequencies (0 to 8 users) on a five-point scale from Very Satisfied to Very Unsatisfied for intuitiveness, ease of use, confidence, satisfaction, whereabouts, error recognition, prompts, e-mails and conversations]
Statistical analysis
• Significant correlations (Spearman's rank correlation coefficient ρ) between overall satisfaction and:
  – Quality of dialog: ρ = 0.87
  – Confidence: ρ = 0.79
  – Ease of use: ρ = 0.74
  – Interaction control: ρ = 0.73
  – Interaction quality (error recognition): ρ = 0.69
• Significant correlation (Spearman's ρ) between overall satisfaction (subjective) and concept accuracy (objective: the proportion of correct responses): ρ = 0.85.
• No significant differences were found (Mann-Whitney test) between female and male users, or between experienced and naive users.
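Both tests named above are rank-based, which suits ordinal questionnaire ratings. As a sketch only (the slides do not say which tools the authors used), the plain-Python code below computes Spearman's ρ as the Pearson correlation of tie-averaged ranks, and the Mann-Whitney U statistic from pooled ranks. It returns the statistics only; deciding significance still requires p-values, which SciPy provides through `scipy.stats.spearmanr` and `scipy.stats.mannwhitneyu`. The ratings in the usage lines are invented for illustration.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> float:
    """Mann-Whitney U = min(U_a, U_b), with U_a = R_a - n_a(n_a + 1)/2 on pooled ranks."""
    pooled = average_ranks(list(a) + list(b))
    rank_sum_a = sum(pooled[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)


if __name__ == "__main__":
    # Invented 1-5 ratings for six users (illustration only).
    satisfaction = [4, 5, 3, 4, 2, 5]
    dialog_quality = [4, 5, 2, 4, 2, 4]
    print(round(spearman_rho(satisfaction, dialog_quality), 3))
    print(mann_whitney_u([4, 5, 3], [4, 2, 5]))  # e.g. one group vs another
```

Averaging the ranks of tied values matters here: five-point ratings produce many ties, and the simpler formula 1 - 6Σd²/(n(n²-1)) is exact only when there are none.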
Users' Approach
• The preferred modality was speech.
• Natural language, using short phrases but complex commands.
• Speech input to issue commands; the graphical interface to read messages and to scroll quickly through the contacts list.
• More intensive use of the GUI to work around recognition problems and slow system responses.
• Mixed-initiative dialog.
Interaction Example 1
U: I want to replace [recipient name] by carbon copy.
S: Who would you like to send a copy to?
U: (barge-in) [recipient name]
S: Send copy to [recipient name].
U: I want to change the recipient list.
Interaction Example 2
U: Mailbox.
S: You have 4 e-mails.
U: New search. Find high-priority messages from [recipient name].
S: You have 1 new priority e-mail from [recipient name].
U: Read it.
Users' Appreciation
• The conversational, multimodal VPA concept was attractive to all users and was seen as a key enabler for users' increasingly mobile habits.
• The VPA was seen as easy to use and intuitive. The Help function was hardly used.
• Users did not like excessive confirmations.
• The Portuguese TTS voice was well accepted by the users.
• In a small-screen environment, users liked voice input combined with VUI and GUI output.
• Multimodality was seen as a very good way to overcome the recognition problems encountered in the VUI.
Future Use
But, when asked about future use:
• 58% of the users (7 of 12) said that they would not use the system in its current form.
• Main reasons:
  – slow response time
  – recognition/understanding problems.
Failure?
Tell me “when it’s time” to stop!
NO!
Lessons learned
  – Speed of feedback is very important. Users dislike latency and long periods of silence.
  – Improvements are needed to increase the recognition accuracy of the spoken components.
  – Natural language works ... with limitations.
Multimodal interfaces can overcome the weaknesses of each modality and exploit the combined strengths of all modes.