Toward a Reliable Evaluation of Mixed-Initiative Systems

Gabriella Cortellessa and Amedeo Cesta
National Research Council of Italy, Institute for Cognitive Science and Technology
Rome, Italy

AAAI Fall Symposium 05
Outline
• Motivations
• Aims of the study
  – Users’ attitude towards the mixed-initiative paradigm
  – Role of explanation during problem solving
• Evaluation Method
• Results
• Conclusions and future work
Motivations
• Lack of studies that investigate users’ attitude towards this solving paradigm
• Lack of methodologies for evaluating the different aspects of mixed-initiative problem solving

This work applies an experimental approach (from HCI and Psychology) to understanding users’ attitude towards the mixed-initiative paradigm and to investigating the importance of explanation as a means to foster users’ involvement in problem solving.
Two alternative Problem Solving approaches

[Diagram contrasting the two approaches: in the automated approach the user submits a problem to the artificial problem solver; in the mixed-initiative approach the user cooperates with the artificial problem solver through an interaction module.]
Evaluating Mixed-Initiative Systems
1. Measuring the overall problem solving performance
   • The human-artificial system pair is expected to exhibit better performance (metrics).
2. Evaluating aspects related to users’ requirements and judgment on the system
   • Usability, level of trust, clarity of presentation, user satisfaction, etc.
Aims of the study
1. Users’ attitude towards solving strategy selection
   • Automated vs. mixed-initiative
2. The recourse to explanation during problem solving
   • Explanations for the solver’s choices and failures

Differences between experts and non-experts
Solving strategy selection
No empirical studies in the mixed-initiative area explore the context of strategy selection (who chooses a solving strategy, and why).

However:
• Decision Support Systems: empirical evidence of low trust in automated advice during decision-making processes (Jones & Brown, 2002).
• Human-Computer Interaction: the artificial solver is perceived as a competitor rather than a collaborator (Langer, 1992; Nass & Moon, 2000).
Solving strategy selection: Hypotheses

Two variables are expected to influence the selection of the solving strategy (automated vs. mixed-initiative): the user’s expertise and the problem difficulty.

Hypothesis 1: Expert users are expected to exploit the automated procedure more than non-experts; conversely, non-expert users are expected to exploit the mixed-initiative approach more than experts.

Hypothesis 1a: Non-expert users are expected to prefer the mixed-initiative approach when solving easy problems and the automated strategy when solving difficult problems, while expert users are expected to show the opposite behavior.
Explanation Recourse
No empirical studies in the mixed-initiative research field investigate the role of explanation in cooperative problem solving.

However, in Knowledge-Based Systems:
• explanation recourse is more frequent in case of system failures (Gilbert, 1989; Schank, 1986; Chandrasekaran & Mittal, 1999);
• explanation recourse is more frequent in case of collaborative problem solving (Gregor, 2001);
• there are individual differences in the motivations for explanation recourse (Mao & Benbasat, 1996; Ye, 1995).
Explanation Recourse: Hypotheses
The following variables are expected to influence the recourse to explanation: user’s expertise, problem difficulty, strategy selection, and failure.

Hypothesis 2: Access to explanation is more frequent in case of failure than in case of success.

Hypothesis 3: Access to explanation is related to the solving strategy selection.
– In particular, participants who choose the automated solving strategy access explanation more frequently than those who use the mixed-initiative approach.
Explanation Recourse: Hypotheses
Hypothesis 4: During problem solving, non-experts access explanation more frequently than experts.

Hypothesis 5: Access to explanation is more frequent in case of difficult problems.
Evaluation Method
• Participants:
  – 96 participants, balanced with respect to gender, education, age and profession, subdivided into two groups based on the level of expertise (40 experts and 56 non-experts).
• Experimental apparatus:
  – COMIREM problem solver
  – Planning and scheduling problems
• Procedure:
  – Web-based apparatus
  – Stimuli: problems to be solved
  – Questionnaires
A mixed-initiative problem solver: COMIREM

COMIREM (Continuous Mixed-Initiative Resource Management) was developed at Carnegie Mellon University (Smith et al., 2003).

[Diagram: the user interacts with the automated solver through an interaction module.]
Procedure
– Training session
– Two experimental sessions, presented in random order:
  • Session 1: easy problems, followed by Questionnaire 1
  • Session 2: difficult problems, followed by Questionnaire 2
– For each session, participants were asked to choose between the mixed-initiative and the automated strategy
– Web-based apparatus (http://pst2.istc.cnr.it/experiment) backed by a database
Tasks
Stimuli:
– 4 scheduling problems defined in the domain of broadcast TV station resource management:
  • 2 solvable
  • 2 unsolvable

Questionnaires, aimed at:
– Assessing the difficulty of the task (5-step Likert scale; manipulation check of the difficulty variable)
– Evaluating the clarity of the textual and graphical representations (5-step Likert scale)
– Investigating the reasons for choosing the selected strategy (multiple choice)
– Studying the reasons for accessing the explanation (2nd questionnaire only)
Solving Strategy Selection
Results
Influence of expertise on solving strategy selection (statistics)

Dependent variable       Expertise    N    Mean     Std. Deviation
Choice_auto (n_auto)     Non Expert   56   .6786    .7653
                         Expert       40   1.3750   .7048
                         Total        96   .9688    .8137
Choice_mixed (n_mista)   Non Expert   56   1.3214   .7653
                         Expert       40   .6250    .7048
                         Total        96   1.0313   .8137

F(1,94) = 20.62, p < .001

[Bar chart: mean number of automated vs. mixed-initiative choices, by expertise.]
Influence of expertise on strategy
Hypothesis 1: Solving strategy selection (automated vs. mixed-initiative) depends upon users’ expertise.

VERIFIED: p < .001
– Experts → automated
– Non-experts → mixed-initiative
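For two groups, a between-subjects one-way ANOVA F statistic equals the square of the pooled-variance t statistic, so the reported F(1,94) = 20.62 can be reproduced from the group sizes, means, and standard deviations in the statistics table. A minimal sketch in plain Python (the helper name `f_from_summary` is ours):

```python
import math

def f_from_summary(n1, m1, sd1, n2, m2, sd2):
    """Two-group between-subjects ANOVA F from summary statistics (F = t**2)."""
    # Pooled within-group variance, with df = n1 + n2 - 2
    pooled = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    # Pooled-variance t statistic for the difference between the two means
    t = (m2 - m1) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t ** 2

# Choice_auto: non-experts (N=56) vs. experts (N=40), values from the table
f = f_from_summary(56, 0.6786, 0.7653, 40, 1.3750, 0.7048)
print(round(f, 2))  # ≈ 20.62, matching the reported F(1,94) = 20.62
```

The same call on the Choice_mixed row (means 1.3214 vs. .6250, same SDs) yields the identical F, since the two counts are complementary over the two sessions.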
Influence of difficulty on strategy

Easy problems:
              Automated   Mixed
Non expert        10        30
Expert            32        24
Total             42        54
Chi-square = 9.80, df = 1, p < .01

Difficult problems:
              Automated   Mixed
Non expert        15        25
Expert            32        24
Total             47        49
Chi-square = 3.6, df = 1, n.s.
Influence of difficulty on strategy

Hypothesis 1a: Solving strategy selection (automated vs. mixed-initiative) is related to problem difficulty.

PARTIALLY VERIFIED:
– Easy problems: experts → automated, non-experts → mixed-initiative (p < .01)
– Difficult problems: no significant effect (n.s.)
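The reported chi-square values can be recomputed directly from the two contingency tables. A small sketch in plain Python (the function name `chi_square_2x2` is ours; no Yates continuity correction is applied, which matches the reported values):

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table (no continuity correction)."""
    rows = [sum(r) for r in table]          # row marginals
    cols = [sum(c) for c in zip(*table)]    # column marginals
    n = sum(rows)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Rows: non-experts / experts; columns: automated / mixed (counts from the slides)
easy = [[10, 30], [32, 24]]
difficult = [[15, 25], [32, 24]]
print(round(chi_square_2x2(easy), 2))       # ≈ 9.8, reported as Chi-square = 9.80, p < .01
print(round(chi_square_2x2(difficult), 2))  # ≈ 3.6, reported as n.s.
```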
Reasons for strategy selection

[Four bar charts comparing experts and non-experts on the reasons given for the selected strategy:
– Mixed -- Easy: problem facility, willingness to control the problem solving process, trying both strategies (Chi-square = .92, df = 2, n.s.)
– Automated -- Easy: time, trust in the automated solver, curiosity for the automated solution (Chi-square = 1.32, df = 2, n.s.)
– Automated -- Difficult: time, trust in the automated solver, curiosity for the automated solution (Chi-square = 3.9, df = 2, p < .05)
– Mixed -- Difficult: problem facility, willingness to control the problem solving process, trying both strategies (Chi-square = 1.15, df = 2, n.s.)]
Explanation Recourse
Results
Influence of failures on explanation (statistics)

Dependent variable   Mean    Std. Deviation   N
Access_failure       .8111   .3716            90
Access_correct       .3702   .3354            90

F(1,89) = 85.37, p < .001

Correlation analysis: r = .86, p < .001 in case of failure; r = .035, n.s. in case of success.

[Bar chart: mean access to explanation in case of failure vs. success.]
Influence of failures on explanation
Hypothesis 2: Access to explanation is more frequent in case of failure than in case of success.

VERIFIED: p < .001
Influence of strategy on explanation (statistics)

Access_easy (I_AC_FAC: index of access to explanation for EASY tasks)
Strategy    N    Mean    Std. Deviation
Automated   54   .8769   .3373
Mixed       42   .2802   .3202
Total       96   .6158   .4430
F(1,94) = 77.26, p < .001

Access_difficult (I_AC_DIF: index of access to explanation for DIFFICULT tasks)
Strategy    N    Mean    Std. Deviation
Automated   49   .6297   .2959
Mixed       47   .2790   .2709
Total       96   .4580   .3329
F(1,94) = 36.60, p < .05

[Bar charts: mean access to explanation by strategy, for easy and difficult problems.]
Influence of strategy on explanation

Hypothesis 3: Access to explanation is related to the solving strategy selection; access is more frequent when the automated strategy is chosen.

VERIFIED:
– Easy problems: p < .001
– Difficult problems: p < .05
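Both reported F values can likewise be reconstructed from the per-strategy summary statistics (N, mean, SD), using the identity F = t² for a two-group between-subjects ANOVA. A self-contained sketch (the helper name `f_from_summary` is ours):

```python
import math

def f_from_summary(n1, m1, sd1, n2, m2, sd2):
    """Two-group between-subjects ANOVA F from summary statistics (F = t**2)."""
    # Pooled within-group variance, with df = n1 + n2 - 2
    pooled = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    # Pooled-variance t statistic for the difference between the two means
    t = (m2 - m1) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t ** 2

# Access_easy: automated (N=54) vs. mixed (N=42)
f_easy = f_from_summary(54, 0.8769, 0.3373, 42, 0.2802, 0.3202)
# Access_difficult: automated (N=49) vs. mixed (N=47)
f_difficult = f_from_summary(49, 0.6297, 0.2959, 47, 0.2790, 0.2709)
print(round(f_easy, 2), round(f_difficult, 2))  # ≈ 77.26 and ≈ 36.6, matching the slides
```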
Influence of expertise and difficulty on explanation (statistics)

Dependent variable   Expertise    Mean    Std. Deviation   N
Access_easy          Non Expert   .5423   .4760            56
                     Expert       .7187   .3740            40
                     Total        .6158   .4430            96
Access_difficult     Non Expert   .3829   .3177            56
                     Expert       .5632   .3289            40
                     Total        .4580   .3329            96

Expertise: F(1,94) = 7.34, p < .01
Difficulty: F(1,94) = 12.54, p < .01
Interaction: F(1,94) = .002, n.s.

[Bar chart: mean access to explanation by expertise and difficulty.]
Influence of expertise and difficulty on explanation

Hypotheses 4 and 5:
– During problem solving, non-experts rely on explanation more frequently than experts.
– Access to explanation is more frequent in case of difficult problems.

FALSIFIED: both effects are significant (expertise: p < .01; difficulty: p < .01), but in the direction opposite to the hypotheses.
Reasons for accessing explanation
[Bar chart, experts vs. non-experts, on the reason for accessing explanation: "to understand the problem" vs. "to understand the automated solver’s choices".]

Chi-square = 2.28, df = 1, n.s.
Conclusions
• Solving strategy selection depends upon users’ expertise:
  – Experts → automated
  – Non-experts → mixed-initiative
• The mixed-initiative approach is chosen to maintain control over the problem solving
• Explanation is frequently accessed during problem solving (73 out of 96 participants), and access is more frequent:
  – in case of failures during problem solving
  – when using the automated strategy
• Explanation is accessed to understand the solver’s choices
Contributions
• Empirical evidence that the mixed-initiative approach responds to a specific need of end users: keeping control over automated systems.
• The study confirms the need to develop problem solving systems in which humans play an active role.
• Need for designing different interaction styles to support individual differences (e.g., experts vs. non-experts).
• Empirical evidence of the usefulness of explanation during problem solving; failures were identified as a main prompt for increased access to explanation.
Remarks
• Need for designing evaluation studies that take into consideration the human component of a mixed-initiative system (importing methodologies from other fields).
• At present we have inherited experience from disciplines like HCI and Psychology and adapted it to our specific case.
• The same approach can be followed to broaden the testing of different mixed-initiative features.
Future work
• Investigating the impact of strategy (automated vs. mixed-initiative) and explanation recourse on problem solving performance.
• Applying the evaluation methodology to measure different features of mixed-initiative systems.
• Synthesizing “user-oriented” explanations.