
Toward a Reliable Evaluation of Mixed-Initiative Systems


Page 1: Toward a Reliable Evaluation of Mixed-Initiative Systems

Gabriella Cortellessa and Amedeo Cesta

National Research Council of Italy, Institute for Cognitive Science and Technology

Rome, Italy

Toward a Reliable Evaluation of Mixed-Initiative Systems

Page 2: Toward a Reliable Evaluation of Mixed-Initiative Systems

Outline

• Motivations
• Aims of the study
  – Users’ attitude towards the mixed-initiative paradigm
  – Role of explanation during problem solving
• Evaluation Method
• Results
• Conclusions and future work

Page 3: Toward a Reliable Evaluation of Mixed-Initiative Systems

Motivations

• Lack of studies that investigate users’ attitude towards the mixed-initiative solving paradigm

• Lack of methodologies for evaluating the different aspects of mixed-initiative problem solving

This work applies an experimental approach (drawn from HCI and Psychology) to understand users’ attitude towards the mixed-initiative approach, and to investigate the importance of explanation as a means to foster users’ involvement in problem solving.

Page 4: Toward a Reliable Evaluation of Mixed-Initiative Systems

Two alternative Problem Solving approaches

[Diagram: in both the automated and the mixed-initiative approach, the user and the artificial problem solver communicate through an interaction module.]

Page 5: Toward a Reliable Evaluation of Mixed-Initiative Systems

Evaluating Mixed-Initiative Systems

1. Measuring the overall problem-solving performance
   • The human–artificial system pair is expected to exhibit better performance (metrics).

2. Evaluating aspects related to users’ requirements and judgment of the system
   • Usability, level of trust, clarity of presentation, user satisfaction, etc.

Page 6: Toward a Reliable Evaluation of Mixed-Initiative Systems

Aims of the study

1. Users’ attitude towards solving strategy selection
   • Automated vs mixed-initiative

2. The recourse to explanation during problem solving
   • Explanations for the solver’s choices and failures

Differences between experts and non-experts

Page 7: Toward a Reliable Evaluation of Mixed-Initiative Systems

Solving strategy selection

No empirical studies in the mixed-initiative area explore the context of strategy selection (who chooses a solving strategy, and why)

However:

Decision Support Systems
– Empirical evidence of low trust in automated advice during decision-making processes (Jones & Brown, 2002).

Human-Computer Interaction
– The artificial solver is perceived as a competitor rather than a collaborator (Langer, 1992; Nass & Moon, 2000).

Page 8: Toward a Reliable Evaluation of Mixed-Initiative Systems

Solving strategy selection: Hypotheses

Two variables are supposed to influence the selection of the solving strategy (automated vs. mixed-initiative): user’s expertise and problem difficulty.

Hypothesis 1: Expert users are expected to exploit the automated procedure more than non-experts; conversely, non-expert users are expected to exploit the mixed-initiative approach more than experts.

Hypothesis 1a: Inexperienced users are expected to prefer the mixed-initiative approach when solving easy problems and the automated strategy when solving difficult problems, while expert users are expected to show the opposite behavior.

Page 9: Toward a Reliable Evaluation of Mixed-Initiative Systems

Explanation Recourse

No empirical studies in the mixed-initiative research field investigate the role of explanations in cooperative problem solving

However:

Knowledge-Based Systems
– Explanation recourse is more frequent in case of system failures (Gilbert, 1989; Schank, 1986; Chandrasekaran & Mittal, 1999).
– Explanation recourse is more frequent in case of collaborative problem solving (Gregor, 2001).
– There are individual differences in the motivations for explanation recourse (Mao & Benbasat, 1996; Ye, 1995).

Page 10: Toward a Reliable Evaluation of Mixed-Initiative Systems

Explanation Recourse: Hypotheses

The following variables are supposed to influence the recourse to explanation: user’s expertise, problem difficulty, strategy selection, and failure.

Hypothesis 2: Access to explanation is more frequent in case of failure than in case of success.

Hypothesis 3: Access to explanation is related to the solving strategy selection.
– In particular, participants who choose the automated solving strategy access explanation more frequently than those who use the mixed-initiative approach.

Page 11: Toward a Reliable Evaluation of Mixed-Initiative Systems

Explanation Recourse: Hypotheses

Hypothesis 4: During problem solving, non-experts access explanations more frequently than experts.

Hypothesis 5: Access to explanation is more frequent in case of difficult problems.

Page 12: Toward a Reliable Evaluation of Mixed-Initiative Systems

Evaluation Method

• Participants:
  – 96 participants, balanced with respect to gender, education, age and profession, subdivided into two groups based on level of expertise (40 experts and 56 non-experts).

• Experimental apparatus:
  – COMIREM problem solver
  – Planning and scheduling problems

• Procedure:
  – Web-based apparatus
  – Stimuli: problem solutions
  – Questionnaires

Page 13: Toward a Reliable Evaluation of Mixed-Initiative Systems

A mixed-initiative problem solver: COMIREM

COMIREM: Continuous Mixed-Initiative Resource Management
Developed at Carnegie Mellon University (Smith et al., 2003)

[Diagram: the user and the automated solver interact through an interaction module.]

Page 14: Toward a Reliable Evaluation of Mixed-Initiative Systems

Procedure

– Training session
– Two experimental sessions, presented in random order:
  • Session 1: easy problems, followed by Questionnaire 1
  • Session 2: difficult problems, followed by Questionnaire 2
– For each session participants were asked to choose between the mixed-initiative and the automated strategy

Web-based apparatus: http://pst2.istc.cnr.it/experiment (responses collected in a database)

Page 15: Toward a Reliable Evaluation of Mixed-Initiative Systems

Tasks

Stimuli
– 4 scheduling problems defined in the domain of broadcast TV station resource management:
  • 2 solvable
  • 2 unsolvable

Questionnaires aiming to
– Assess the difficulty of the task: 5-step Likert scale (manipulation check of the difficulty variable)
– Evaluate the clarity of textual and graphical representations (5-step Likert scale)
– Investigate the reasons for choosing the selected strategy (multiple choice)
– Study the reasons for accessing explanation (2nd questionnaire only)

Page 16: Toward a Reliable Evaluation of Mixed-Initiative Systems

Solving Strategy Selection

Results

Page 17: Toward a Reliable Evaluation of Mixed-Initiative Systems

Influence of expertise on solving strategy selection (statistics)

Dependent variables: n_auto (number of automated-strategy choices), n_mista (number of mixed-initiative choices)

                         N     Mean     Std. Deviation
n_auto    Non Expert     56    .6786    .7653
          Expert         40   1.3750    .7048
          Total          96    .9688    .8137
n_mista   Non Expert     56   1.3214    .7653
          Expert         40    .6250    .7048
          Total          96   1.0313    .8137

F(1,94) = 20.62, p < .001

[Chart, Choice_auto vs Choice_mixed by expertise: experts chose the automated strategy more often (1.3750 vs .6250); non-experts chose the mixed-initiative strategy more often (1.3214 vs .6786).]
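As a sanity check, the reported F statistic can be reproduced from the summary statistics alone. Below is a minimal Python sketch (not the authors' analysis code; the helper name f_oneway_from_summary is ours), assuming the standard between-groups one-way ANOVA formula:

```python
# Minimal sketch: one-way ANOVA F from per-group summary statistics.
def f_oneway_from_summary(groups):
    """groups: list of (n, mean, sd) tuples; returns (F, df1, df2)."""
    n_tot = sum(n for n, _, _ in groups)
    grand = sum(n * m for n, m, _ in groups) / n_tot
    ss_between = sum(n * (m - grand) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df1, df2 = len(groups) - 1, n_tot - len(groups)
    return (ss_between / df1) / (ss_within / df2), df1, df2

# n_auto: non-experts vs experts (N, mean, SD from the table above)
f, df1, df2 = f_oneway_from_summary([(56, .6786, .7653), (40, 1.3750, .7048)])
print(f"F({df1},{df2}) = {f:.2f}")  # -> F(1,94) = 20.62, matching the slide
```

Applying the same function to the n_mista row yields the identical F value: each participant made one choice per session, so the two dependent variables are complementary and the group SDs and mean gap coincide.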

Page 18: Toward a Reliable Evaluation of Mixed-Initiative Systems

Influence of expertise on strategy

Hypothesis 1: Solving strategy selection (automated vs mixed-initiative) depends upon users’ expertise

VERIFIED: p < .001

Experts: automated; Non-experts: mixed-initiative

Page 19: Toward a Reliable Evaluation of Mixed-Initiative Systems

Influence of difficulty on strategy

1030

3224

expertise

Non expert

Expert

strategyAutomated Mixed

4254Total

Chi-square = 9.80, df=1, p< .01

Easy Problems

1525

3224

expertise

Non expert

Expert

strategyAutomated Mixed

4749Total

Difficult ProblemsChi-square = 3.6 , df=1, n. s.
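The reported chi-square values can be checked with a short script. A hedged sketch: the cell counts below are a reconstruction of the garbled tables above (chosen so that the group sizes, the totals, and both reported statistics are reproduced), not figures taken from the authors' raw data:

```python
# Sketch: verify the reported Pearson chi-square statistics for the
# (reconstructed) expertise x strategy contingency tables.
from scipy.stats import chi2_contingency

tables = {
    "easy":      [[24, 32],   # non-experts: automated, mixed-initiative
                  [30, 10]],  # experts:     automated, mixed-initiative
    "difficult": [[24, 32],
                  [25, 15]],
}

for name, cells in tables.items():
    # correction=False gives the plain Pearson statistic (no Yates
    # correction), which is what matches the values on the slide.
    chi2, p, df, _ = chi2_contingency(cells, correction=False)
    print(f"{name}: chi-square = {chi2:.2f}, df = {df}, p = {p:.4f}")

# -> easy:      chi-square = 9.80, df = 1, p = 0.0017  (p < .01)
# -> difficult: chi-square = 3.60, df = 1, p = 0.0577  (n.s.)
```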

Page 20: Toward a Reliable Evaluation of Mixed-Initiative Systems

Influence of difficulty on strategy

• Hypothesis 1a: Solving strategy selection (automated vs mixed-initiative) is related to problem difficulty

PARTIALLY VERIFIED:
– Easy problems: experts prefer the automated strategy, non-experts the mixed-initiative one (p < .01)
– Difficult problems: no significant difference (n.s.)

Page 21: Toward a Reliable Evaluation of Mixed-Initiative Systems

Reasons for strategy selection

[Bar charts comparing the reasons reported by experts and non-experts for each strategy/difficulty combination. Reasons offered for the automated strategy: time, trust in the automated solver, curiosity about the automated solution. Reasons offered for the mixed-initiative strategy: problem facility, willingness to control the problem-solving process, desire to try both strategies.]

– Mixed -- Easy: Chi-square = .92, df = 2, n.s.
– Automated -- Easy: Chi-square = 1.32, df = 2, n.s.
– Automated -- Difficult: Chi-square = 3.9, df = 2, p < .05
– Mixed -- Difficult: Chi-square = 1.15, df = 2, n.s.

Page 22: Toward a Reliable Evaluation of Mixed-Initiative Systems

Explanation Recourse

Results

Page 23: Toward a Reliable Evaluation of Mixed-Initiative Systems

Influence of failures on explanation

Dependent variables:

                 Mean    Std. Deviation   N
Access_failure   .8111   .3716            90
Access_correct   .3702   .3354            90

F(1,89) = 85.37, p < .001

Correlation analysis: r = .86, p < .001 in case of failure; r = .035, n.s. in case of success
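A reading aid for this statistic: with a single numerator degree of freedom, the repeated-measures F test is equivalent to a paired t test on the two access indices, so the reported value corresponds to a large paired t:

```latex
% For df_1 = 1 the F statistic is the square of the corresponding t statistic:
F(1,\,n-1) = t(n-1)^2
\quad\Longrightarrow\quad
t(89) = \sqrt{85.37} \approx 9.24
```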

Page 24: Toward a Reliable Evaluation of Mixed-Initiative Systems

Influence of failures on explanation

Hypothesis 2: Access to explanation is more frequent in case of failure than in case of success.

VERIFIED: p < .001

Page 25: Toward a Reliable Evaluation of Mixed-Initiative Systems

Influence of strategy on explanation

I_AC_FAC (index of access to explanation for EASY tasks):

            N    Mean    Std. Deviation
Automated   54   .8769   .3373
Mixed       42   .2802   .3202
Total       96   .6158   .4430

F(1,94) = 77.26, p < .001

I_AC_DIF (index of access to explanation for DIFFICULT tasks):

            N    Mean    Std. Deviation
Automated   49   .6297   .2959
Mixed       47   .2790   .2709
Total       96   .4580   .3329

F(1,94) = 36.60, p < .05
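Both F values can be checked the same way as on the expertise slide, reusing the f_oneway_from_summary sketch (redefined here so the snippet runs standalone; small deviations from the slide reflect rounding in the reported means and SDs):

```python
# Sketch: reproduce the between-groups F statistics from the summary stats.
def f_oneway_from_summary(groups):
    """groups: list of (n, mean, sd) tuples; returns the one-way ANOVA F."""
    n_tot = sum(n for n, _, _ in groups)
    grand = sum(n * m for n, m, _ in groups) / n_tot
    ss_between = sum(n * (m - grand) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df1, df2 = len(groups) - 1, n_tot - len(groups)
    return (ss_between / df1) / (ss_within / df2)

print(round(f_oneway_from_summary([(54, .8769, .3373), (42, .2802, .3202)]), 2))
# -> 77.27 for easy problems (slide reports 77.26; difference is rounding)
print(round(f_oneway_from_summary([(49, .6297, .2959), (47, .2790, .2709)]), 2))
# -> 36.6 for difficult problems, matching the reported 36.60
```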

Page 26: Toward a Reliable Evaluation of Mixed-Initiative Systems

Influence of strategy on explanation

Hypothesis 3: Access to explanation is related to the solving strategy selection; in particular, access is more frequent when the automated strategy is chosen.

VERIFIED:
– Easy problems: p < .001
– Difficult problems: p < .05

Page 27: Toward a Reliable Evaluation of Mixed-Initiative Systems

Influence of expertise and difficulty on explanation

Dependent variables:

                               Mean    Std. Deviation   N
Access_easy       Non Expert   .5423   .4760            56
                  Expert       .7187   .3740            40
                  Total        .6158   .4430            96
Access_difficult  Non Expert   .3829   .3177            56
                  Expert       .5632   .3289            40
                  Total        .4580   .3329            96

Expertise:   F(1,94) = 7.34, p < .01
Difficulty:  F(1,94) = 12.54, p < .01
Interaction: F(1,94) = .002, n.s.

Page 28: Toward a Reliable Evaluation of Mixed-Initiative Systems

Influence of expertise and difficulty on explanation

• Hypotheses 4 and 5:
  – During problem solving, non-experts rely on explanation more frequently than experts
  – Access to explanation is more frequent in case of difficult problems

FALSIFIED: expertise p < .01, difficulty p < .01
Both effects are significant but in the opposite direction: experts accessed explanation more than non-experts, and access was more frequent for easy problems.

Page 29: Toward a Reliable Evaluation of Mixed-Initiative Systems

Reasons for accessing explanation

[Bar chart: reasons given by experts and non-experts for accessing explanation, either to understand the problem or to understand the automated solver’s choices.]

Chi-square = 2.28, df = 1, n.s.

Page 30: Toward a Reliable Evaluation of Mixed-Initiative Systems

Conclusions

• Solving strategy selection depends upon users’ expertise
  – Experts: automated
  – Non-experts: mixed-initiative

• The mixed-initiative approach is chosen to maintain control over the problem solving

• Explanation is frequently accessed during problem solving (73 out of 96 respondents), access being more frequent:
  – in case of failures during problem solving
  – when using the automated strategy

• Explanation is accessed to understand the solver’s choices

Page 31: Toward a Reliable Evaluation of Mixed-Initiative Systems

Contributions

• Empirical evidence that the mixed-initiative approach responds to a specific need of end users: keeping control over automated systems.

• The study confirms the need for developing problem-solving systems in which humans play an active role.

• Need for designing different interaction styles to accommodate individual differences (e.g., experts vs non-experts).

• Empirical evidence of the usefulness of explanation during problem solving; failures were identified as the main prompt increasing the frequency of access to explanation.

Page 32: Toward a Reliable Evaluation of Mixed-Initiative Systems

Remarks

• Need for designing evaluation studies that take into consideration the human component of the mixed-initiative system (importing methodologies from other fields)

• At present we have inherited experience from disciplines like HCI and Psychology and adapted their methods to our specific case.

• The same approach can be followed to broaden the testing of different mixed-initiative features.

Page 33: Toward a Reliable Evaluation of Mixed-Initiative Systems

Future work

• Investigating the impact of strategy (automated vs mixed-initiative) and explanation recourse on problem solving performance.

• Application of the evaluation methodology to measure different features of mixed-initiative systems.

• Synthesis of “user-oriented” explanations