Evaluation: Controlled Experiments

Source: vda.univie.ac.at/Teaching/HCI/16s/LectureNotes/11_LabStudies.pdf


Outline

• Evaluation beyond usability tests
• Controlled Experiments
• Other Evaluation Methods


Evaluation Beyond Usability Tests


Usability Evaluation (last week)

• Expert tests / walkthroughs
• Usability tests with users
• Main goal: formative
  – identify usability problems
  – improve the tool


Summative Evaluation (focus today)

• How good is it? Useful?
• Better than other tools?


Formative and Summative: Usually combined

[Figure: evaluation over time, with formative and summative phases]


Evaluation goals (summative)

• Generalizability
  – Results can be applied to other people
• Precision
  – We measured what we wanted to measure (by controlling factors that the study was not intended to examine)
• Realism
  – Study context is realistic

... usually there is a trade-off between them!


© McGrath / Carpendale

The selection of a research method depends on the research question and the object under study!


Controlled Experiments


Controlled experiment

• Or:
  – Laboratory Experiment
  – Lab study
  – User Study
  – A/B Testing (used in marketing)
  – …


Focus

• Precision
• Generalizability (?)
• Overall goal
  – Reveal cause-effect relationships
  – e.g., smoking causes cancer


Scenario

[Figure: two design variants, A and B]

Which is better?

© Carpendale

Test it with users!


Hypothesis

• A precise problem statement
• Example:
  – H1 = Participants will buy more beer when using variant B than when using variant A
  – Null hypothesis H0 = no difference in beer purchases


Independent Variables

• Factors to be studied
• Typical independent variables (in HCI)
  – Different types of design
  – Task type: e.g., searching/browsing
  – Participant demographics: e.g., male/female
  – Different technologies: e.g., touch pad vs. keyboard
• Control of independent variables
  – Levels: the number of values (conditions) each factor can take
  – Limited by the length of the study and the number of participants
• How different?
  – Entire interfaces vs. very specific parts


Control Environment

• Make sure nothing else could cause your effect
• Control confounding variables
• Randomization!


Different Designs: Between-Subjects

• Divide the participants into groups; each group does one condition
• Randomize: group assignment
• Potential problem?

[Figure: variants A and B, with Group 1 and Group 2]
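Random group assignment for a between-subjects design could be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function name `assign_groups`, the participant IDs, and the fixed seed are assumptions made for the example.

```python
import random


def assign_groups(participants, conditions=("A", "B"), seed=None):
    """Randomly split participants into groups whose sizes differ by at most one."""
    rng = random.Random(seed)          # seeded so the assignment can be reproduced
    shuffled = list(participants)
    rng.shuffle(shuffled)              # random order removes any recruitment-order bias
    groups = {condition: [] for condition in conditions}
    for index, participant in enumerate(shuffled):
        # Deal participants out round-robin so group sizes stay balanced.
        groups[conditions[index % len(conditions)]].append(participant)
    return groups


# Hypothetical participant IDs P01..P30.
participants = [f"P{i:02d}" for i in range(1, 31)]
groups = assign_groups(participants, seed=42)
print({condition: len(members) for condition, members in groups.items()})
```

Shuffling first and then dealing round-robin keeps the groups equally sized, which a coin flip per participant would not guarantee.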


Different Designs: Within-Subjects

• Everybody does all the conditions
• Can account for individual differences and reduce noise (that’s why it may be more powerful and require fewer participants)
• Severely limits the number of conditions, and even the types of tasks tested (you may be able to work around this by having multiple sessions)
• Can lead to ordering effects → randomize order
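One way to handle ordering effects in a within-subjects design is to counterbalance: every possible condition order is used equally often, and which participant gets which order is random. The sketch below assumes two conditions and hypothetical participant IDs; `counterbalanced_orders` is an illustrative name, and full counterbalancing like this only scales to a handful of conditions.

```python
import itertools
import random


def counterbalanced_orders(participants, conditions=("A", "B"), seed=None):
    """Map each participant to a condition order, using all orders equally often."""
    rng = random.Random(seed)
    orders = list(itertools.permutations(conditions))  # ("A","B") and ("B","A")
    shuffled = list(participants)
    rng.shuffle(shuffled)                              # random participant-to-order mapping
    return {p: orders[i % len(orders)] for i, p in enumerate(shuffled)}


# Hypothetical participant IDs P01..P12.
plan = counterbalanced_orders([f"P{i:02d}" for i in range(1, 13)], seed=1)
for participant in sorted(plan):
    print(participant, "->", " then ".join(plan[participant]))
```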


Dependent Variable

• The things that you measure
• Performance indicators:
  – task completion time, error rates, mouse movement…
  – (number of beers bought)
• Subjective participant feedback:
  – satisfaction ratings, closed-ended questions, interviews…
  – questionnaires (HCI lecture last week)
• Observations:
  – behaviors, signs of frustration…


Tasks

• Specifying good tasks for controlled experiments is tricky
  – Especially if you are measuring performance criteria
• Task criteria
  – comparability across different interfaces
  – clear end point
• Example
  – usability test: “buy a book for a 4-year-old”
  – controlled experiment: “find and buy the book Doctor Faustus by Thomas Mann”


Results: Application of Statistics

• Descriptive Statistics
  – Describes the data you gathered (e.g., visually)
• Inferential Statistics
  – Make predictions/inferences from your study to the larger population


Descriptive statistics

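• Central tendency
  – mean: {1, 2, 4, 5} → 3
  – median: {15, 19, 22, 29, 33, 45, 50} → 29
  – mode: {12, 15, 22, 22, 22, 34, 34} → 22
• Measures of spread
  – range
  – variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
  – standard deviation: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$

Note: for the inferential standard deviation, N becomes (N−1), which gives an estimate for the sampled population.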
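The measures above can be checked with Python's standard `statistics` module. This is a small sketch using the slide's own example sets; reusing {1, 2, 4, 5} for the spread measures is a choice made here for illustration.

```python
import statistics

# Central tendency, using the example sets from the slide.
print(statistics.mean([1, 2, 4, 5]))                       # 3
print(statistics.median([15, 19, 22, 29, 33, 45, 50]))     # 29
print(statistics.mode([12, 15, 22, 22, 22, 34, 34]))       # 22

# Spread, illustrated on the first set.
data = [1, 2, 4, 5]
value_range = max(data) - min(data)          # 4
pop_variance = statistics.pvariance(data)    # divides by N     -> 2.5
sample_variance = statistics.variance(data)  # divides by N - 1 -> 10/3
pop_sd = statistics.pstdev(data)             # square root of the population variance
sample_sd = statistics.stdev(data)           # square root of the sample variance

print(value_range, pop_variance, round(sample_variance, 3))
print(round(pop_sd, 3), round(sample_sd, 3))
```

`pvariance`/`pstdev` divide by N (descriptive use), while `variance`/`stdev` divide by N−1 (the inferential estimate mentioned in the note).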


Visualization of descriptive statistics

e.g., Boxplot

• Median (the center line of a standard boxplot; some variants also mark the mean)
• 25/75% Quartiles
• Min / Max
• (alternative: with outliers)


Inferential statistics

• Goal: Generalize findings to the larger population

http://www.latrobe.edu.au/psy/research/cognitive-and-developmental-psychology/esci

Excursus: Tragedy of the error bars


CI = Confidence intervals

SE = Standard Error (SD of the sampling distribution of the sample mean)

SD = Standard Deviation


Excursus: 95% Confidence intervals

• USE THEM!
• Interpretation: We can be 95% confident that the real mean lies within our confidence interval!
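A rough sketch of how the standard error and a 95% confidence interval for a mean might be computed. It assumes a normal approximation (z ≈ 1.96); for small samples a t-distribution critical value would give a somewhat wider interval. The function name `mean_ci` and the task times are hypothetical.

```python
import math
import statistics


def mean_ci(sample, confidence=0.95):
    """Return (mean, standard error, (low, high)) using a normal approximation."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)        # sample SD (divides by N - 1)
    se = sd / math.sqrt(n)               # standard error of the mean
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, se, (mean - z * se, mean + z * se)


# Hypothetical task completion times in seconds.
times = [10, 12, 14, 16, 18]
mean, se, (low, high) = mean_ci(times)
print(f"mean={mean:.2f}  SE={se:.2f}  95% CI=[{low:.2f}, {high:.2f}]")
```

The example shows why the three kinds of error bars differ: the SD describes the spread of the data, the SE is the SD divided by √n, and the CI is roughly the mean ± 2 SE.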


Null Hypothesis Testing

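• Statistically significant results
  – p < .05
  – The significance level (α = .05) is the probability of incorrectly rejecting a true null hypothesis (Type I error); the p-value is the probability of data at least as extreme as observed, assuming the null hypothesis holds
• Many different tests
  – t-test, ANOVA, …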
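To make the logic of null hypothesis testing concrete, here is a sketch of a permutation test on the difference in group means. Note that this is a different technique from the t-test and ANOVA named above; it is used here only because it needs nothing beyond the standard library. The beer counts are invented for illustration, and `permutation_test` is a hypothetical helper name.

```python
import random
import statistics


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means between two groups."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        # Under H0 the group labels are exchangeable, so reshuffle them.
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_resamples + 1)


# Invented numbers of beers bought per participant with variants A and B.
beers_a = [2, 3, 1, 2, 4, 3, 2, 1]
beers_b = [5, 6, 4, 7, 5, 6, 8, 5]
p_value = permutation_test(beers_a, beers_b)
print("reject H0 at alpha = .05" if p_value < 0.05 else "cannot reject H0")
```

The returned value estimates how often a difference at least as large as the observed one would arise if variant A and variant B made no difference at all.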


Validity

• Errors:
  – Type I: False positives
  – Type II: False negatives
• External Validity
  – Can we generalize the study?
  – E.g., generalizable to the larger population of undergrad students
• Internal Validity
  – Is there a causal relationship?
  – Are there alternate causes?

[Figure: type I and type II errors illustrated with a guilty / not guilty verdict]


Internal Validity: Storks deliver babies!?

• R. Matthews, “Storks Deliver Babies”. Journal of Teaching Statistics, vol. 22, issue 2, pages 36-38, 2001.

• There is a correlation coefficient of r=0.62 (reasonably high)

• A statistical test can be employed that shows that this correlation is in fact significant (p = 0.008)

• What are the flaws?
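One possible flaw, a lurking third variable, can be illustrated with a small simulation. Everything below is synthetic: the sketch assumes, purely for illustration, that a country's area drives both its stork count and its birth count, and the coefficients and noise levels are made up. It does not reproduce the data from the cited article.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def partial_r(xs, ys, zs):
    """Correlation of xs and ys after controlling for zs (first-order partial correlation)."""
    rxy, rxz, ryz = pearson_r(xs, ys), pearson_r(xs, zs), pearson_r(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


rng = random.Random(0)
# Synthetic "countries": area is the assumed common cause of both measures.
area = [rng.uniform(10, 500) for _ in range(200)]
storks = [2.0 * a + rng.gauss(0, 100) for a in area]
births = [3.0 * a + rng.gauss(0, 150) for a in area]

r_raw = pearson_r(storks, births)
r_controlled = partial_r(storks, births, area)
print(f"raw r = {r_raw:.2f}, r controlling for area = {r_controlled:.2f}")
```

In this synthetic setup storks and births correlate strongly even though neither influences the other, and the correlation largely disappears once area is controlled for.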


Pragmatically … A step-by-step how-to

Experimental Procedure: Typical example

• Identify research hypothesis
• Specify the design of the study
• Think about statistics *before* you run the study
• Run a pilot study
• Recruit participants
• Run the actual data collection sessions
• Analyze the data
• Report the results


Run a pilot study

• … to test the study design
• … to test the system
• … to test the study instruments

Recruit participants

• Reflecting the larger population?
  – in the best case, yes
  – a pragmatic decision, though
• How many?
  – Depends on effect size and study design (the power of the experiment)
  – Usually 15+ (per group)
  – Note: much higher than for a usability test (~5)
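How the required number of participants depends on effect size and power can be sketched with the common normal-approximation formula for comparing two group means, n per group ≈ 2·((z₁₋α/₂ + z₁₋β) / d)². The function name `n_per_group`, the use of Cohen's d as the effect size, and the defaults (α = .05, power = .80) are assumptions for this example; an exact t-based calculation gives slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate participants per group for a two-sided, two-group comparison."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = .05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size_d) ** 2)


# Smaller expected effects need many more participants.
for d in (1.0, 0.8, 0.5):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Under these assumptions a very large effect (d = 1.0) needs about 16 participants per group, which is in line with the "15+" rule of thumb, while a medium effect (d = 0.5) needs about 63.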


Run the actual data collection process

• System and instruments ready?
• Greet participants
• Introduce purpose of study and procedure
  – or deliberately don’t
  – Don’t bias: “compare my interface vs. this other interface”
• Get consent of the participants
  – ethics!
• Assign participants to specific experiment condition
  – according to pre-defined randomization method
• Introduction to system(s) and/or training tasks
• Participants complete the actual tasks
  – take measures of dependent variables
• Participants answer questionnaire (if any)
• Debriefing session
• Payment (if any)
  – monetary, coupons, chocolate


Report the results

• Introduction / motivation
• Study design
• Results
• Discussion
• Conclusions
• References / Appendix

• See, for instance, Saul Greenberg’s recommendation:
  – http://pages.cpsc.ucalgary.ca/~saul/hci_topics/assignments/controlled_expt/ass1_reports.html

Other Evaluation Methods


Field Studies


• Realism

• Reveal: “a richer understanding by using a more holistic approach” (Carpendale, 08)


Qualitative Methods

• Observation Techniques
  – fly-on-the-wall techniques
  – interruptions by observer
• Interview Techniques
  – contextual?

Qualitative Methods as “Add-on”

Often controlled experiment +
• Experimenter observations
• Collecting participants’ opinions
• Think-aloud protocol (be careful!)

Helpful for...
• Usability improvement (cf. HCI three weeks ago)
• New insights, explanation of unforeseen results, new questions
• Can help to confirm results

Qualitative Methods as Primary

• Pre-design studies
  – Rich understanding of a complex domain
  – Problems, challenges, domain language
• During- and post-design studies
  – Case studies / field studies

Helpful for...
• holistic understanding

Qualitative Methods as Primary

• In Situ Observations
• Participatory Observations
• Laboratory Observational Studies
• Contextual Interviews
• Focus Groups

Qualitative Challenges

• Sample Sizes
  – Doing intensive studies with a lot of participants?
  – Time? Data produced?
• Subjectivity
  – Social relationship?
• Analyzing the data
  – Grounded theory
  – Open and axial coding

New Ways of Evaluation

• Mechanical Turk (more and more popular)
• Measuring brain activities
• ...