Karrie Karahalios, Eric Gilbert 6 April 2007 some slides courtesy of Brian Bailey and John Hart cs414 empirical user studies

• Conduct user study to gain more precise measure of the usability of an interface or system

• Complements low-fidelity techniques

• Requires a larger investment than low-fi prototyping

• Provide positive experience for users!

Messages

In Context of Task-Centered UI Design

• Measure performance, error rate, learnability and retention, satisfaction, tolerable network delay…

• adapt to your particular interface and context

• Compare results to usability goals

• Identify usability issues and resolve them

Empirical User Studies

• Develop materials

• Prepare for the study

• Conduct the study

• Analyze results and iterate

• Learn from the experience

Overview of Doing Empirical User Studies

• Identify usability goals

• Develop experimental tasks and design

• Recruit users

• Instrument software/hardware

Prepare for the Study

• Identify questions you want answered

• questions should be specific and measurable

• Examples:

• can a user perform each task in < 30s?

• after only five minutes of instruction, can a user perform each task with < 2 errors?

• are users rating the interface at least a ‘3’ for overall satisfaction on a 5-point scale?

Identify Usability Goals
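Goals phrased this way can be checked mechanically once data is in. The sketch below is only an illustration: the task names, the measurements, and the reading of the satisfaction goal as a mean rating are all assumptions, not part of the original slides.

```python
from statistics import mean

# Hypothetical pilot data (illustrative numbers only).
task_times_s = {
    "open file": [12.1, 25.4, 18.0],
    "save as": [22.3, 31.5, 27.9],
}
errors_per_task = [0, 1, 1, 0, 2, 1]
satisfaction_ratings = [4, 3, 5, 2, 4, 3]  # 5-point scale


def check_goals(times, errors, ratings):
    """Map each usability goal to True (met) or False (not met)."""
    return {
        "time": all(t < 30 for attempts in times.values() for t in attempts),
        "errors": all(e < 2 for e in errors),
        # Assumption: "at least a 3" is read as the mean rating.
        "satisfaction": mean(ratings) >= 3,
    }


results = check_goals(task_times_s, errors_per_task, satisfaction_ratings)
for goal, met in results.items():
    print(f"{goal}: {'met' if met else 'NOT met'}")
```

Writing the check before running the study forces each goal to be specific enough to measure.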

• Structure of experiment

• what will users do, in what order, where, etc.

• Between groups (randomly assigned to treatment groups)

• Control group

• Experimental group

• Within groups

• Each user performs under all conditions

• Order randomized

• Cheaper because it uses fewer participants

Develop Experimental Design
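The two designs differ mainly in how participants are assigned to conditions. A minimal sketch, assuming twelve participants with made-up IDs and three menu conditions chosen purely for illustration:

```python
import random


def assign_between_groups(participants, rng):
    """Randomly split participants into a control and an experimental group."""
    shuffled = participants[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "experimental": shuffled[half:]}


def assign_within_groups(participants, conditions, rng):
    """Every participant sees all conditions, each in an independently shuffled order."""
    orders = {}
    for p in participants:
        order = conditions[:]
        rng.shuffle(order)
        orders[p] = order
    return orders


rng = random.Random(414)  # fixed seed so the assignment can be reproduced
participants = [f"P{i}" for i in range(1, 13)]  # hypothetical participant IDs
conditions = ["static", "dynamic", "radial"]    # hypothetical conditions

between = assign_between_groups(participants, rng)
within = assign_within_groups(participants, conditions, rng)
print(between)
print(within["P1"])
```

Independent shuffling does not guarantee that every order is used equally often; a Latin square is one common alternative when that matters.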

• What gets changed and what is its effect?

• Independent variables

• the variables you manipulate, e.g., # of menu items, lighting conditions, mouse vs. keys

• Dependent variables

• the measured part, e.g., speed of menu choice, reaction time to stimuli

• Variable type matters

• discrete vs. continuous

Experimental Variables

• Typically want about 8 – 12 users

• depends on desired confidence in the results

• 12 is the magic number for the ANOVA test (more later)

• This could be the most challenging aspect of the study

• expect about a 0.1% to 10% response rate

• may need IRB approval, especially if you want to publish

• Give users a compelling reason to participate

Recruit Users
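The response-rate range translates directly into how many invitations to send. A back-of-the-envelope sketch, assuming a target of 12 users and treating the rate as an expected value (actual turnout will vary):

```python
from fractions import Fraction
from math import ceil


def invitations_needed(target_users, response_rate):
    """Expected number of invitations needed to reach target_users responses."""
    return ceil(Fraction(target_users) / response_rate)


# Exact fractions avoid floating-point surprises when rounding up.
best_case = invitations_needed(12, Fraction(10, 100))   # 10% response rate
worst_case = invitations_needed(12, Fraction(1, 1000))  # 0.1% response rate
print(best_case, worst_case)
```

The two ends of the range differ by a factor of 100, which is why recruiting can dominate the schedule.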

Demographic Diversity

• It is important to target your user population.

• example: if you are developing for Firefox, make sure that you use people already familiar with Firefox.

• Beyond that, it is also important to gain a diversity of different types of users:

• age

• sex

• education

• occupation

• ...

• can tell you important things about your system, and help you generalize

• Log performance and errors (if possible)

• Determine media capture needs

• ensure that you have access to equipment

• manage physical layout of the testing space

• Anything else that you need?

Instrument Software/Hardware
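Logging performance and errors can be as simple as timestamping task boundaries. The class below is a hypothetical sketch (the name `SessionLog` and its methods are invented for illustration); a real study would also write the records to disk.

```python
import time


class SessionLog:
    """Records time on task and error count for one participant."""

    def __init__(self, participant_id, clock=time.monotonic):
        self.participant_id = participant_id
        self.clock = clock  # injectable so the log can be tested without waiting
        self.records = []
        self._task = None
        self._start = None
        self._errors = 0

    def start_task(self, name):
        self._task, self._start, self._errors = name, self.clock(), 0

    def log_error(self):
        self._errors += 1

    def end_task(self):
        self.records.append({
            "participant": self.participant_id,
            "task": self._task,
            "seconds": self.clock() - self._start,
            "errors": self._errors,
        })
        self._task = None


session = SessionLog("P1")
session.start_task("choose menu item")
session.log_error()  # e.g., the user clicked the wrong item
session.end_task()
print(session.records)
```

`time.monotonic` is used rather than wall-clock time so that system clock adjustments during a session cannot distort the measurements.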

• Give user an overview of the study

• Introduce your system, allow for practice

• Have users work through the tasks

• Collect experimental measures (e.g., performance and error data)

• Fill out questionnaire, if any

• Debrief the user

• Entire session should last less than 60 minutes

Conduct the Study

• Purpose of the study, but not necessarily details of what you are testing

• What they will be doing (the tasks)

• They are not being tested; the interface/system is

• They can quit at any time, and doing so will not affect their relationship with you, the university, the company, etc.

• About the equipment in the room

• Whether their face and/or actions will be recorded

• How to think aloud (if you are collecting verbal data)

• Whether you will be available to answer questions

• Their data will be viewed only in aggregate form

• How long the session will take

Tell the User At Least:

• Offer breaks at boundary points

• Offer to send results in aggregate form or allow users to see the improved interface

• Develop understandable instructions

• Do not “defend” your interface

• Do not make subjective comments about users, ease or difficulty of tasks, etc.

Make Users Feel Comfortable

• Analyze data using statistical methods (ANOVAs and Chi-Squared tests common)

• take a stats course, e.g., Stat 320, for more detail

• Did you meet the goals? How far from the goals are you?

Analyze Results and Iterate

t-tests and ANOVAs

• t-tests compare two random samples and determine whether their means differ by a statistically significant amount

• e.g., are dynamic menus better than static menus?

• ANOVAs (analyses of variance) compare n random samples and determine whether their means differ by a statistically significant amount

• e.g., which is best: dynamic, static or radial menus?

• Both assume the samples come from normal distributions and both produce p-values.
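Both tests reduce the data to a single statistic before any p-value is looked up. The sketch below computes those statistics by hand on made-up task times; it uses Welch's form of the t statistic (an assumption, since the slides do not say which variant is meant) and stops short of the p-value, which requires the t or F distribution from a statistics package.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se_a = variance(a) / len(a)
    se_b = variance(b) / len(b)
    return (mean(a) - mean(b)) / (se_a + se_b) ** 0.5


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical task times in seconds for three menu styles.
static = [10, 12, 14]
dynamic = [8, 10, 12]
radial = [12, 14, 16]

t_value = welch_t(static, dynamic)
f_value, df_between, df_within = one_way_anova_f([static, dynamic, radial])
print(t_value, f_value, df_between, df_within)
```

With only two groups of equal size and equal variance, the ANOVA F equals the square of the t statistic, which is one way to see that ANOVA generalizes the t-test.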

• Bell curve

• $y = \exp(-x^2)$

• Occurs from sum of independent events

• e.g. sum of dice rolls

• Total time = t-find + t-home + t-click

• Total # of errors

Normal Distributions

• Normalizing constant of the normal density: $\frac{1}{\sigma\sqrt{2\pi}}$
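The dice example is easy to simulate. The sketch below sums ten dice per trial (the counts and the seed are arbitrary choices) and checks how closely the sums follow the bell-curve rule of thumb that roughly two thirds of values fall within one standard deviation of the mean.

```python
import random
from statistics import mean, pstdev


def dice_sums(n_dice, n_trials, rng):
    """Return n_trials sums, each the total of n_dice fair six-sided dice."""
    return [sum(rng.randint(1, 6) for _ in range(n_dice)) for _ in range(n_trials)]


rng = random.Random(0)  # fixed seed for reproducibility
sums = dice_sums(10, 20000, rng)

m = mean(sums)
s = pstdev(sums)
share_within_one_sd = sum(1 for x in sums if abs(x - m) <= s) / len(sums)

# Theory: mean = 10 * 3.5 = 35, standard deviation = sqrt(10 * 35 / 12), about 5.4
print(m, s, share_within_one_sd)
```

Task time behaves similarly when it is the sum of several independent component times, which is why the normality assumption is often reasonable for it.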

p-values

• probability value

• The probability of observing a difference at least as large as the one in your experiment if only random chance were at work

• An expression of the confidence in your result: the smaller the p-value, the stronger the evidence against chance

• Typically, a difference is called statistically significant when p < 0.05.
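One way to make the definition concrete is a permutation test: shuffle the group labels many times and count how often chance alone produces a difference as large as the observed one. This is a swapped-in illustration, not the t-test or ANOVA from these slides, and the task times are invented.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_permutations, rng):
    """Estimate a two-sided p-value for the difference in means by label shuffling."""
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return hits / n_permutations


rng = random.Random(1)
# Hypothetical task times in seconds for two menu designs.
dynamic = [10.2, 11.5, 9.8, 12.1, 10.9, 11.3]
static = [13.0, 12.4, 14.1, 13.6, 12.9, 14.4]

p = permutation_p_value(dynamic, static, 5000, rng)
print(p, "significant" if p < 0.05 else "not significant")
```

Because every value in one group is below every value in the other, very few shuffles reproduce the observed gap, so the estimated p-value is small.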

Partial eta-squared

• Some ANOVAs produce partial eta-squared values in addition to p-values.

• They are becoming widespread in HCI literature.

• You may see them soon in a usability report.

• Partial eta-squared values offer a measure of practical significance: an effect size giving the share of effect-plus-error variance that the factor explains.
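Partial eta-squared is defined as SS_effect / (SS_effect + SS_error), and it can also be recovered from a reported F value and its degrees of freedom. A minimal sketch of both forms; the function names are made up, and the example reuses the F(1,610) = 13.57 result quoted later in these slides.

```python
def partial_eta_squared_from_ss(ss_effect, ss_error):
    """Effect size from sums of squares."""
    return ss_effect / (ss_effect + ss_error)


def partial_eta_squared_from_f(f_value, df_effect, df_error):
    """Equivalent form using F = (ss_effect / df_effect) / (ss_error / df_error)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


eta_sq = partial_eta_squared_from_f(13.57, 1, 610)
print(round(eta_sq, 4))
```

A result can be highly significant and still have a small effect size, which is why reports increasingly give both numbers.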

• Measure performance (time, error rate)

• Measure user satisfaction

• Give realistic experience of the interface

• realistic system response

• move among tasks seamlessly

• designers not in control, the user is

• Focus will be on the details

• most big issues should already be resolved

Advantages of Empirical User Studies

• Users typically must come to the lab

• makes it more difficult to recruit them

• users may have anxiety

• Large setup effort involved

• software instrumentation, hardware setup, questionnaire design, IRB approval, etc.

• Prototype may crash

Disadvantages of Empirical User Studies

An Example of How This Gets Used in Practice

• “The Impact of Delayed Visual Feedback on Collaborative Performance” by Darren Gergle, presented at CHI 06.

• What is the relationship between delayed visual feedback and collaboration? How much network delay can be tolerated?

• e.g., architectural planning, telesurgery, and remote repair

The Collaborative Puzzle Task

• The experimental task was for a helper to guide a worker through a visual puzzle over a network connection

Independent Variables

• Only one: visual delay in the helper’s view window

• Delay sampled from this distribution [60–3300 ms]:

• $f(n) = T_n = T_{n-1} \cdot e^{0.05}$ with $T_1 = 60$
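The recurrence can be unrolled to list the candidate delays. The sketch below assumes the recurrence defines a fixed ladder of delay levels that stops at the 3300 ms upper bound; how the study actually drew delays from these levels is not stated in the slides.

```python
import math


def delay_levels(first_ms=60.0, growth=math.exp(0.05), max_ms=3300.0):
    """Return T_1, T_2, ... with T_n = T_(n-1) * e^0.05, stopping before max_ms is exceeded."""
    levels = []
    t = first_ms
    while t <= max_ms:
        levels.append(t)
        t *= growth
    return levels


levels = delay_levels()
print(len(levels), round(levels[0]), round(levels[-1]))
print(round(levels[55]), round(levels[68]))  # T_56 and T_69
```

Under this reading, each level is about 5% longer than the previous one, and T_56 and T_69 round to 939 ms and 1798 ms, the boundaries quoted in the analysis.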

Dependent Variables

• Only one: task performance time

• Participants were asked to perform the puzzle task as quickly and accurately as possible.

Quantitative Analysis Using ANOVA

• “For delays between 60ms and 939ms, we found no evidence to indicate any impact of delayed visual feedback on task performance (SE = (2.87), F(1,610) = .028, p = .87).”

• p > 0.05, so the samples are not significantly different

• “However, for delay rates between 939ms and 1798ms there is a significant impact on task performance (F(1,610) = 13.57, p < .001).”

• Since p < 0.001, this result is highly significant

Graph of Delay vs. Performance
