Karrie Karahalios, Eric Gilbert, 6 April 2007
Some slides courtesy of Brian Bailey and John Hart
CS414: Empirical User Studies
• Conduct a user study to gain a more precise measure of the usability of an interface or system
• Complements low-fidelity techniques
• Requires a larger investment than low-fi prototyping
• Provide a positive experience for users!
Messages
In Context of Task-Centered UI Design
• Measure performance, error rate, learnability and retention, satisfaction, tolerable network delay…
• adapt these measures to your particular interface and context
• Compare results to usability goals
• Identify usability issues and resolve them
Empirical User Studies
• Develop materials
• Prepare for the study
• Conduct the study
• Analyze results and iterate
• Learn from the experience
Overview of Doing Empirical User Studies
• Identify usability goals
• Develop experimental tasks and design
• Recruit users
• Instrument software/hardware
Prepare for the Study
• Identify questions you want answered
• questions should be specific and measurable
• Examples:
• can a user perform each task in < 30s?
• after only five minutes of instruction, can a user perform each task with < 2 errors?
• are users rating the interface at least a ‘3’ for overall satisfaction on a 5-point scale?
Identify Usability Goals
• Structure of experiment
• what will users do, in what order, where, etc.
• Between groups (users randomly assigned to treatment groups)
• Control group
• Experimental group
• Within groups
• Each user performs under all conditions
• Order of conditions randomized
• Cheaper because it uses fewer participants
Develop Experimental Design
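Both designs on this slide come down to a small amount of bookkeeping. The Python sketch below is one way to do it, not a prescribed procedure: `assign_between` and `orders_within` are hypothetical helper names, and dealing shuffled participants out in turn is an assumption that you want groups of (near-)equal size, which independent coin-flip assignment would not guarantee.

```python
import random

def assign_between(participants, groups, seed=None):
    """Between groups: each participant gets exactly one group.

    Shuffling, then dealing participants out in turn, keeps the
    groups within one person of equal size.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {p: groups[i % len(groups)] for i, p in enumerate(shuffled)}

def orders_within(participants, conditions, seed=None):
    """Within groups: each participant gets every condition, in a random order."""
    rng = random.Random(seed)
    plan = {}
    for p in participants:
        order = list(conditions)
        rng.shuffle(order)
        plan[p] = order
    return plan

# Hypothetical participant IDs and conditions, for illustration only.
people = [f"P{i:02d}" for i in range(1, 13)]
print(assign_between(people, ["control", "experimental"], seed=42))
print(orders_within(people[:3], ["static", "dynamic", "radial"], seed=42))
```

Passing a fixed `seed` makes the plan reproducible, which helps when documenting the study. Shuffling independently per participant matches "order randomized"; a Latin square is a common counterbalanced alternative if you need each condition to appear in each position equally often.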
• What gets changed and what is its effect?
• Independent variables
• the variables you manipulate, e.g., # of menu items, lighting conditions, mouse vs. keys
• Dependent variables
• the variables you measure, e.g., speed of menu choice, reaction time to stimuli
• Variable type matters
• discrete vs. continuous
Experimental Variables
• Typically want about 8 – 12 users
• depends on desired confidence in the results
• 12 is the magic number for the ANOVA test (more later)
• This could be the most challenging aspect of the study
• expect about a 0.1% to 10% response rate
• may need IRB approval, especially if you want to publish
• Give users a compelling reason to participate
Recruit Users
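The response-rate range on this slide translates directly into a recruiting budget. A minimal Python sketch, where `contacts_needed` is a hypothetical helper: to end up with 12 participants you would contact about 120 people at a 10% response rate, but about 12,000 at 0.1%.

```python
import math
from fractions import Fraction

def contacts_needed(target_participants, response_rate):
    """People to contact to expect `target_participants` volunteers.

    `response_rate` is a fraction such as Fraction("0.01") for 1%;
    exact fractions keep float rounding from nudging the ceiling up by one.
    """
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return math.ceil(Fraction(target_participants) / Fraction(response_rate))

for rate in ("0.001", "0.01", "0.1"):
    print(f"{float(rate):.1%} response rate -> contact {contacts_needed(12, Fraction(rate))} people")
```

This is an expected value only; in practice you would over-recruit to cover no-shows.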
Demographic Diversity
• It is important to target your user population.
• example: if you are developing for Firefox, make sure that you recruit people already familiar with Firefox.
• Beyond that, it is also important to recruit a diverse range of users, varying in:
• age
• sex
• education
• occupation
• ...
• diversity can tell you important things about your system and help you generalize your results
• Log performance and errors (if possible)
• Determine media capture needs
• ensure that you have access to equipment
• manage physical layout of the testing space
• Anything else that you need?
Instrument Software/Hardware
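Logging performance and errors can be as simple as writing one row per trial. The sketch below assumes CSV output is acceptable and that your prototype lets you hook "trial started", "error", and "trial finished" events; `TrialLogger` and its methods are hypothetical names, not part of any library.

```python
import csv
import io
import time

class TrialLogger:
    """Hypothetical helper that writes one CSV row per trial.

    Columns: participant, condition (independent variable), task,
    seconds and errors (dependent variables).
    """

    FIELDS = ["participant", "condition", "task", "seconds", "errors"]

    def __init__(self, stream):
        self._writer = csv.DictWriter(stream, fieldnames=self.FIELDS)
        self._writer.writeheader()
        self._start = None
        self._errors = 0

    def start_trial(self):
        # perf_counter is monotonic, so wall-clock adjustments do not skew timings.
        self._start = time.perf_counter()
        self._errors = 0

    def record_error(self):
        self._errors += 1

    def end_trial(self, participant, condition, task):
        if self._start is None:
            raise RuntimeError("end_trial() called before start_trial()")
        seconds = time.perf_counter() - self._start
        self._writer.writerow({
            "participant": participant,
            "condition": condition,
            "task": task,
            "seconds": f"{seconds:.3f}",
            "errors": self._errors,
        })
        self._start = None
        return seconds

# Usage: log to an in-memory buffer; a real study would open a file instead.
buffer = io.StringIO()
log = TrialLogger(buffer)
log.start_trial()
log.record_error()
log.end_trial("P01", "static menu", "find-item")
print(buffer.getvalue())
```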
• Give user an overview of the study
• Introduce your system, allow for practice
• Have users work through the tasks
• Collect experimental measures (e.g., performance and error data)
• Have users fill out a questionnaire, if any
• Debrief the user
• Entire session should last less than 60 minutes
Conduct the Study
• Purpose of the study, but not necessarily details of what you are testing
• What they will be doing (the tasks)
• They are not being tested, the interface/system is
• They can quit at any time, and quitting will not affect their relationship with you, the university, the company, etc.
• About the equipment in the room
• Whether their face and/or actions will be recorded
• How to think aloud (if you are collecting verbal data)
• Whether or not you will be available to answer questions
• Their data will be viewed only in aggregate form
• How long the session will take
Tell the User At Least:
• Offer breaks at boundary points
• Offer to send results in aggregate form or allow users to see the improved interface
• Develop understandable instructions
• Do not “defend” your interface
• Do not make subjective comments about users, ease or difficulty of tasks, etc.
Make Users Feel Comfortable
• Analyze data using statistical methods (ANOVAs and chi-squared tests are common)
• take a stats course, e.g., Stat 320, for more detail
• Did you meet the goals? How far from the goals are you?
Analyze Results and Iterate
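Checking results against the usability goals, and measuring how far off you are, can be sketched in a few lines of Python. The thresholds mirror the example questions on the "Identify Usability Goals" slide; `GOALS` and `check_goals` are hypothetical names, and comparing group means is a simplification, since those goals are phrased per user and per task.

```python
from statistics import mean

# Hypothetical thresholds based on the example usability-goal questions.
GOALS = {
    "task_time_s": ("<", 30.0),   # perform each task in < 30 s
    "errors": ("<", 2),           # fewer than 2 errors
    "satisfaction": (">=", 3),    # at least 3 on a 5-point scale
}

def check_goals(observations, goals=GOALS):
    """Compare the mean of each measure with its goal and report the gap."""
    report = {}
    for metric, (op, target) in goals.items():
        value = mean(observations[metric])
        met = value < target if op == "<" else value >= target
        report[metric] = {"mean": value, "target": target, "met": met, "gap": value - target}
    return report

# Made-up numbers for three participants, for illustration only.
results = {"task_time_s": [25, 32, 36], "errors": [1, 0, 2], "satisfaction": [4, 3, 2]}
for metric, row in check_goals(results).items():
    print(metric, row)
```

A positive `gap` on a "less than" goal tells you how much improvement the next iteration needs.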
t-tests and ANOVAs
• t-tests compare the means of two samples and determine whether they are statistically significantly different
• e.g., are dynamic menus better than static menus?
• ANOVAs (analysis of variance) compare the means of n samples and determine whether they are statistically significantly different
• e.g., which is best: dynamic, static or radial menus?
• Both assume the samples come from normal distributions and both produce p-values.
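To show what the two tests actually compute, here is a standard-library Python sketch of the pooled two-sample t statistic and the one-way ANOVA F statistic. It stops short of p-values; in practice a statistics package would supply them (for example `scipy.stats.ttest_ind` and `scipy.stats.f_oneway`). The menu timings are made-up illustration data.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Student's two-sample t statistic (equal variances, pooled)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

def one_way_anova_f(*groups):
    """One-way ANOVA: returns (F, df_between, df_within)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

# Made-up menu-selection times in seconds (illustration only).
dynamic = [2.1, 2.4, 1.9, 2.2]
static = [2.8, 3.1, 2.6, 2.9]
radial = [1.8, 2.0, 1.7, 2.1]

t = pooled_t(dynamic, static)
f2, _, _ = one_way_anova_f(dynamic, static)
print(f"t = {t:.3f}, t^2 = {t * t:.3f}, F (two groups) = {f2:.3f}")
print("F, df for three menu types:", one_way_anova_f(dynamic, static, radial))
```

With exactly two groups, t squared equals F, which is why an ANOVA on two conditions gives the same p-value as the pooled t-test.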
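• Bell curve
• basic shape: $y = e^{-x^2}$; the full density with mean $\mu$ and standard deviation $\sigma$ is $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$
• Arises from the sum of many independent events
• e.g. sum of dice rolls
• Total time = t-find + t-home + t-click
• Total # of errors
Normal Distributions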
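The dice example can be checked exactly. This Python sketch computes the exact distribution of the sum of n fair dice by repeated convolution and compares it with a normal density having the same mean and variance; `dice_sum_distribution` and `normal_pdf` are hypothetical helper names.

```python
import math
from fractions import Fraction

def dice_sum_distribution(n_dice, sides=6):
    """Exact probability of each total when rolling n fair dice (repeated convolution)."""
    dist = {0: Fraction(1)}
    for _ in range(n_dice):
        nxt = {}
        for total, p in dist.items():
            for face in range(1, sides + 1):
                nxt[total + face] = nxt.get(total + face, Fraction(0)) + p / sides
        dist = nxt
    return dist

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

n = 10
dist = dice_sum_distribution(n)
mu = sum(t * p for t, p in dist.items())              # 3.5 per die
var = sum((t - mu) ** 2 * p for t, p in dist.items())  # 35/12 per die
sigma = math.sqrt(var)
for total in (20, 30, 35, 40, 50):
    print(total, f"exact={float(dist[total]):.4f}", f"normal={normal_pdf(total, float(mu), sigma):.4f}")
```

Even with only ten dice the exact probabilities track the bell curve closely, which is the behavior the slide appeals to for sums such as total task time.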
p-values
• probability value
• The probability of observing a difference at least as large as the one in your experiment if there were no real difference (i.e., if only random chance were at work)
• An expression of the confidence of your result
• Typically, a difference is called statistically significant when p < 0.05.
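The definition above can be made concrete with an exact permutation test. This is a different, non-parametric test from the t-test and ANOVA, swapped in here only because it computes a p-value directly from the definition: how often does a random relabeling of the same data produce a difference in means at least as large as the observed one? `exact_permutation_p` is a hypothetical name and the timings are made up.

```python
from fractions import Fraction
from itertools import combinations

def exact_permutation_p(a, b):
    """Two-sided exact permutation test on the difference in means."""
    pooled = [Fraction(x) for x in list(a) + list(b)]  # exact arithmetic, so ties compare exactly
    n, n_a = len(pooled), len(a)
    n_b = n - n_a
    total_sum = sum(pooled)

    def diff(sum_a):
        return abs(sum_a / n_a - (total_sum - sum_a) / n_b)

    observed = diff(sum(pooled[:n_a]))
    extreme = total = 0
    for idx in combinations(range(n), n_a):  # every way to relabel n_a items as group a
        total += 1
        if diff(sum(pooled[i] for i in idx)) >= observed:
            extreme += 1
    return Fraction(extreme, total)

static = [5.2, 4.9, 5.6, 5.0]
dynamic = [4.1, 3.8, 4.5, 4.0]
print(float(exact_permutation_p(static, dynamic)))
```

Enumerating every relabeling grows combinatorially, so larger studies sample random permutations instead or fall back on the parametric tests.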
Partial eta-squared
• Some ANOVAs produce partial eta-squared values in addition to p-values.
• They are becoming widespread in HCI literature.
• You may see them soon in a usability report.
• Partial eta-squared values measure effect size, i.e., practical significance: the proportion of variance attributable to a factor once other factors are set aside.
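Partial eta-squared is defined as SS_effect / (SS_effect + SS_error), and it can also be recovered from a reported F as F·df_effect / (F·df_effect + df_error). This Python sketch computes it both ways for a one-way, between-groups ANOVA; the function names and the data are hypothetical.

```python
from statistics import mean

def partial_eta_squared(groups):
    """Partial eta-squared for a one-way, between-groups ANOVA.

    With a single factor, SS_error is the within-group sum of squares,
    so partial eta-squared equals plain eta-squared here.
    """
    grand = mean(x for g in groups for x in g)
    ss_effect = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return ss_effect / (ss_effect + ss_error)

def partial_eta_squared_from_f(f, df_effect, df_error):
    """Recover partial eta-squared from a reported F(df_effect, df_error)."""
    return (f * df_effect) / (f * df_effect + df_error)

groups = [[1, 2, 3], [4, 5, 6]]  # made-up data: F(1, 4) = 13.5 for these groups
print(partial_eta_squared(groups), partial_eta_squared_from_f(13.5, 1, 4))
```

A result can be statistically significant yet have a small partial eta-squared, which is why reports increasingly give both.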
• Measure performance (time, error rate)
• Measure user satisfaction
• Give realistic experience of the interface
• realistic system response
• move among tasks seamlessly
• the user, not the designer, is in control
• Focus will be on the details
• most big issues should already be resolved
Advantages of Empirical User Studies
• Users typically must come to the lab
• makes it more difficult to recruit them
• users may have anxiety
• Large setup effort involved
• software instrumentation, hardware setup, questionnaire design, IRB approval, etc.
• Prototype may crash
Disadvantages of Empirical User Studies
An Example of How This Gets Used in Practice
• “The Impact of Delayed Visual Feedback on Collaborative Performance” by Darren Gergle, presented at CHI 06.
• What is the relationship between delayed visual feedback and collaboration? How much network delay can be tolerated?
• e.g., architectural planning, telesurgery, and remote repair
The Collaborative Puzzle Task
• The experimental task was for a helper to guide a worker through a visual puzzle over a network connection
Independent Variables
• Only one: visual delay in the helper’s view window
• Delay sampled from this distribution [60 to 3300 ms]:
• $f(n) = T_n = T_{n-1} \cdot e^{0.05}$, with $T_1 = 60$ ms
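Reading "e.05" as e raised to 0.05 (an interpretation of the garbled original), the delay levels form a geometric sequence, so T_n = 60 · e^{0.05(n-1)} ms. This Python sketch generates every level up to 3300 ms; `delay_schedule` is a hypothetical helper, and the closed form avoids drift from repeated multiplication.

```python
import math

def delay_schedule(t1_ms=60.0, growth=0.05, max_ms=3300.0):
    """Delay levels T_n = T_1 * e^(growth * (n - 1)), kept while T_n <= max_ms."""
    delays = []
    n = 1
    while True:
        t = t1_ms * math.exp(growth * (n - 1))
        if t > max_ms:
            break
        delays.append(t)
        n += 1
    return delays

schedule = delay_schedule()
print(len(schedule), round(schedule[0]), round(schedule[-1]))
```

Under this reading the schedule has 81 levels, from 60 ms to roughly 3276 ms, and the 939 ms and 1798 ms boundaries quoted in the analysis fall on the 56th and 69th levels.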
Dependent Variables
• Only one: task performance time
• Participants were asked to perform the puzzle task as quickly and accurately as possible.
Quantitative Analysis Using ANOVA
• “For delays between 60ms and 939ms, we found no evidence to indicate any impact of delayed visual feedback on task performance (SE = (2.87), F(1, 610) = .028, p = .87).”
• p > 0.05, so the difference is not statistically significant
• “However, for delay rates between 939ms and 1798ms there is a significant impact on task performance (F(1, 610) = 13.57, p < .001).”
• Since p < 0.001, this result is highly significant
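The reported p-values can be sanity-checked from the F statistics alone. F(1, d) is the square of a t statistic with d degrees of freedom, and with d = 610 that t is very close to a standard normal z, so p is approximately P(|Z| ≥ √F) = erfc(√(F/2)). This Python sketch uses that large-sample approximation, which is an assumption and not necessarily how the paper computed its values; `approx_p_from_f1` is a hypothetical name.

```python
import math

def approx_p_from_f1(f):
    """Approximate two-sided p-value for F(1, df_error) when df_error is large.

    Treats sqrt(F) as a standard normal z, so p = P(|Z| >= z) = erfc(z / sqrt(2)).
    """
    return math.erfc(math.sqrt(f / 2))

for f in (0.028, 13.57):
    print(f"F(1, 610) = {f}: p ~ {approx_p_from_f1(f):.4f}")
```

The approximation reproduces both quoted results (p of about .87, and p < .001). For small error degrees of freedom, use the exact F distribution instead, e.g. `scipy.stats.f.sf(F, dfn, dfd)`.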
[Figure: Graph of Delay vs. Performance]