Usability Evaluation
www.id-book.com
©2011
The aims
- Explain the key concepts used in evaluation.
- Introduce different evaluation methods.
- Show how different methods are used for different purposes at different stages of the design process and in different contexts of use.
- Show how evaluators mix and modify methods.
- Discuss the practical challenges.
Why, what, where, and when to evaluate
Iterative design & evaluation is a continuous process that examines:
- Why: to check that users can use the product and that they like it.
- What: a conceptual model, early prototypes of a new system and, later, more complete prototypes.
- Where: in natural and laboratory settings.
- When: throughout design; finished products can be evaluated to collect information to inform new products.
Designers need to check that they understand users’ requirements.
Bruce Tognazzini tells you why you need to evaluate
“Iterative design, with its repeating cycle of design and testing, is the only validated methodology in existence that will consistently produce successful results. If you don’t have user-testing as an integral part of your design process you are going to throw buckets of money down the drain.”
See AskTog.com for topical discussions about design and evaluation.
Types of evaluation
- Controlled settings involving users, e.g., usability testing & experiments in laboratories and living labs.
- Natural settings involving users, e.g., field studies to see how the product is used in the real world.
- Any settings not involving users, e.g., consultants critique, predict, analyze & model aspects of the interface; analytics.
Usability lab
http://iat.ubalt.edu/usability_lab/
Living labs
People’s use of technology in their everyday lives can be evaluated in living labs.
Such evaluations are too difficult to do in a usability lab.
E.g., http://www.sfu.ca/westhouse.html
Currently occupied by participants!
Usability testing & field studies can be complementary
Evaluation case studies
- Experiment to investigate a computer game: Regan L. Mandryk and Kori M. Inkpen (2004). Physiological indicators for the evaluation of co-located collaborative play. In Proc. ACM Conf. on Computer Supported Cooperative Work (CSCW ’04), 102-111. http://doi.acm.org/10.1145/1031607.1031625
- In-the-wild field study of skiers.
- Crowdsourcing.
Challenge & engagement in a collaborative immersive game
Physiological measures were used.
Engagement was measured in two conditions:
1. Playing against another person.
2. Playing against a computer.
What does this data tell you?
Rate each experience state on a scale from 1 to 5 (1 = low, 5 = high).

             Playing against computer    Playing against friend
             Mean      St. Dev.          Mean      St. Dev.
Boring       2.3       0.949             1.7       0.949
Challenging  3.6       1.08              3.9       0.994
Easy         2.7       0.823             2.5       0.850
Engaging     3.8       0.422             4.3       0.675
Exciting     3.5       0.527             4.1       0.568
Frustrating  2.8       1.14              2.5       0.850
Fun          3.9       0.738             4.6       0.699

Source: Mandryk and Inkpen (2004).
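A minimal sketch of how paired ratings like those above could be summarized, using only the Python standard library. The ratings below are invented for illustration; the study’s raw per-participant data are not reproduced here. Because each participant played in both conditions (within-subjects), the per-participant differences are the quantity of interest.

```python
from statistics import mean, stdev

def summarize(ratings):
    """Return (mean, sample standard deviation) for a list of ratings."""
    return mean(ratings), stdev(ratings)

# Invented example: ten participants rate "Fun" (1-5) in each condition.
fun_vs_computer = [4, 4, 3, 4, 5, 4, 3, 4, 4, 4]
fun_vs_friend   = [5, 5, 4, 5, 5, 4, 4, 5, 5, 4]

m_c, sd_c = summarize(fun_vs_computer)
m_f, sd_f = summarize(fun_vs_friend)

# Paired (within-subjects) differences, friend minus computer.
diffs = [f - c for c, f in zip(fun_vs_computer, fun_vs_friend)]

print(f"vs computer: M={m_c:.1f}, SD={sd_c:.2f}")
print(f"vs friend:   M={m_f:.1f}, SD={sd_f:.2f}")
print(f"mean paired difference: {mean(diffs):.2f}")
```

A table like the one above reports exactly these per-condition means and standard deviations; a paired statistical test on `diffs` would be the usual next step.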
Question
What were the precautionary measures that the evaluators had to take?
Evaluation case studies
- Experiment to investigate a computer game.
- In-the-wild field study of skiers: Francis Jambon and Brigitte Meillon (2009). User experience evaluation in the wild. In Ext. Abstracts of the Int. Conf. on Human Factors in Computing Systems (CHI EA ’09), 4069-4074. http://doi.acm.org/10.1145/1520340.1520619
- Crowdsourcing.
Why study skiers in the wild?
Source: Jambon et al. (2009) User experience in the wild. In: Proceedings of CHI ’09, ACM Press, New York, p. 4072.
e-skiing system components
Source: Jambon et al. (2009), p. 4072.
Evaluation case studies
- Experiment to investigate a computer game.
- In-the-wild field study of skiers.
- Crowdsourcing: Jeffrey Heer and Michael Bostock (2010). Crowdsourcing graphical perception: using Mechanical Turk to assess visualization design. In Proc. of the Int. Conf. on Human Factors in Computing Systems (CHI ’10). ACM, New York, NY, USA, 203-212. http://doi.acm.org/10.1145/1753326.1753357
Crowdsourcing: when might you use it?
Evaluating an ambient system
The Hello Wall is a new kind of system that is designed to explore how people react to its presence.
What are the challenges of evaluating systems like this?
Evaluation methods
Method          Controlled settings   Natural settings   Without users
Observing              x                    x
Asking users           x                    x
Asking experts                              x                   x
Testing                x
Modeling                                                        x
The language of evaluation
Analytics, analytical evaluation, controlled experiment, expert review or crit, field study, formative evaluation, heuristic evaluation, in-the-wild study, living laboratory, predictive evaluation, summative evaluation, usability laboratory, user studies, usability testing, users or participants.
Key points
- Evaluation & design are closely integrated in user-centered design.
- Some of the same techniques are used in evaluation as for establishing requirements, but they are used differently (e.g., observation, interviews & questionnaires).
- Three types of evaluation: laboratory-based with users, in the field with users, and studies that do not involve users.
- The main methods are: observing, asking users, asking experts, user testing, inspection, modeling users’ task performance, and analytics.
- Dealing with constraints is an important skill for evaluators to develop.
Butterfly Ballot
“The Butterfly Ballot: Anatomy of a Disaster” was written by Bruce Tognazzini: http://www.asktog.com/columns/042ButterflyBallot.html
Try it as a design exercise: create a similar ballot, conduct a usability test (did people actually vote as they intended?), create a modified ballot, conduct a second test, and see if the results improve.
The aims are:
- Introduce and explain the DECIDE framework.
- Discuss the conceptual, practical, and ethical issues involved in evaluation.
DECIDE: a framework to guide evaluation
- Determine the goals.
- Explore the questions.
- Choose the evaluation methods.
- Identify the practical issues.
- Decide how to deal with the ethical issues.
- Evaluate, analyze, interpret and present the data.
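As a planning aid, the DECIDE steps can be treated as a simple checklist so that unaddressed steps are easy to spot before a study begins. The sketch below is hypothetical: the step names come from the framework, but the `plan` contents and function are invented for illustration.

```python
# The six DECIDE steps, in order.
DECIDE_STEPS = [
    "Determine the goals",
    "Explore the questions",
    "Choose the evaluation methods",
    "Identify the practical issues",
    "Decide how to deal with the ethical issues",
    "Evaluate, analyze, interpret and present the data",
]

def missing_steps(plan):
    """Return the DECIDE steps the plan has not yet addressed."""
    return [step for step in DECIDE_STEPS if not plan.get(step)]

# An invented, partially completed evaluation plan.
plan = {
    "Determine the goals": "Check users can book an e-ticket unaided",
    "Explore the questions": "Are users concerned about security?",
}
print(missing_steps(plan))  # the steps still to be planned
```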
Determine the goals
What are the high-level goals of the evaluation? Who wants it and why?
The goals influence the methods used for the study.
Goals vary and could be to:
- identify the best metaphor for the design;
- check that user requirements are met;
- check for consistency;
- investigate how technology affects working practices;
- improve the usability of an existing product.
Explore the questions
Questions help to guide the evaluation.
The goal of finding out why some customers prefer to purchase paper airline tickets rather than e-tickets can be broken down into sub-questions:
- What are customers’ attitudes to e-tickets?
- Are they concerned about security?
- Is the interface for obtaining them poor?
What questions might you ask about the design of a cell phone?
Choose the evaluation approach & methods
The evaluation method influences how data is collected, analyzed and presented. E.g., field studies typically:
- involve observation and interviews;
- involve users in natural settings;
- do not involve controlled tests;
- produce qualitative data.
Identify practical issues
For example:
- selecting users;
- finding evaluators;
- selecting equipment;
- staying on budget;
- staying on schedule.
Decide about ethical issues
Develop an informed consent form.
Participants have a right to:
- know the goals of the study;
- know what will happen to the findings;
- privacy of personal information;
- leave when they wish;
- be treated politely.
Evaluate, interpret & present data
Methods used influence how data is evaluated, interpreted and presented. The following need to be considered:
- Reliability: can the study be replicated?
- Validity: is it measuring what you expected?
- Biases: is the process creating biases?
- Scope: can the findings be generalized?
- Ecological validity: is the environment influencing the findings? (e.g., the Hawthorne effect)
Key points
Many issues to consider before conducting an evaluation study.
These include: goals of the study; involvement or not of users; the methods to use; practical & ethical issues; how data will be collected, analyzed & presented.
The DECIDE framework provides a useful checklist for planning an evaluation study.
An exercise left for you…
- Find an evaluation study (proceedings of CHI, CSCW, SOUPS).
- Use the DECIDE framework to analyze it.
- Describe the aspects of DECIDE that are explicitly addressed in the report and which are not.
On a scale of 1-5, where 1 = poor and 5 = excellent, how would you rate this study?
The aims:
- Explain how to do usability testing.
- Outline the basics of experimental design.
- Describe how to do field studies.
Usability testing
- Involves recording the performance of typical users doing typical tasks.
- Controlled settings.
- Users are observed and timed.
- Data is recorded on video & key presses are logged.
- The data is used to calculate performance times, and to identify & explain errors.
- User satisfaction is evaluated using questionnaires & interviews.
- Field observations may be used to provide contextual understanding.
Experiments & usability testing
- Experiments test hypotheses to discover new knowledge by investigating the relationship between two or more things, i.e., variables.
- Usability testing is applied experimentation.
- Developers check that the system is usable by the intended user population for their tasks.
- Experiments may also be done in usability testing.
Usability testing vs research
Usability testing:
- Improve products.
- Few participants.
- Results inform design.
- Usually not completely replicable.
- Conditions controlled as much as possible.
- Procedure planned.
- Results reported to developers.

Experiments for research:
- Discover knowledge.
- Many participants.
- Results validated statistically.
- Must be replicable.
- Strongly controlled conditions.
- Experimental design.
- Scientific report to the scientific community.
Usability testing
- Goals & questions focus on how well users perform tasks with the product.
- Comparison of products or prototypes is common.
- Focus is on time to complete a task and the number & type of errors.
- Data collected by video & interaction logging.
- Testing is central.
- User satisfaction questionnaires & interviews provide data about users’ opinions.
Usability lab with observers watching a user & assistant
Portable equipment for use in the field
Mobile head-mounted eye tracker
Picture courtesy of SensoMotoric Instruments (SMI), copyright 2010
Testing conditions
- Usability lab or other controlled space.
- Emphasis on selecting representative users and developing representative tasks.
- 5-10 users typically selected.
- Tasks usually last no more than 30 minutes.
- The test conditions should be the same for every participant.
- An informed consent form explains procedures and deals with ethical issues.
Typical metrics
- Time to complete a task.
- Time to complete a task after a specified time away from the product.
- Number and type of errors per task.
- Number of errors per unit of time.
- Number of navigations to online help or manuals.
- Number of users making a particular error.
- Number of users completing a task successfully.
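Several of these metrics can be computed directly from an interaction log. The sketch below assumes a hypothetical log format (the event names and fields are invented for illustration, not from any particular logging tool) and derives time on task, errors by type, and navigations to help.

```python
from collections import Counter

# Invented, timestamped event log for one participant on one task
# (times in seconds from the start of the session).
log = [
    {"t": 0.0,  "event": "task_start"},
    {"t": 12.4, "event": "error", "kind": "wrong_menu"},
    {"t": 30.1, "event": "error", "kind": "wrong_menu"},
    {"t": 41.7, "event": "help_opened"},
    {"t": 95.0, "event": "task_complete"},
]

def task_time(log):
    """Time to complete the task: completion time minus start time."""
    start = next(e["t"] for e in log if e["event"] == "task_start")
    end = next(e["t"] for e in log if e["event"] == "task_complete")
    return end - start

def error_counts(log):
    """Number of errors per task, broken down by type."""
    return Counter(e["kind"] for e in log if e["event"] == "error")

def help_visits(log):
    """Number of navigations to online help or manuals."""
    return sum(1 for e in log if e["event"] == "help_opened")

print(task_time(log))      # seconds on task
print(error_counts(log))   # errors by type
print(help_visits(log))    # help navigations
```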
Other metrics?
Not everything is performance driven…
Usability engineering orientation
- Aim is improvement with each version.
- Current level of performance.
- Minimum acceptable level of performance.
- Target level of performance.
How many participants is enough for user testing?
- The number is a practical issue.
- Depends on: the schedule for testing; the availability of participants; the cost of running tests.
- Typically 5-10 participants.
- Some experts argue that testing should continue until no new insights are gained.
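One classic way to reason about this number is Nielsen and Landauer's problem-discovery estimate, in which each participant independently finds a given usability problem with probability p (around 0.31 in their data). A short sketch, assuming that model:

```python
def proportion_found(n_participants, p=0.31):
    """Expected proportion of usability problems found by n participants,
    assuming each participant finds a given problem with probability p."""
    return 1 - (1 - p) ** n_participants

for n in (1, 3, 5, 10):
    print(f"{n:2d} participants -> ~{proportion_found(n):.0%} of problems")

# With p = 0.31, five participants are expected to find roughly 85% of
# the problems, one argument behind the 5-10 participant rule of thumb.
```

The curve flattens quickly, which is also why "test until no new insights are gained" often terminates after a handful of participants.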
Name 3 features for each that can be tested by usability testing
iPad
Nielsen Norman Group study
http://www.nngroup.com/reports/mobile/ipad/
“These reports are based on 2 rounds of usability studies with real users, reporting how they actually used a broad variety of iPad apps as well as websites accessed on the iPad.”
“The first edition of our iPad report (from 2010) describes the original research we conducted with the first crop of iPad apps immediately after the launch of this tablet device.”
“The second edition of our iPad report (from 2011) describes the follow-up research we conducted a year later, after apps and websites had been given the chance to improve based on the early findings.”
Research Goals
- Understand how interactions with the device affected people.
- Provide feedback to their clients and developers.
- Report on whether the iPad lived up to the hype.
- Are user expectations different for the iPad compared with the iPhone?
Methodology: Usability Testing
- With a think-aloud protocol.
- Conducted in Chicago and Fremont, CA.
- Aim: understand the typical usability issues that people encounter when using apps and accessing websites on the iPad.
- 7 participants: experienced iPhone users (3+ months’ experience) who used a variety of apps; varied ages (20s to 60s) and occupations; 3 male, 4 female.
Questions:
Do you think the selection of participants was appropriate? Why?
What problems may occur when asking participants to think out loud as they complete tasks?
The tests
- 90 minutes of exploring applications on the iPad (self-directed exploration).
- The researcher sat beside the participant, observed, and took notes.
- The session was video recorded.
- Specific tasks: open a specific app/website; carry out one or more tasks “as if they were on their own”.
- Balanced presentation order of tasks.
- 60 tasks from 32 different sites.
The Equipment
- A camera (pointing down at the iPad) recorded the participant’s interactions and gestures when using the iPad; the recording streamed to a laptop computer.
- A webcam recorded the expressions on the participants’ faces and their think-aloud commentary.
- The laptop ran Morae, which synched the two data streams.
- Up to 3 observers watched the video streams (including the in-room moderator) so that the person’s personal space was not invaded.
App or website   Task
iBook            Download a free copy of Alice’s Adventures in Wonderland and read through the first few pages.
Craigslist       Find some free mulch for your garden.
eBay             You want to buy a new iPad on eBay. Find one that you could buy from a reputable seller.
Time Magazine    Browse through the magazine and find the best pictures of the week.
Epicurious       You want to make an apple pie for tonight. Find a recipe and see what you need to buy in order to prepare it.
Kayak            You are planning a trip to Death Valley in May this year. Find a hotel located in or close to the park.
How representative do you think these tasks are?
Question
What aspects are considered to be important for good usability and user experience in this case?
Metrics of interest
Usability: efficient, effective, safe, easy to learn, easy to remember, have good utility.
User experience: support creativity, be motivating, be helpful, be satisfying to use.
Results
- Website interactions were not optimal: links too small to tap reliably; fonts could be difficult to read.
- Participants sometimes did not know where to tap on the iPad to select options such as buttons and menus.
- Participants got lost (no back button to return to the home page).
- Inconsistencies between portrait and landscape presentations.
Usability Experiments
- Predict the relationship between two or more variables.
- The independent variable is manipulated by the researcher.
- The dependent variable depends on the independent variable.
- Typical experimental designs have one or two independent variables.
- Validated statistically & replicable.
Experimental designs
- Between subjects: different participants - a single group of participants is allocated randomly to the experimental conditions.
- Within subjects: same participants - all participants appear in both conditions.
- Matched participants: participants are matched in pairs, e.g., based on expertise, gender, etc.
Between, within, matched participant design
- Between: advantage - no order effects; disadvantage - many subjects needed, and individual differences are a problem.
- Within: advantage - few individuals, no individual differences; disadvantage - counterbalancing needed because of ordering effects.
- Matched: advantage - same as different participants but individual differences reduced; disadvantage - cannot be sure of perfect matching on all differences.
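The allocation strategies above can be sketched in a few lines: random assignment to one condition for a between-subjects design, and a simple rotated order (one basic way to counterbalance ordering effects) for a within-subjects design. Function names and the seed are illustrative assumptions, not from the book.

```python
import random

def between_subjects(participants, conditions, seed=42):
    """Randomly allocate each participant to a single condition."""
    rng = random.Random(seed)  # fixed seed so the allocation is repeatable
    return {p: rng.choice(conditions) for p in participants}

def within_subjects_orders(participants, conditions):
    """Give every participant all conditions, rotating the starting
    condition per participant to counterbalance ordering effects."""
    k = len(conditions)
    return {
        p: [conditions[(i + j) % k] for j in range(k)]
        for i, p in enumerate(participants)
    }

people = ["P1", "P2", "P3", "P4"]
print(between_subjects(people, ["A", "B"]))        # one condition each
print(within_subjects_orders(people, ["A", "B"]))  # both, rotated order
```

A fuller design would use a Latin square or all permutations when there are more than two conditions; the rotation here is the minimal version of the same idea.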
Field studies
- Field studies are done in natural settings.
- “In the wild” is a term for prototypes being used freely in natural settings.
- The aim is to understand what users do naturally and how technology impacts them.
- Field studies are used in product design to: identify opportunities for new technology; determine design requirements; decide how best to introduce new technology; evaluate technology in use.
Data collection & analysis
- Observation & interviews: notes, pictures, recordings, video, logging.
- Analysis: data are categorized; categories can be provided by theory, e.g., grounded theory or activity theory.
Data presentation
The aim is to show how the products are being appropriated and integrated into their surroundings.
Typical presentation forms include: vignettes, excerpts, critical incidents, patterns, and narratives.
Key points 1
- Usability testing is done in controlled conditions.
- Usability testing is an adapted form of experimentation.
- Experiments aim to test hypotheses by manipulating certain variables while keeping others constant.
- The experimenter controls the independent variable(s) but not the dependent variable(s).
Key points 2
- There are three types of experimental design: different participants, same participants, & matched participants.
- Field studies are done in natural environments.
- “In the wild” is a recent term for studies in which a prototype is used freely in a natural setting.
- Typically, observation and interviews are used to collect field-study data.
- Data is usually presented as anecdotes, excerpts, critical incidents, patterns and narratives.