Usability Evaluation
www.id-book.com
©2011
The aims
- Explain the key concepts used in evaluation.
- Introduce different evaluation methods.
- Show how different methods are used for different purposes at different stages of the design process and in different contexts of use.
- Show how evaluators mix and modify methods.
- Discuss the practical challenges.
Why, what, where, and when to evaluate
Iterative design & evaluation is a continuous process that examines:
- Why: to check that users can use the product and that they like it.
- What: a conceptual model, early prototypes of a new system and, later, more complete prototypes.
- Where: in natural and laboratory settings.
- When: throughout design; finished products can be evaluated to collect information to inform new products.
Designers need to check that they understand users’ requirements.
Bruce Tognazzini tells you why you need to evaluate
“Iterative design, with its repeating cycle of design and testing, is the only validated methodology in existence that will consistently produce successful results. If you don’t have user-testing as an integral part of your design process you are going to throw buckets of money down the drain.”
See AskTog.com for topical discussions about design and evaluation.
Types of evaluation
- Controlled settings involving users, e.g., usability testing & experiments in laboratories and living labs.
- Natural settings involving users, e.g., field studies to see how the product is used in the real world.
- Any settings not involving users, e.g., consultants critique, predict, analyze & model aspects of the interface; analytics.
Usability lab
http://iat.ubalt.edu/usability_lab/
Living labs
People’s use of technology in their everyday lives can be evaluated in living labs.
Such evaluations are too difficult to do in a usability lab.
E.g., http://www.sfu.ca/westhouse.html
Currently occupied by participants!
Usability testing & field studies can be complementary
Evaluation case studies
- Experiment to investigate a computer game: Regan L. Mandryk and Kori M. Inkpen (2004). Physiological indicators for the evaluation of co-located collaborative play. In Proc. ACM Conf. on Computer Supported Cooperative Work (CSCW ’04), 102-111. http://doi.acm.org/10.1145/1031607.1031625
- In-the-wild field study of skiers.
- Crowdsourcing.
Challenge & engagement in a collaborative immersive game
Physiological measures were used.
Engagement was measured in two conditions:
1. Playing against another person.
2. Playing against a computer.
What does this data tell you?
Rate each experience state on a scale from 1 to 5 (1 = low, 5 = high).

             Playing against computer    Playing against friend
             Mean      St. Dev.          Mean      St. Dev.
Boring       2.3       0.949             1.7       0.949
Challenging  3.6       1.08              3.9       0.994
Easy         2.7       0.823             2.5       0.850
Engaging     3.8       0.422             4.3       0.675
Exciting     3.5       0.527             4.1       0.568
Frustrating  2.8       1.14              2.5       0.850
Fun          3.9       0.738             4.6       0.699

Source: Mandryk and Inkpen (2004).
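A minimal sketch of how paired ratings like those above could be summarized, using only the Python standard library. The ratings below are invented for illustration; the study’s raw per-participant data are not reproduced here. Because each participant played in both conditions (within-subjects), the per-participant differences are the quantity of interest.

```python
from statistics import mean, stdev

def summarize(ratings):
    """Return (mean, sample standard deviation) for a list of ratings."""
    return mean(ratings), stdev(ratings)

# Invented example: ten participants rate "Fun" (1-5) in each condition.
fun_vs_computer = [4, 4, 3, 4, 5, 4, 3, 4, 4, 4]
fun_vs_friend   = [5, 5, 4, 5, 5, 4, 4, 5, 5, 4]

m_c, sd_c = summarize(fun_vs_computer)
m_f, sd_f = summarize(fun_vs_friend)

# Paired (within-subjects) differences, friend minus computer.
diffs = [f - c for c, f in zip(fun_vs_computer, fun_vs_friend)]

print(f"vs computer: M={m_c:.1f}, SD={sd_c:.2f}")
print(f"vs friend:   M={m_f:.1f}, SD={sd_f:.2f}")
print(f"mean paired difference: {mean(diffs):.2f}")
```

A table like the one above reports exactly these per-condition means and standard deviations; a paired statistical test on `diffs` would be the usual next step.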
Question
What were the precautionary measures that the evaluators had to take?
Evaluation case studies
- Experiment to investigate a computer game.
- In-the-wild field study of skiers: Francis Jambon and Brigitte Meillon (2009). User experience evaluation in the wild. In Ext. Abstracts of the Int. Conf. on Human Factors in Computing Systems (CHI EA ’09), 4069-4074. http://doi.acm.org/10.1145/1520340.1520619
- Crowdsourcing.
Why study skiers in the wild?
Source: Jambon et al. (2009) User experience in the wild. In: Proceedings of CHI ’09, ACM Press, New York, p. 4072.
e-skiing system components
Source: Jambon et al. (2009), p. 4072.
Evaluation case studies
- Experiment to investigate a computer game.
- In-the-wild field study of skiers.
- Crowdsourcing: Jeffrey Heer and Michael Bostock (2010). Crowdsourcing graphical perception: using Mechanical Turk to assess visualization design. In Proc. of the Int. Conf. on Human Factors in Computing Systems (CHI ’10). ACM, New York, NY, USA, 203-212. http://doi.acm.org/10.1145/1753326.1753357
Crowdsourcing: when might you use it?
Evaluating an ambient system
The Hello Wall is a new kind of system that is designed to explore how people react to its presence.
What are the challenges of evaluating systems like this?
Evaluation methods
Method          Controlled settings   Natural settings   Without users
Observing              x                    x
Asking users           x                    x
Asking experts                              x                   x
Testing                x
Modeling                                                        x
The language of evaluation
Analytics, analytical evaluation, controlled experiment, expert review or crit, field study, formative evaluation, heuristic evaluation, in-the-wild study, living laboratory, predictive evaluation, summative evaluation, usability laboratory, user studies, usability testing, users or participants.
Key points
- Evaluation & design are closely integrated in user-centered design.
- Some of the same techniques are used in evaluation as for establishing requirements, but they are used differently (e.g., observation, interviews & questionnaires).
- Three types of evaluation: laboratory-based with users, in the field with users, and studies that do not involve users.
- The main methods are: observing, asking users, asking experts, user testing, inspection, modeling users’ task performance, and analytics.
- Dealing with constraints is an important skill for evaluators to develop.
Butterfly Ballot
“The Butterfly Ballot: Anatomy of a Disaster” was written by Bruce Tognazzini: http://www.asktog.com/columns/042ButterflyBallot.html
Try it as a design exercise: create a similar ballot, conduct a usability test (did people actually vote as they intended?), create a modified ballot, conduct a second test, and see if the results improve.
The aims are:
- Introduce and explain the DECIDE framework.
- Discuss the conceptual, practical, and ethical issues involved in evaluation.
DECIDE: a framework to guide evaluation
- Determine the goals.
- Explore the questions.
- Choose the evaluation methods.
- Identify the practical issues.
- Decide how to deal with the ethical issues.
- Evaluate, analyze, interpret and present the data.
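As a planning aid, the DECIDE steps can be treated as a simple checklist so that unaddressed steps are easy to spot before a study begins. The sketch below is hypothetical: the step names come from the framework, but the `plan` contents and function are invented for illustration.

```python
# The six DECIDE steps, in order.
DECIDE_STEPS = [
    "Determine the goals",
    "Explore the questions",
    "Choose the evaluation methods",
    "Identify the practical issues",
    "Decide how to deal with the ethical issues",
    "Evaluate, analyze, interpret and present the data",
]

def missing_steps(plan):
    """Return the DECIDE steps the plan has not yet addressed."""
    return [step for step in DECIDE_STEPS if not plan.get(step)]

# An invented, partially completed evaluation plan.
plan = {
    "Determine the goals": "Check users can book an e-ticket unaided",
    "Explore the questions": "Are users concerned about security?",
}
print(missing_steps(plan))  # the steps still to be planned
```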
Determine the goals
What are the high-level goals of the evaluation? Who wants it and why?
The goals influence the methods used for the study.
Goals vary and could be to:
- identify the best metaphor for the design;
- check that user requirements are met;
- check for consistency;
- investigate how technology affects working practices;
- improve the usability of an existing product.
Explore the questions
Questions help to guide the evaluation.
The goal of finding out why some customers prefer to purchase paper airline tickets rather than e-tickets can be broken down into sub-questions:
- What are customers’ attitudes to e-tickets?
- Are they concerned about security?
- Is the interface for obtaining them poor?
What questions might you ask about the design of a cell phone?
Choose the evaluation approach & methods
The evaluation method influences how data is collected, analyzed and presented. E.g., field studies typically:
- involve observation and interviews;
- involve users in natural settings;
- do not involve controlled tests;
- produce qualitative data.
Identify practical issues
For example:
- selecting users;
- finding evaluators;
- selecting equipment;
- staying on budget;
- staying on schedule.
Decide about ethical issues
Develop an informed consent form.
Participants have a right to:
- know the goals of the study;
- know what will happen to the findings;
- privacy of personal information;
- leave when they wish;
- be treated politely.
Evaluate, interpret & present data
Methods used influence how data is evaluated, interpreted and presented. The following need to be considered:
- Reliability: can the study be replicated?
- Validity: is it measuring what you expected?
- Biases: is the process creating biases?
- Scope: can the findings be generalized?
- Ecological validity: is the environment influencing the findings? (e.g., the Hawthorne effect)
Key points
Many issues to consider before conducting an evaluation study.
These include: goals of the study; involvement or not of users; the methods to use; practical & ethical issues; how data will be collected, analyzed & presented.
The DECIDE framework provides a useful checklist for planning an evaluation study.
An exercise left for you…
- Find an evaluation study (proceedings of CHI, CSCW, SOUPS).
- Use the DECIDE framework to analyze it.
- Describe the aspects of DECIDE that are explicitly addressed in the report and which are not.
On a scale of 1-5, where 1 = poor and 5 = excellent, how would you rate this study?
The aims:
- Explain how to do usability testing.
- Outline the basics of experimental design.
- Describe how to do field studies.
Usability testing
- Involves recording the performance of typical users doing typical tasks.
- Controlled settings.
- Users are observed and timed.
- Data is recorded on video & key presses are logged.
- The data is used to calculate performance times, and to identify & explain errors.
- User satisfaction is evaluated using questionnaires & interviews.
- Field observations may be used to provide contextual understanding.
Experiments & usability testing
- Experiments test hypotheses to discover new knowledge by investigating the relationship between two or more things, i.e., variables.
- Usability testing is applied experimentation.
- Developers check that the system is usable by the intended user population for their tasks.
- Experiments may also be done in usability testing.
Usability testing vs research
Usability testing:
- Improve products.
- Few participants.
- Results inform design.
- Usually not completely replicable.
- Conditions controlled as much as possible.
- Procedure planned.
- Results reported to developers.

Experiments for research:
- Discover knowledge.
- Many participants.
- Results validated statistically.
- Must be replicable.
- Strongly controlled conditions.
- Experimental design.
- Scientific report to the scientific community.
Usability testing
- Goals & questions focus on how well users perform tasks with the product.
- Comparison of products or prototypes is common.
- Focus is on time to complete a task and the number & type of errors.
- Data collected by video & interaction logging.
- Testing is central.
- User satisfaction questionnaires & interviews provide data about users’ opinions.
Usability lab with observers watching a user & assistant
Portable equipment for use in the field
Mobile head-mounted eye tracker
Picture courtesy of SensoMotoric Instruments (SMI), copyright 2010
Testing conditions
- Usability lab or other controlled space.
- Emphasis on selecting representative users and developing representative tasks.
- 5-10 users typically selected.
- Tasks usually last no more than 30 minutes.
- The test conditions should be the same for every participant.
- An informed consent form explains procedures and deals with ethical issues.
Typical metrics
- Time to complete a task.
- Time to complete a task after a specified time away from the product.
- Number and type of errors per task.
- Number of errors per unit of time.
- Number of navigations to online help or manuals.
- Number of users making a particular error.
- Number of users completing a task successfully.
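Several of these metrics can be computed directly from an interaction log. The sketch below assumes a hypothetical log format (the event names and fields are invented for illustration, not from any particular logging tool) and derives time on task, errors by type, and navigations to help.

```python
from collections import Counter

# Invented, timestamped event log for one participant on one task
# (times in seconds from the start of the session).
log = [
    {"t": 0.0,  "event": "task_start"},
    {"t": 12.4, "event": "error", "kind": "wrong_menu"},
    {"t": 30.1, "event": "error", "kind": "wrong_menu"},
    {"t": 41.7, "event": "help_opened"},
    {"t": 95.0, "event": "task_complete"},
]

def task_time(log):
    """Time to complete the task: completion time minus start time."""
    start = next(e["t"] for e in log if e["event"] == "task_start")
    end = next(e["t"] for e in log if e["event"] == "task_complete")
    return end - start

def error_counts(log):
    """Number of errors per task, broken down by type."""
    return Counter(e["kind"] for e in log if e["event"] == "error")

def help_visits(log):
    """Number of navigations to online help or manuals."""
    return sum(1 for e in log if e["event"] == "help_opened")

print(task_time(log))      # seconds on task
print(error_counts(log))   # errors by type
print(help_visits(log))    # help navigations
```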
Other metrics?
Not everything is performance driven…
Usability engineering orientation
- Aim is improvement with each version.
- Current level of performance.
- Minimum acceptable level of performance.
- Target level of performance.
How many participants is enough for user testing?
- The number is a practical issue.
- Depends on: the schedule for testing; the availability of participants; the cost of running tests.
- Typically 5-10 participants.
- Some experts argue that testing should continue until no new insights are gained.
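One classic way to reason about this number is Nielsen and Landauer's problem-discovery estimate, in which each participant independently finds a given usability problem with probability p (around 0.31 in their data). A short sketch, assuming that model:

```python
def proportion_found(n_participants, p=0.31):
    """Expected proportion of usability problems found by n participants,
    assuming each participant finds a given problem with probability p."""
    return 1 - (1 - p) ** n_participants

for n in (1, 3, 5, 10):
    print(f"{n:2d} participants -> ~{proportion_found(n):.0%} of problems")

# With p = 0.31, five participants are expected to find roughly 85% of
# the problems, one argument behind the 5-10 participant rule of thumb.
```

The curve flattens quickly, which is also why "test until no new insights are gained" often terminates after a handful of participants.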
Name 3 features for each that can be tested by usability testing
iPad
Nielsen Norman Group study
http://www.nngroup.com/reports/mobile/ipad/
“These reports are based on 2 rounds of usability studies with real users, reporting how they actually used a broad variety of iPad apps as well as websites accessed on the iPad.”
“The first edition of our iPad report (from 2010) describes the original research we conducted with the first crop of iPad apps immediately after the launch of this tablet device.”
“The second edition of our iPad report (from 2011) describes the follow-up research we conducted a year later, after apps and websites had been given the chance to improve based on the early findings.”
Research Goals
- Understand how interactions with the device affected people.
- Provide feedback to their clients and developers.
- Report on whether the iPad lived up to the hype.
- Are user expectations different for the iPad compared with the iPhone?
Methodology: Usability Testing
- With a think-aloud protocol.
- Conducted in Chicago and Fremont, CA.
- Aim: understand the typical usability issues that people encounter when using apps and accessing websites on the iPad.
- 7 participants: experienced iPhone users (3+ months’ experience) who used a variety of apps; varied ages (20s to 60s) and occupations; 3 male, 4 female.
Questions:
Do you think the selection of participants was appropriate? Why?
What problems may occur when asking participants to think out loud as they complete tasks?
The tests
- 90 minutes of exploring applications on the iPad (self-directed exploration).
- The researcher sat beside the participant, observed, and took notes.
- The session was video recorded.
- Specific tasks: open a specific app/website; carry out one or more tasks “as if they were on their own”.
- Balanced presentation order of tasks.
- 60 tasks from 32 different sites.
The Equipment
- A camera (pointing down at the iPad) recorded the participant’s interactions and gestures when using the iPad; the recording streamed to a laptop computer.
- A webcam recorded the expressions on the participants’ faces and their think-aloud commentary.
- The laptop ran Morae, which synched the two data streams.
- Up to 3 observers watched the video streams (including the in-room moderator) so that the person’s personal space was not invaded.
App or website   Task
iBook            Download a free copy of Alice’s Adventures in Wonderland and read through the first few pages.
Craigslist       Find some free mulch for your garden.
eBay             You want to buy a new iPad on eBay. Find one that you could buy from a reputable seller.
Time Magazine    Browse through the magazine and find the best pictures of the week.
Epicurious       You want to make an apple pie for tonight. Find a recipe and see what you need to buy in order to prepare it.
Kayak            You are planning a trip to Death Valley in May this year. Find a hotel located in or close to the park.
How representative do you think these tasks are?
Question
What aspects are considered to be important for good usability and user experience in this case?
Metrics of interest
Usability: efficient, effective, safe, easy to learn, easy to remember, have good utility.
User experience: support creativity, be motivating, be helpful, be satisfying to use.
Results
- Website interactions were not optimal: links too small to tap reliably; fonts could be difficult to read.
- Participants sometimes did not know where to tap on the iPad to select options such as buttons and menus.
- Participants got lost (no back button to return to the home page).
- Inconsistencies between portrait and landscape presentations.
Usability Experiments
- Predict the relationship between two or more variables.
- The independent variable is manipulated by the researcher.
- The dependent variable depends on the independent variable.
- Typical experimental designs have one or two independent variables.
- Validated statistically & replicable.
Experimental designs
- Between subjects: different participants - a single group of participants is allocated randomly to the experimental conditions.
- Within subjects: same participants - all participants appear in both conditions.
- Matched participants: participants are matched in pairs, e.g., based on expertise, gender, etc.
Between, within, matched participant design
- Between: advantage - no order effects; disadvantage - many subjects needed, and individual differences are a problem.
- Within: advantage - few individuals, no individual differences; disadvantage - counterbalancing needed because of ordering effects.
- Matched: advantage - same as different participants but individual differences reduced; disadvantage - cannot be sure of perfect matching on all differences.
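The allocation strategies above can be sketched in a few lines: random assignment to one condition for a between-subjects design, and a simple rotated order (one basic way to counterbalance ordering effects) for a within-subjects design. Function names and the seed are illustrative assumptions, not from the book.

```python
import random

def between_subjects(participants, conditions, seed=42):
    """Randomly allocate each participant to a single condition."""
    rng = random.Random(seed)  # fixed seed so the allocation is repeatable
    return {p: rng.choice(conditions) for p in participants}

def within_subjects_orders(participants, conditions):
    """Give every participant all conditions, rotating the starting
    condition per participant to counterbalance ordering effects."""
    k = len(conditions)
    return {
        p: [conditions[(i + j) % k] for j in range(k)]
        for i, p in enumerate(participants)
    }

people = ["P1", "P2", "P3", "P4"]
print(between_subjects(people, ["A", "B"]))        # one condition each
print(within_subjects_orders(people, ["A", "B"]))  # both, rotated order
```

A fuller design would use a Latin square or all permutations when there are more than two conditions; the rotation here is the minimal version of the same idea.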
Field studies
- Field studies are done in natural settings.
- “In the wild” is a term for prototypes being used freely in natural settings.
- The aim is to understand what users do naturally and how technology impacts them.
- Field studies are used in product design to: identify opportunities for new technology; determine design requirements; decide how best to introduce new technology; evaluate technology in use.
Data collection & analysis
- Observation & interviews: notes, pictures, recordings, video, logging.
- Analysis: data are categorized; categories can be provided by theory, e.g., grounded theory or activity theory.
Data presentation
The aim is to show how the products are being appropriated and integrated into their surroundings.
Typical presentation forms include: vignettes, excerpts, critical incidents, patterns, and narratives.
Key points 1
- Usability testing is done in controlled conditions.
- Usability testing is an adapted form of experimentation.
- Experiments aim to test hypotheses by manipulating certain variables while keeping others constant.
- The experimenter controls the independent variable(s) but not the dependent variable(s).
Key points 2
- There are three types of experimental design: different participants, same participants, & matched participants.
- Field studies are done in natural environments.
- “In the wild” is a recent term for studies in which a prototype is used freely in a natural setting.
- Typically, observation and interviews are used to collect field-study data.
- Data is usually presented as anecdotes, excerpts, critical incidents, patterns and narratives.