May 8, 2007 Mohamad Eid

User Interface Evaluation

Chapter 2


Outline

• Objectives of User Interface Evaluation
• Evaluation Methods
• A Preliminary Case Study: Hotel Reservations
• Overview of Interface Evaluation Methods
• Videotaped Evaluation
• Experiments
• Cognitive Walkthroughs
• Summary


Objectives of User Interface Evaluation

Key objective of both UI design and evaluation:

Minimize malfunctions

Key reason for focusing on evaluation: Without it, the designer would be working “blindfold”

Designers wouldn’t really know whether they are solving customer’s problems in the most productive way



Questions answered by various evaluation techniques:

1. What is the user’s real task?
  • Prevent later malfunctions by doing evaluation as part of requirements analysis
  • Present and work with a UI to help formulate the requirements
  • Inappropriate tasks/requirements are a major source of malfunctions

2. What problems do or might users experience with the UI?
  • Directly find malfunctions

3. Which of several alternative UIs is better?
  • Pick the version that leads to fewer malfunctions

4. Has the UI met usability targets?
  • Ensure that malfunction counts are sufficiently low

5. Does the UI conform to standards?
  • Leverage collective wisdom to reduce malfunctions


But, in order for evaluation to give feedback to designers...

...we must understand why a malfunction occurs

Malfunction analysis:
  • Determine why a malfunction occurs
  • Determine how to eliminate malfunctions



Evaluation Methods

Formative evaluation:
  • When designing and maintaining software that we are developing

Summative evaluation:
  • When judging a finished product developed by someone else


A Preliminary Case Study: Hotel Reservations

UI evaluation performed for Forte Travelodge
  • Performed in a special usability lab

Aims:
  • Identify and eliminate malfunctions
    • Hence make system easier to use
    • Avoid business difficulties caused by these malfunctions
  • Develop improved training material and documentation
    • Avert potential malfunctions by teaching users how to avoid them

Setup of IBM usability lab:
  • Resembles TV studio
  • Microphones and video equipment
  • One-way mirror
  • Technicians, observers sit on one side
  • Users sit on other side in realistic environment
    • User environment resembles reception desk
    • Non-threatening



Aspects of system to be evaluated:
  • How quickly can a booking be made? (while operator is on telephone)
  • Is each screen productive to use?
  • Are help and error messages effective?
  • Can non-computer-literate operators use the system?
    • Is complexity minimized?
  • Are training and documentation effective?


Procedure:
  • 15 common task scenarios developed
    • Among others: basic registration, cancellation, request for specific room, extension of existing stay, etc.
  • Four days of testing with multiple users performing various sets of tasks
  • Users were told evaluation is of system, not them
  • All actions were recorded
  • Debriefing sessions held
  • Videos then analyzed for malfunctions
    • 62 identified
  • Priorities:
    • Navigation speed needs improvement
    • Screen titles and formats need tuning
    • Hard to refer to documentation
    • Physical difficulties with telephone headsets and furniture



Results:
  • Higher productivity of booking staff
    • Tasks completed more quickly
    • Guest requirements better met
  • Training costs kept low
  • Morale kept high
  • More customers booked by phone
    • From 14,500 to 27,000 per week



Overview of Interface Evaluation Methods

Three types of methods
  • Passive evaluation
    • E.g. logs
  • Active evaluation
    • E.g. experiments
  • Predictive evaluation / usability inspections
    • E.g. heuristics

All types of methods useful for optimal results
  • Used in parallel

All attempt to prevent malfunctions

Before trying methods, do pilot studies first


Passive Evaluation

• Usage of software is monitored
• Performed while prototyping, in alpha test and later
• Does not actively seek malfunctions
  • Only finds them when they happen to occur
  • Infrequent (but possibly severe) malfunctions may not be found
• Generally requires realistic use of a system
  • Users become frustrated with malfunctions

Gathering Information:

a) Problem report monitoring:
  • Users should have an easy way to register their frustration / suggestions
  • Best if integrated with software


Passive Evaluation: Gathering Information

b) Automatic software logs
  • Can gather much data about usage
    • Command frequency
    • Error frequency and pre-error patterns
    • Undone operations (a sign of malfunctions)
  • Privacy is a concern
  • System must be designed for testability (DFT)
  • Logs can be taken of:
    • Just keystrokes and mouse clicks
    • Full details of interaction
  • The latter make accurate playback easier
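As a rough illustration of what such a log can yield, the sketch below counts command frequency, error frequency, pre-error patterns, and undone operations. The event format (a list of `(kind, name)` pairs), the sample log, and the function name `summarize_log` are assumptions made for this example; they do not come from any particular logging tool.

```python
from collections import Counter


def summarize_log(events):
    """Summarize a usage log given as (kind, name) pairs.

    kind is "command", "error", or "undo" (a hypothetical format).
    """
    commands = Counter()   # how often each command was issued
    errors = Counter()     # how often each error occurred
    pre_error = Counter()  # (command, error): command issued just before an error
    undone = Counter()     # commands that the user undid
    last_command = None
    for kind, name in events:
        if kind == "command":
            commands[name] += 1
            last_command = name
        elif kind == "error":
            errors[name] += 1
            if last_command is not None:
                pre_error[(last_command, name)] += 1
        elif kind == "undo" and last_command is not None:
            undone[last_command] += 1
    return {"commands": commands, "errors": errors,
            "pre_error": pre_error, "undone": undone}


# Hypothetical log from a booking session.
log = [
    ("command", "open booking"),
    ("command", "set dates"),
    ("error", "invalid date"),
    ("command", "set dates"),
    ("undo", ""),
    ("command", "confirm"),
]
summary = summarize_log(log)
```

A command that often precedes errors or is often undone is a candidate malfunction to examine with the other methods.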


Passive Evaluation: Gathering Information

c) Questionnaires / surveys
  • Useful to obtain statistical data from large numbers of users
    • Proper statistical means are needed to analyze results
  • Gathers subjective data about importance of malfunctions
    • Automated logs omit importance
    • Less frequent malfunctions may be more important
    • Users can prioritize needed improvements
  • Limit on number of questions
  • Very hard to phrase questions well
  • Questions can be closed- or open-ended


Active Evaluation

• Actively study specific activities performed by users
• Performed when prototyping and later

Gathering Information:

d) Experiments & usability engineering
  • Prove hypotheses about measurable attributes of one or more UIs
    • E.g. speed/learning/accuracy/frustration
  • In usability engineering, test against preset targets
  • Can be expensive
  • Knowledge of statistics needed
  • Hard to control for all variables

e) Observation sessions
  • Also called ‘interpretive evaluation’
  • Simple observation or cooperative evaluation
  • Described in detail later
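Testing against preset targets, as mentioned under d), can be sketched as a simple comparison. The attribute names, target values, and measurements below are invented for illustration, and the sketch assumes every attribute is "lower is better" (time taken, error count).

```python
def unmet_targets(measured, targets):
    """Return {attribute: (measured value, target)} for every missed target.

    Assumes a measurement misses its target when it exceeds the target.
    """
    return {
        name: (measured[name], limit)
        for name, limit in targets.items()
        if measured[name] > limit
    }


# Hypothetical targets and measurements for a booking task.
targets = {"seconds_per_booking": 120, "errors_per_task": 1}
measured = {"seconds_per_booking": 95, "errors_per_task": 2}
missed = unmet_targets(measured, targets)
```

In this made-up case only the error target is missed, so that is where further evaluation would focus.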


Predictive Evaluation

• Studies of system by experts rather than users
• Performed when UI is specified and later
  • Useful even before prototype developed
• Can eliminate many malfunctions before users ever see software
• Also called ‘usability inspections’

Gathering Information:

f) Heuristic evaluation
  • Based on a UI design principle document
  • Analyze whether each guideline is adhered to in the context of the task and users
  • Can also look at adherence to standards

g) Cognitive walkthroughs
  • Step-by-step analysis of:
    • Steps in task being performed
    • Goals users form to perform these tasks
    • How system leads user through tasks


Summary of Evaluation Techniques

Technique                                 When to use

a) Problem reporting                      Always
b) Automatic logs                         In any moderately complex system, and whenever there are large numbers of commands
c) Questionnaires                         Whenever there is a large number of users
d) Experiments & usability engineering    In special cases where it is hard to choose between alternatives, or when fine tuning
e) Observation sessions                   Almost always, especially when user has to interact with a client while using the system
f) Heuristic evaluation                   Always
g) Cognitive walkthrough                  When usability must be optimized


Videotaped Evaluation

A software engineer studies users who are actively using the user interface
  • To observe what problems they have
    • Rather than to measure numbers
  • The sessions are videotaped
  • Can be done in user’s environment

Activities of the user:
  • Performs pre-defined tasks
    • With or without detailed instructions on how to perform them
  • Preferably talks to herself as if alone in a room
    • Yields ‘think-aloud protocol’
  • This process is called ‘co-operative’ evaluation when the software engineer and user talk to each other


Videotaped Evaluation

The importance of video:
  • Without it, ‘you see what you want to see’
    • You interpret what you see based on your mental model
  • In the ‘heat of the moment’ you miss many things
  • Minor details (e.g. body language) captured
  • You can repeatedly analyze, looking for different problems

Tips for using video:
  • Several cameras are useful
  • Software is available to help analyse video by dividing into segments and labelling the segments
  • Evaluation can be time consuming so plan it carefully


Steps for Videotaped Evaluation

1. Select 6 to 8 representative users per user class
  • E.g. client, salesperson, manager, accounts receivable

2. Invite them to individual sessions
  • Sessions should last 30-90 minutes
  • Schedule 4-6 per day

3. If system involves user's clients in the interaction:
  1. Have users bring important clients
  2. or have staff pretend to be clients

4. Select facilitators/observers and notetakers

5. Prepare tasks:
  • Select the most commonly used tasks plus a few less important tasks
  • Write task instructions for users
  • Estimate the time it will take to complete each task plus extra time for discussion

6. Prepare notebook or form for organizing notes



7. Set up and test equipment
  1. Hardware on which to run system
  2. Audio or video recorder
  3. Software logs

8. Do a dry run (pilot study)!

9. At the start of an observation session, explain:
  • nature of project
  • anticipated user contributions
  • why user's views are important
  • focus is on evaluating the user interface, not evaluating the user
  • all notes, logs, etc., are confidential
  • user can withdraw at any time
  • usage of devices
  • relax!

  Sign informed consent form: very important


Steps for Videotaped Evaluation

10. Start user verbalizing as they perform each task (thinking aloud)
  • For co-operative evaluation, software engineer also verbalizes
  • Appropriate questions to be posed by the observing software engineer:

Question                                              Malfunction if

What do you want to do?                               They do not know; the system cannot do what they want.
What do you think would happen if ..?                 They do not know; they give wrong answer.
What do you think the system has done?                They do not know; they give wrong answer.
What do you think this information is telling you?    They do not know; they give wrong answer.
Why did the system do that?                           They do not know; they give wrong answer.
What were you expecting to happen?                    They had no expectation; they were expecting something else.



11. Hold a wrap-up interview (de-briefing)
  1. What were the most significant problems?
  2. What was most difficult to learn?
  3. Etc.

12. Analyze the videotape to find malfunctions

Lab exercise: Videotaped evaluation of a software product
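The sizing guidance in steps 1 and 2 (6 to 8 users per class, 4 to 6 sessions per day) implies a rough schedule length. The sketch below works that out; the function name `plan_sessions` is an assumption for this example, and the user classes are the examples given in step 1.

```python
import math


def plan_sessions(user_classes, users_per_class, sessions_per_day):
    """Return (total sessions, days needed) for one individual session per user."""
    total = len(user_classes) * users_per_class
    days = math.ceil(total / sessions_per_day)
    return total, days


classes = ["client", "salesperson", "manager", "accounts receivable"]
# Smallest plan: 6 users per class, 6 sessions per day.
fewest = plan_sessions(classes, users_per_class=6, sessions_per_day=6)
# Largest plan: 8 users per class, 4 sessions per day.
most = plan_sessions(classes, users_per_class=8, sessions_per_day=4)
```

With four user classes this gives between 24 and 32 sessions, spread over 4 to 8 days, which is worth knowing before booking facilitators and equipment.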


Experiments

1. Pick a set of subjects (users)
  1. A good mix to avoid biases
  2. A sufficient number to get statistical significance (avoid random happenings affecting results)

2. Pick variables to test
  • Independent: Manipulated to produce different conditions
    • Should not have too many
    • They should not affect each other too much
    • Make sure there are no hidden variables
  • Dependent: Measured value affected by independent

3. Develop a hypothesis
  • A prediction of the outcome
  • Aim of experiment is to show this is correct
  • E.g. Some change in an independent variable causes some change in a dependent variable


Experiments

4. Design experiments to test hypotheses
  • Create a null (inverse) hypothesis
    • i.e. a change in independent variable causes no change in dependent variable
  • Disprove null hypothesis!
  • Experiment design is difficult.

5. Conduct experiments

6. Statistically analyze results to draw conclusions
  • e.g. using ‘t-tests’
  • conclusions will be correct within a margin of error 19 times out of 20

7. Decide what action to take based on conclusions

Def.: A t-test is a statistical tool used to determine whether a significant difference exists between the means of two distributions or the mean of one distribution and a target value.
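To make the definition concrete, here is a minimal sketch of one common variant, Welch's two-sample t-test, which does not assume equal variances. The slides do not say which variant is meant, so this choice is an assumption, and the sample data are invented. The sketch computes only the t statistic and the approximate degrees of freedom; judging significance at the "19 times out of 20" (5%) level still requires a t-distribution table or a statistics library.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for two samples."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)  # sample variances (n - 1)
    se2 = va / na + vb / nb  # squared standard error of the difference in means
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical task times (seconds) under two UI variants.
times_a = [1.0, 2.0, 3.0, 4.0, 5.0]
times_b = [2.0, 3.0, 4.0, 5.0, 6.0]
t_stat, dof = welch_t(times_a, times_b)
```

For this made-up data t is -1.0 with about 8 degrees of freedom. Since |t| is below the two-sided 5% critical value (about 2.31 for 8 degrees of freedom), the null hypothesis would not be rejected.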


Example Experiment: Text Selection Schemes

• Early GUI research at Xerox on the Star Workstation
• Traditional experiments
• Results were used to develop Macintosh

Goal of study:
  • Evaluate how to select text using the mouse

Steps:

1. Subjects
  • Six groups of four
    • In each group, only two are experienced in mouse usage



2. Variables
  • Independent:
    • Selection schemes: 6 strategically chosen patterns involving
      --> Which mouse button (if any) could be double/triple/quad clicked to select character/word/sentence
      --> Which mouse button could be dragged through text
      --> Which mouse button could adjust the start/end of a selection
  • Dependent:
    • Selection time
    • Selection errors

3. Hypothesis
  • Some scheme is better than all others



4. Detailed experiment design
  • Null hypothesis: No difference in schemes
  • Assign a selection scheme to each group
  • Train the group in their scheme
  • Measure task time and errors
    • Each subject repeated 6 times
    • A total of 24 tests per scheme

5. Conduct experiment

6. Analysis
  • t-test used - scheme F found to be significantly better
    • Point and draw through with left mouse
    • Adjust with middle mouse

7. Action
  • Try another combination similar to scheme F
    • Left mouse can be double-clicked
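The design in steps 1 and 4 can be checked with a short sketch: six groups of four, two experienced mouse users in each, and six repetitions per subject, giving 4 × 6 = 24 tests per scheme. The scheme labels A to F (the slides name only scheme F), the subject identifiers, and the seeded random assignment are assumptions made for this example; the slides do not say how subjects were allocated to groups.

```python
import random

SCHEMES = ["A", "B", "C", "D", "E", "F"]  # labels assumed; only F is named in the slides
REPETITIONS = 6  # each subject repeats the task 6 times


def assign_groups(experienced, novices, schemes, seed=0):
    """Form one group per scheme with two experienced and two novice subjects."""
    needed = 2 * len(schemes)
    if len(experienced) != needed or len(novices) != needed:
        raise ValueError("need exactly two experienced and two novice subjects per scheme")
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    experienced = list(experienced)
    novices = list(novices)
    rng.shuffle(experienced)
    rng.shuffle(novices)
    return {
        scheme: experienced[2 * i:2 * i + 2] + novices[2 * i:2 * i + 2]
        for i, scheme in enumerate(schemes)
    }


# Hypothetical subject identifiers: 12 experienced (E*) and 12 novice (N*) mouse users.
experienced = [f"E{i}" for i in range(12)]
novices = [f"N{i}" for i in range(12)]
groups = assign_groups(experienced, novices, SCHEMES)
tests_per_scheme = {s: len(g) * REPETITIONS for s, g in groups.items()}
```

Every scheme ends up with 24 tests, matching the total stated in step 4.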


Cognitive Walkthroughs

• A form of predictive evaluation
• Detailed reviews based on psychological theory, focusing on:
  • Goals a new user must form to execute a task
  • How well the system leads the user to form those goals
    • i.e. how well the system supports the user
• The method is highly structured
  • Forms are provided to guide the evaluator
• More time consuming than ordinary heuristic evaluation
• Less time consuming than experiments


Cognitive Walkthrough Steps

1. Choose a task to evaluate

2. Describe the task exactly

a) First describe the task in one sentence
  • Use simple language
  • The wording should be from a first-time user’s point of view
  • e.g. Record a newly-received item in inventory.

b) Describe the initial state of the system
  • e.g. Main menu is displayed

c) List the atomic actions needed to correctly perform the task, e.g.
  1. Click on ‘add to inventory’ in the menu.
  2. If you don’t know the part number, hit ‘return’ to look up the part number, then go to action 4.
  3. Type the part number into the ‘part number’ field
  4. Press tab
  5. Type the number of items in the ‘Number’ field
  6. Hit <return> or click on ‘Add’.
  7. If the system prints out a bar-code sticker, affix it to the new item.

d) Describe classes of users who may perform the task
  • e.g. Receiver - knows about inventory, but not yet about the system


Cognitive Walkthrough Steps

e) Describe the ‘Goal Structure’ (or task structure) users would likely have in their minds before starting the task
  • High-level and system independent
  • Indent sub-goals/subtasks
  • Note if there are actions for which the user has no goals
    • The system must stimulate the user to think of these goals by the time they must perform the task
  • If different classes of user may have different goal structures, list these too.
  • e.g.
      Record a received item in inventory
        Start the inventory program
        Enter the item



3. For each action specified in step 2c, do the following (I to IV):

I. Write down the goal structure
  • ... that the user would need to have in order to perform the action correctly
  • e.g. For action 4
      Record a received item in inventory
        Record the number of items
          Press tab
          Enter the number
        Cause the system to process the transaction



II. Verify that the user will have the correct goal structure
  • given their initial goals
  • given the system’s response to the previous action
  • Estimate the percentage of users who might have each of the following possible problems:
    • Failure to add goals
      • e.g. For action 2: The system must make it clearly visible that pressing return with nothing entered will invoke a lookup mechanism
    • Failure to drop goals
      • e.g. The user may have a goal to notify the person who ordered the parts
      • This would not be needed if the system performs this automatically
    • Addition of spurious goals
      • e.g. There may be a field marked ‘Description’
      • However this only needs to be filled in if the type of item is not in the database
    • No-progress impasse
      • e.g. After adding an item, the system might just clear the screen ready for another entry.
      • The user may think the transaction failed (i.e. goal not achieved)
    • Premature loss of goals
      • e.g. The user enters an item and hits “return”
      • A message ‘transaction accepted’ is printed (meaning the transaction has been started)
      • The user powers off the computer thinking the goal is reached
      • The system never got around to printing the label


Cognitive Walkthrough Steps

III. Verify that the actions match the goals
  • Possible problems:
    • Correct action doesn’t match goal
      • e.g. User wants to delete an item that was stolen.
      • Correct action is to select ‘add to inventory’ and specify a negative number
      • System does not help user match the goal to the action
    • Incorrect actions match goals
      • e.g. User wants to add a new type of item to inventory (for which no items have yet been received)
      • Upon seeing ‘add to inventory’, user selects this incorrect menu item

IV. Verify that the user can physically perform the action
  • Possible problems:
    • Physical difficulties
      • e.g. recognizing an icon, holding down shift-ctrl-alt-a to perform a command
    • Time-outs
      • i.e. running out of time – the system gives up
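The walkthrough is form-driven, so its records can be sketched as a small data structure: one entry per atomic action, holding the goal structure (check I) and an estimated percentage of users for each problem found. This is only one possible layout, not the official walkthrough form. The class and method names are assumptions, the percentages and the ‘Find the part number’ sub-goal are invented, and recording percentages for checks III and IV as well as II is a convenience of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Problem types listed under checks II, III, and IV.
PROBLEM_TYPES = (
    "failure to add goals",
    "failure to drop goals",
    "addition of spurious goals",
    "no-progress impasse",
    "premature loss of goals",
    "correct action doesn't match goal",
    "incorrect action matches goal",
    "physical difficulties",
    "time-out",
)


@dataclass
class ActionReview:
    """Walkthrough notes for one atomic action from step 2c."""
    action: str
    goal_structure: List[str]  # check I: goals needed to perform the action
    estimates: Dict[str, float] = field(default_factory=dict)  # problem -> % of users

    def flag(self, problem, percent):
        """Record the estimated percentage of users who would hit a problem."""
        if problem not in PROBLEM_TYPES:
            raise ValueError(f"unknown problem type: {problem}")
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self.estimates[problem] = percent


@dataclass
class Walkthrough:
    task: str           # step 2a
    initial_state: str  # step 2b
    user_class: str     # step 2d
    reviews: List[ActionReview] = field(default_factory=list)

    def problems_at_or_above(self, threshold):
        """Return (action, problem, percent) tuples, most widespread first."""
        hits = [
            (review.action, problem, percent)
            for review in self.reviews
            for problem, percent in review.estimates.items()
            if percent >= threshold
        ]
        return sorted(hits, key=lambda hit: hit[2], reverse=True)


walkthrough = Walkthrough(
    task="Record a newly-received item in inventory",
    initial_state="Main menu is displayed",
    user_class="Receiver",
)
lookup = ActionReview(
    action="Hit return to look up the part number",
    goal_structure=["Record a received item in inventory", "Find the part number"],
)
lookup.flag("failure to add goals", 40)  # made-up estimate
lookup.flag("physical difficulties", 5)  # made-up estimate
walkthrough.reviews.append(lookup)
serious = walkthrough.problems_at_or_above(25)
```

Filtering by a threshold gives the evaluator a short list of the actions where the system most needs to guide the user better.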


Summary

• Objective of evaluation: Minimize malfunctions
• Key questions:
  • What is the real task? Problems? Which is better? Met targets? Is it standard?
  • Visibility and feedback
• Formative vs. summative evaluation
• Passive methods
  • Problem reporting
  • Software logging
  • Questionnaires/surveys


Summary (Cont’d)

Active methods
  • Traditional experiments
    • Investigate a single UI element
    • Pick subjects
    • Independent and dependent variables
    • Hypotheses
    • Experimental designs:
      • independent subject
      • matched subject (control for differences among subjects)
      • repeated measures (reuse subjects)
  • Observation sessions (Videotaped Evaluation)
    • Study active use on realistic tasks
    • Think-aloud protocol on video
    • Co-operative evaluation involves dialogue


Summary (Cont’d)

Predictive evaluation: involve experts

Cognitive Walkthroughs: goals and actions
  • Describe task, actions, users, goal structure
  • For each action, verify that users:

... add and drop goals as needed

... don’t add unneeded goals

... can tell when a goal is reached

... don’t drop needed goals

... can see what action to take

... are not misled into taking wrong action

... have no physical difficulties with action


Thank you
