Copyright 2006 John Wiley and Sons, Inc. Chapter 7 - Evaluation HCI: Developing Effective Organizational Information Systems Dov Te’eni Jane Carey Ping

Copyright 2006 John Wiley and Sons, Inc.

Chapter 7 - Evaluation

HCI: Developing Effective Organizational Information Systems

Dov Te’eni, Jane Carey, Ping Zhang


Road Map

[Figure: chapter road map, organized under the headings Context, Foundation, Application, and Additional Context]

1 Introduction

2 Org & Business Context

3 Interactive Technologies

4 Physical Engineering

5 Cognitive Engineering

6 Affective Engineering

7 Evaluation

8 Principles & Guidelines

9 Organizational Tasks

10 Componential Design

11 Methodology

12 Relationship, Collaboration, & Organization

13 Social & Global Issues

14 Changing Needs of IT Development & Use


Learning Objectives

Explain what evaluation is and why it is important.

Understand the different types of HCI concerns and their rationales.

Understand the relationships of HCI concerns with various evaluations.

Understand usability, usability engineering, and universal usability.


Learning Objectives

Understand different evaluation methods and techniques.

Select appropriate evaluation methods for a particular evaluation need.

Carry out effective and efficient evaluations.

Critique reports of studies done by others.

Understand the reasons for setting up industry standards.


Evaluation

Evaluation: the determination of the significance, worth, condition, or value by careful appraisal and study.


HCI Methodology and Evaluation

[Figure: the HCI development methodology and where evaluation occurs in it. Phase labels: Project Selection & Planning (Project Selection, Project Planning), Analysis, Design, Implementation. Activity labels: Requirements Determination, User Needs Test, Context Analysis, User Analysis, Task Analysis, Evaluation Metrics, Interface Specification, Metaphor Design, Media Design, Dialogue Design, Presentation Design, Alternative Selection, Coding, Formative Evaluation (shown three times), Summative Evaluation. HCI Principles & Guidelines appears as a separate label.]

What to evaluate? Four levels of HCI concerns

HCI Concern / Description / Sample Measure Items

Physical
Description: System fits our physical strengths and limitations and does not cause harm to our health.
Sample measure items: Legible; audible; safe to use.

Cognitive
Description: System fits our cognitive strengths and limitations and functions as the cognitive extension of our brain.
Sample measure items: Fewer errors and easy recovery; easy to use; easy to remember how to use; easy to learn.

Affective
Description: System satisfies our aesthetic and affective needs and is attractive for its own sake.
Sample measure items: Aesthetically pleasing; engaging; trustworthy; satisfying; enjoyable; entertaining; fun.

Usefulness
Description: Using the system would provide rewarding consequences.
Sample measure items: Support individual’s tasks; can do some tasks that would not be possible without the system; extend one’s capability; rewarding.


Why evaluate?

The goal of evaluation is to provide feedback during software development, thus supporting an iterative development process (Gould and Lewis 1985).


When to evaluate

Formative Evaluation: conducted during the development of a product in order to form or influence design decisions.

Summative Evaluation: conducted after the product is finished to ensure that it possesses certain qualities, meets certain standards, or satisfies certain requirements set by the sponsors or other agencies.


When to evaluate

Figure 7.1 Evaluation as the Center of Systems Development

[Figure: Evaluation sits at the center, surrounded by Task analysis/Functional analysis, Requirements specification, Conceptual design/formal design, Prototyping, and Implementation.]


When to evaluate

Use and Impact Evaluation: conducted during the actual use of the product by real users in a real context.

Longitudinal Evaluation: involving the repeated observation or examination of a set of subjects over time with respect to one or more evaluation variables.


Issues in Evaluation

Evaluation Plan

Stage of design (early, middle, late)

Novelty of product (well defined versus exploratory)

Number of expected users

Criticality of the interface (e.g., life-critical medical system versus museum-exhibit support)

Costs of product and finances allocated for test

Time available

Experience of the design and evaluation team


Usability and Usability Engineering

Usability: the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use.

Usability and Usability Engineering

Figure 7.2 System Acceptability and Usability


Table 7.4 Nielsen’s Definitions

Usefulness: the issue of whether the system can be used to achieve some desired goal.

Utility: the question of whether the functionality of the system in principle can do what is needed.

Usability: the question of how well users can use that functionality.

Learnability: the system should be easy to learn so that the user can rapidly start getting some work done with the system.

Efficiency: the system should be efficient to use, so that once the user has learned the system, a high level of productivity is possible.


Memorability: the system should be easy to remember, so that the casual user is able to return to the system after some period of not having used it, without having to learn everything all over again.

Errors: the system should have a low error rate, so that users make few errors during the use of the system, and so that if they do make errors they can easily recover from them. Further, catastrophic errors must not occur.

Satisfaction: the system should be pleasant to use, so that users are subjectively satisfied when using it; they like it.

Table 7.4 Nielsen’s Definitions


Usability Engineering

Usability Engineering: a process through which usability characteristics are specified, quantitatively and early in the development process, and measured throughout the process.
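
As a rough illustration of what "specified quantitatively and measured throughout" could look like, the sketch below records a few usability goals as numeric targets and checks two evaluation rounds against them. The attribute names, measures, target values, and observed numbers are all hypothetical; they are not taken from the chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsabilityGoal:
    """One quantitative usability target (all values here are hypothetical)."""
    attribute: str          # e.g. "learnability"
    measure: str            # how the attribute is measured
    target: float           # value the design is expected to reach
    higher_is_better: bool  # direction in which the measure improves

    def is_met(self, observed: float) -> bool:
        if self.higher_is_better:
            return observed >= self.target
        return observed <= self.target


# Goals are written down early, before any design is evaluated.
goals = [
    UsabilityGoal("learnability", "minutes until first task completed", 10.0, False),
    UsabilityGoal("errors", "errors per task", 1.0, False),
    UsabilityGoal("satisfaction", "mean rating on a 1-5 scale", 4.0, True),
]


def check_round(observed: dict[str, float]) -> dict[str, bool]:
    """Compare one evaluation round's measurements against every goal."""
    return {g.attribute: g.is_met(observed[g.attribute]) for g in goals}


# The same goals are re-checked after each design iteration.
round_1 = {"learnability": 14.0, "errors": 0.8, "satisfaction": 3.6}
round_2 = {"learnability": 9.5, "errors": 0.5, "satisfaction": 4.2}

print(check_round(round_1))
print(check_round(round_2))
```

Keeping the goals fixed while the measurements change from round to round is what makes progress between iterations comparable.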


Evaluation Methods

Field strategies (settings under conditions as natural as possible)
Field studies: ethnography and interaction analysis; contextual inquiry
Field experiments: beta testing of products; studies of technological change

Respondent strategies (settings are muted or made moot)
Judgment studies: usability inspection methods (e.g. heuristic evaluation)
Sample surveys: questionnaires; interviews

Experimental strategies (settings concocted for research purposes)
Experimental simulations: usability testing; usability engineering
Laboratory experiments: controlled experiments

Theoretical strategies (no observation of behavior required)
Formal theory: design theory (e.g. Norman’s 7 stages); behavioral theory (e.g. color vision)
Computer simulation: human information processing theory


Analytical Methods

Heuristic Evaluation

Heuristics: higher-level design principles when used in practice to guide designs. Heuristics are also called rules of thumb.

Heuristic evaluation: a group of experts, guided by a set of higher level design principles or heuristics, evaluate whether interface elements conform to the principles.
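
One way to combine the findings of several expert evaluators is sketched below. It assumes each finding is recorded as an evaluator, an interface element, the heuristic judged to be violated, and a severity rating; the 0-4 severity scale and all sample findings are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical findings: (evaluator, interface element, heuristic violated, severity 0-4)
findings = [
    ("E1", "Save dialog", "Visibility of system status", 3),
    ("E2", "Save dialog", "Visibility of system status", 2),
    ("E1", "Search box", "Error prevention", 1),
    ("E3", "Settings menu", "Consistency and standards", 4),
]


def summarize_findings(findings):
    """Group findings by (element, heuristic) and rank them by mean severity."""
    grouped = defaultdict(list)
    for evaluator, element, heuristic, severity in findings:
        grouped[(element, heuristic)].append((evaluator, severity))

    summary = []
    for (element, heuristic), reports in grouped.items():
        evaluators = {evaluator for evaluator, _ in reports}
        mean_severity = sum(severity for _, severity in reports) / len(reports)
        summary.append((element, heuristic, len(evaluators), mean_severity))

    # Most severe problems first.
    return sorted(summary, key=lambda row: row[3], reverse=True)


for element, heuristic, n_evaluators, severity in summarize_findings(findings):
    print(f"{element}: {heuristic} ({n_evaluators} evaluator(s), mean severity {severity:.1f})")
```

Counting how many evaluators reported the same problem is kept separate from severity, since a problem noticed by only one expert can still be serious.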


Usability Heuristics

Rules Description

Visibility of system status

The system should always keep users informed about what is going on, through appropriate feedback within reasonable time.

Match between system and the real world

The system should speak the users' language, with words, phrases and concepts familiar to the user, rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order.

User control and freedom

Users often choose system functions by mistake and will need a clearly marked "emergency exit" to leave the unwanted state without having to go through an extended dialogue. Support undo and redo.

Consistency and standards

Users should not have to wonder whether different words, situations, or actions mean the same thing. Follow platform conventions.

Error prevention

Even better than good error messages is a careful design which prevents a problem from occurring in the first place.

Table 7.6 Ten Usability Heuristics

Usability Heuristics

Rules Description

Recognition rather than recall

Make objects, actions, and options visible. The user should not have to remember information from one part of the dialogue to another. Instructions for use of the system should be visible or easily retrievable whenever appropriate.

Flexibility and efficiency of use

Accelerators -- unseen by the novice user -- may often speed up the interaction for the expert user such that the system can cater to both inexperienced and experienced users. Allow users to tailor frequent actions.

Aesthetic and minimalist design

Dialogues should not contain information which is irrelevant or rarely needed. Every extra unit of information in a dialogue competes with the relevant units of information and diminishes their relative visibility.

Help users recognize, diagnose, and recover from errors

Error messages should be expressed in plain language (no codes), precisely indicate the problem, and constructively suggest a solution.

Help and documentation

Even though it is better if the system can be used without documentation, it may be necessary to provide help and documentation. Any such information should be easy to search, focused on the user's task, list concrete steps to be carried out, and not be too large.

Table 7.6 Ten Usability Heuristics

Eight Golden Rules

Rules Description

Strive for consistency

This rule is the most frequently violated one, but following it can be tricky because there are many forms of consistency. Consistent sequences of actions should be required in similar situations; identical terminology should be used in prompts, menus, and help screens; and consistent color, layout, capitalization, fonts, etc. should be employed throughout. Exceptions, such as required confirmation of the delete command or no echoing of passwords, should be comprehensible and limited in number.

Cater to universal usability

Recognize the needs of diverse users and design for plasticity, facilitating transformation of content. Novice-expert differences, age ranges, disabilities, and technology diversity each enrich the spectrum of requirements that guides design. Adding features for novices, such as explanations, and features for experts, such as shortcuts and faster pacing, can enrich the interface design and improve perceived system quality.

Offer informative feedback

For every user action, there should be some system feedback. For frequent and minor actions, the response can be modest, whereas for infrequent and major actions, the response should be more substantial. Visual presentation of the objects of interest provides a convenient environment for showing changes explicitly.

Design dialogs to yield closure

Sequences of actions should be organized into groups with a beginning, middle, and end. Informative feedback at the completion of a group of actions gives operators the satisfaction of accomplishment, a sense of relief, the signal to drop contingency plans from their minds, and a signal to prepare for the next group of actions.

Table 7.7 Eight Golden Rules for User Interface Design

Eight Golden Rules

Rules Description

Prevent errors

As much as possible, design the system so that users cannot make serious errors. If a user makes an error, the interface should detect the error and offer simple, constructive and specific instructions for recovery. Erroneous actions should leave the system state unchanged, or the interface should give instructions about restoring the state.

Permit easy reversal of actions

As much as possible, actions should be reversible. This feature relieves anxiety, since the user knows that errors can be undone, thus encouraging exploration of unfamiliar options. The units of reversibility may be a single action, a data-entry task, or a complete group of actions, such as entry of a name and address block.

Support internal locus of control

Experienced operators strongly desire the sense that they are in charge of the interface and that the interface responds to their actions. Surprising interface actions, tedious sequences of data entries, inability to obtain or difficulty in obtaining necessary information, and inability to produce the action desired all build anxiety and dissatisfaction.

Reduce short-term memory load

The limitation of human information processing in short-term memory requires that displays be kept simple, multiple-page displays be consolidated, window-motion frequency be reduced, and sufficient training time be allotted for codes, mnemonics, and sequences of actions. Where appropriate, online access to command-syntax forms, abbreviations, codes, and other information should be provided.

Table 7.7 Eight Golden Rules for User Interface Design (Shneiderman and Plaisant 2005)


HOMERUN Heuristics for Websites

Description

High-quality content

Often updated

Minimal download time

Ease of use

Relevant to users’ needs

Unique to the online medium

Net-centric corporate culture

Table 7.8 HOMERUN Heuristics for Commercial Websites (Nielsen 2000)


Cognitive Walkthrough

The following steps are involved in cognitive walkthroughs:

The characteristics of typical users are identified and documented, and sample tasks are developed that focus on the aspects of the design to be evaluated.

A designer and one or more expert evaluators then come together to do the analysis.

The evaluators walk through the action sequences for each task, placing it within the context of a typical scenario, and as they do this they try to answer the following questions:

Will the correct action be sufficiently evident to the user?

Will the user notice that the correct action is available?

Will the user associate and interpret the response from the action correctly?


Cognitive Walkthrough

As the walkthrough is being done, a record of critical information is compiled in which the assumptions about what would cause problems and why are recorded. This involves explaining why users would face difficulties. Notes about side issues and design changes are made. A summary of the results is compiled.

The design is then revised to fix the problems presented.
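
The record kept during a walkthrough can be sketched as a small data structure: one entry per action in the task's action sequence, with an answer to each of the three walkthrough questions and a note explaining any expected difficulty. The task, actions, answers, and notes below are hypothetical.

```python
from dataclasses import dataclass

# The three questions the evaluators try to answer for every action.
QUESTIONS = (
    "Will the correct action be sufficiently evident to the user?",
    "Will the user notice that the correct action is available?",
    "Will the user associate and interpret the response from the action correctly?",
)


@dataclass
class StepRecord:
    """Walkthrough record for one action in a task's action sequence."""
    action: str
    answers: tuple   # three booleans, one per entry in QUESTIONS
    notes: str = ""  # why users would face difficulties, side issues, design changes

    def problems(self):
        """Return the questions that were answered 'no' for this action."""
        return [q for q, ok in zip(QUESTIONS, self.answers) if not ok]


# Hypothetical task: export a document from an editor.
walkthrough = [
    StepRecord("Open the File menu", (True, True, True)),
    StepRecord("Choose 'Export...'", (True, False, True),
               "Item is hidden in a submenu; users may not see it."),
]


def summarize_walkthrough(records):
    """Summary of results: only the actions where a problem is expected."""
    return {r.action: r.problems() for r in records if r.problems()}


for action, problems in summarize_walkthrough(walkthrough).items():
    print(action, "->", problems)
```

The summary lists only the actions with at least one "no" answer, which is the input the designers need when revising the design.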


Pluralistic Walkthroughs

Pluralistic walkthroughs are “another type of walkthrough in which users, developers and usability experts work together to step through a task scenario, discussing usability issues associated with dialog elements involved in the scenario steps.” (Nielsen and Mack 1994)


Inspection with Conceptual Frameworks such as the TSSL model

Another structured analytical evaluation method is to use conceptual frameworks as bases for evaluation and inspection. One such framework is the TSSL model introduced earlier in the book.


Example 1 - Evaluating option/configuration specification interfaces

Figure 7.3 A Sample Dialog Box

Figure 7.4 A Sample Tabbed Dialog Box (callout: tabs act as a menu for the dialog)

Figure 7.5 The Preferences Dialog Box with Tree Menu (callouts: Title Area, Tree menu)

[Figure without caption; callouts: Tabbed Drop-Down Menu, Additional Tabs Navigators]


Example 2 - Yahoo, Google, and Lycos web portals and search engines

Compare and contrast displays for top searches of 2003. Which uses color most effectively? Layout? Ease of understanding? Why?


Empirical Methods

Surveys and Questionnaires: used to collect information from a large group of respondents.

Interviews (including focus groups): used to collect information from a small key set of respondents.

Experiments: used to determine the best design features from many options.

Field studies: results are more generalizable since they occur in real settings.

Table 7.11 Comparison of Evaluation Methods

Heuristic evaluation
Lifecycle stage: Any stage; early ones benefit most.
System status: Any status (mock up, prototype, final product).
Environment of evaluation: Any.
Real users participation: None.
User tasks used: None.
Main advantage: Finds individual problems. Can address expert user issues.
Main disadvantage: Does not involve real users, thus may not find problems related to real users in real context. Does not link to user's tasks.

Guideline preview
System status: Any status.
Environment of evaluation: Any.
Real users participation: None.
User tasks used: None.
Main advantage: Finds individual problems.
Main disadvantage: Does not involve real users. Does not link to user's tasks.

Cognitive walkthrough
System status: Any status.
Environment of evaluation: Any.
Real users participation: None.
User tasks used: Yes, need to identify tasks first.
Main advantage: Less expensive.
Main disadvantage: Does not involve real users. Limited to expert's view.

TSSL based inspection
Lifecycle stage: Any stage.
System status: Any status.
Environment of evaluation: Any.
Real users participation: None.
User tasks used: Yes, need to identify tasks first.
Main advantage: Direct link to user tasks. Structured, with fewer steps to go through.
Main disadvantage: Does not involve real users. Limited to the tasks identified.

Survey
Lifecycle stage: Any stage.
System status: Any status.
Environment of evaluation: Any.
Real users participation: Yes, a lot.
User tasks used: Yes or no.
Main advantage: Finds subjective reactions. Easy to conduct and compare.
Main disadvantage: Questions need to be well designed. Needs a large sample.

Interview
Lifecycle stage: Task analysis.
System status: Mock up, prototype.
Environment of evaluation: Any.
Real users participation: Yes.
User tasks used: None.
Main advantage: Flexible, in-depth probing.
Main disadvantage: Time consuming. Hard to analyze and compare.

Lab controlled experiment
Lifecycle stage: Design, implement, or use.
System status: Prototype, final product.
Environment of evaluation: Lab.
Real users participation: Yes.
User tasks used: Yes, most of the time artificially designed to mimic real tasks.
Main advantage: Provides fact-based measurements. Results easy to compare.
Main disadvantage: Requires expensive facility, setup, and expertise.

Field study w/ observation and monitoring
Lifecycle stage: Design, implement, or use.
System status: Prototype, final product.
Environment of evaluation: Real work setting.
Real users participation: Yes.
User tasks used: None.
Main advantage: Easily applicable. Reveals user's real tasks. Can highlight difficulties in real use.
Main disadvantage: Observation may affect user behavior.
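
To show how a comparison like Table 7.11 might support method selection, the sketch below encodes two of its columns (real users participation, user tasks used) for a subset of the methods and filters on them. The encoding as booleans, the function name, and the selection rule are assumptions made for illustration; the survey row is left out because its "yes or no" entry does not reduce to a single boolean.

```python
# Two columns of the comparison table, encoded as data for a subset of methods.
METHODS = {
    "Heuristic evaluation":      {"real_users": False, "user_tasks": False},
    "Cognitive walkthrough":     {"real_users": False, "user_tasks": True},
    "TSSL based inspection":     {"real_users": False, "user_tasks": True},
    "Interview":                 {"real_users": True,  "user_tasks": False},
    "Lab controlled experiment": {"real_users": True,  "user_tasks": True},
}


def candidate_methods(real_users_available: bool, need_task_link: bool) -> list[str]:
    """Return methods compatible with the constraints of a given evaluation.

    A method is dropped if it needs real users and none are available, or if
    a link to user tasks is required and the method does not use tasks.
    """
    selected = []
    for name, traits in METHODS.items():
        if traits["real_users"] and not real_users_available:
            continue
        if need_task_link and not traits["user_tasks"]:
            continue
        selected.append(name)
    return selected


# Early design stage: no access to real users, but tasks are already identified.
print(candidate_methods(real_users_available=False, need_task_link=True))
```

A real selection would also weigh the remaining columns, such as lifecycle stage, cost, and the available expertise of the evaluation team.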


Standards

Standards: prescribed ways of discussing, presenting, or doing things to achieve consistency across products of the same type.

Figure 7.10 Categories of HCI Related Standards

[Figure labels: User Performance/Satisfaction, Product, Development Process, Life Cycle Process, Quality in Use, Product Quality, Process Quality, Organizational Capability.]

Sources of Standards

Information / URL

Published ISO standards: www.iso.ch/projects/programme.html

ISO national member bodies: www.iso.ch/addresse/membodies.html

BSI: British Standards Institute: www.bsi.org.uk

ANSI: American National Standards Institute: www.ansi.org

NSSN: A National Resource for Global Standards: www.nssn.org

TRUMP list of HCI and Usability Standards: www.usability.serco.com/trump/resources/standards.htm

Table 7.12 Sources for HCI and Usability Related Standards


Common Industry Format (CIF)

Common Industry Format (CIF): a standard method for reporting summative usability test findings.

The type of information and level of detail required in a CIF report are intended to ensure that:

Good practice in usability evaluation has been adhered to.

There is sufficient information for a usability specialist to judge the validity of the results.

If the test were replicated on the basis of the information given in the CIF, it would produce essentially the same results.

Specific effectiveness and efficiency metrics must be used. Satisfaction must also be measured.
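
As a hedged illustration of the three kinds of measures a summative test reports, the sketch below computes one common choice for each: task completion rate for effectiveness, mean time on completed tasks for efficiency, and a mean questionnaire rating for satisfaction. These particular formulas, the 1-5 rating scale, and the session data are assumptions for the example; they are not prescribed by the text above.

```python
from statistics import mean

# Hypothetical sessions: one record per participant for a single task.
sessions = [
    {"completed": True,  "seconds": 120, "satisfaction": 4},
    {"completed": True,  "seconds": 180, "satisfaction": 5},
    {"completed": False, "seconds": 300, "satisfaction": 2},
    {"completed": True,  "seconds": 150, "satisfaction": 4},
]


def summative_metrics(sessions):
    """Compute one effectiveness, one efficiency, and one satisfaction measure."""
    completed = [s for s in sessions if s["completed"]]
    return {
        # Effectiveness: share of participants who completed the task.
        "completion_rate": len(completed) / len(sessions),
        # Efficiency: mean time on task, counting successful attempts only.
        # (mean() raises StatisticsError if nobody completed the task.)
        "mean_time_completed_s": mean(s["seconds"] for s in completed),
        # Satisfaction: mean rating across all participants (assumed 1-5 scale).
        "mean_satisfaction": mean(s["satisfaction"] for s in sessions),
    }


print(summative_metrics(sessions))
```

Reporting the raw session records alongside the summary is what would let another specialist judge the validity of the results or replicate the test.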


Common Industry Format (CIF)

According to NIST, the CIF can be used in the following fashion.

For purchased software:

Require that suppliers provide usability test reports in CIF format.

Analyze for reliability and applicability.

Replicate within agency if required.

Use data to select products.

For developed software (in house or subcontract):

Define measurable usability goals.

Conduct formative usability testing as part of user interface design activities.

Conduct summative usability test using the CIF to ensure goals have been met.


Summary

Evaluations are driven by the ultimate concerns of human–computer interaction.

In this chapter, we presented four types of such concerns along the following four dimensions of human needs: physical (ergonomic), cognitive, affective, and extrinsic motivational (usefulness).

Evaluations should occur during the entire system development process, after the system is finished, and during the period the system is actually used.

This chapter introduced several commonly used evaluation methods. Their pros and cons were compared and discussed.

The chapter also provided several useful instruments and heuristics. Standards play an important role in practice, as discussed in the chapter. A particular standard, the Common Industry Format, is described, and the detailed format is given in the appendix.
