Copyright 2006 John Wiley & Sons, Inc
Chapter 7 – Evaluation
HCI: Developing Effective Organizational Information Systems
Dov Te’eni, Jane Carey, Ping Zhang
Evaluation
Chapter 2 Road Map
[Road map figure: book chapters grouped as Context (1 Introduction; 2 Org & Business Context), Foundation (3 Interactive Technologies; 4 Physical Engineering; 5 Cognitive Engineering; 6 Affective Engineering), Application (7, 8 Principles & Guidelines; 9 Organizational Tasks; 10 Componential Design; 11 Methodology; 12 Relationship, Collaboration, & Organization), and Additional Context (13 Social & Global Issues; 14 Changing Needs of IT Development & Use)]
Learning Objectives
Explain what evaluation is and why it is important.
Understand the different types of HCI concerns and their rationales.
Understand the relationships of HCI concerns with various evaluations.
Understand usability, usability engineering, and universal usability.
Understand different evaluation methods and techniques.
Select appropriate evaluation methods for a particular evaluation need.
Carry out effective and efficient evaluations.
Critique HCI designs or evaluations done by others.
Understand the reasons for setting up industry standards.
Introduction

Evaluation is a general term for the determination of the significance, worth, condition, or value of something of interest by careful appraisal and study.
Evaluation of IS for organizational and individual use occurs constantly throughout the entire life cycle of the system. These evaluations can be grouped into two clusters.
The first cluster occurs while the system is being developed, prior to release and actual use, that is, during the development stage. This cluster is also called development evaluation. Its purpose is to test the system for functionality, usability, user experience, and any other aspect, such as bugs, annoyances, and problems, before the official release.
The second cluster happens when the system is released and used by targeted users in a real context, that is, during the use and impact stage. This cluster is also called use and impact evaluation. Its purpose is to better understand how the system affects organizational, group, and individual tasks and activities. Such evaluations can further guide and change the design of future systems.
Many issues and concerns, such as what to evaluate and how to evaluate, are the same or similar for both clusters.
What to Evaluate: Multiple Concerns of HCI
HCI Concern | Description | Sample Measure Items
Physical | System fits our physical strengths and limitations and does not cause harm to our health | Legible; audible; safe to use
Cognitive | System fits our cognitive strengths and limitations and functions as a cognitive extension of our brain | Fewer errors and easy recovery; easy to use; easy to remember how to use; easy to learn
Affective | System satisfies our aesthetic and affective needs and is attractive for its own sake | Aesthetically pleasing; engaging; trustworthy; satisfying; enjoyable; entertaining; fun
Usefulness | Using the system provides rewarding consequences | Supports the individual's task; can do some tasks that would not be possible without the system; extends one's capability; rewarding

Table 7.1 Multiple Concerns of HCI (adapted from Zhang et al., 2005)
Why to Evaluate?
The goal of evaluation is to provide feedback in system development, thus supporting an iterative development process.
System development is a complex process. Different kinds of evaluation feedback are needed to provide insight into whether the development is moving toward the desired values and significance.
It is also important to understand whether the development and the final product achieved the intended values and significance.
Four main reasons for conducting evaluation (Adapted from Preece et al., 1994)
Understanding the real world.
Comparing designs.
Engineering toward a target.
Checking conformance to a standard.
Evaluation as the center for system development (Adapted from Hix & Hartson, 1993)
[Figure: Evaluation at the center of the development activities: Requirement Specification, Task Analysis / Functional Analysis, Conceptual Design, Visual Design, Prototyping, and Implementation]
When to Evaluate ?
A product is evaluated during its entire life cycle; evaluation is basically an ongoing process (refer to the previous slide).
During the development stage, according to purpose and timing, evaluations can be classified as formative or summative.
Formative evaluations take place during the development of the product in order to form or influence design decisions. They answer the question "What and how to redesign?"
Summative evaluations are conducted after the product is finished (pre-release) to ensure that it possesses certain qualities, meets certain standards, or satisfies certain requirements set by the sponsors or other agencies.
Evaluation should continue after release, during actual use by real users in real contexts. This type of evaluation is called use and impact evaluation.
There are several reasons for use and impact evaluations:
To provide a realistic picture of how users actually react to, adopt, accept, and use the product in the real context.
To achieve desired task and organizational performance and productivity.
To know the impact of a software product on users, organizations, society, and culture.
Issues in Evaluation
Concerns about the objectivity and subjectivity of the evaluation.
Human-subject protection procedures if human subjects are involved.
Identifying the determinants of an evaluation plan, which include:
Stage of design
Novelty of product
Number of expected users
Criticality of the interface
Cost of product and finance allocated for testing
Time available
Experience of the design and evaluation team
Usability and usability engineering
The main elements involved:
The origin of usability concerns
Usability definitions
Usability engineering
Universal usability
The origin of usability concerns
Usability concerns started with the software crisis that led to software engineering as a professional discipline.
Through the 1970s, it became clear that an important component of software engineering would be user interface design.
In the 1980s, many nonprofessionals became the primary users of interactive systems, making the demand for easy-to-use interfaces higher than ever.
As end users became more diverse and less technically savvy, interactive systems came to be compared and evaluated with respect to usability: the quality of a system with respect to ease of learning, ease of use, and user satisfaction.
Usability definitions
ISO defines usability as “the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use”
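The three components of the ISO definition are typically operationalized as measurable quantities in a usability test. A minimal sketch in Python; the specific formulas below are common operationalizations, not mandated by the standard itself:

```python
# Sketch: operationalizing the ISO usability components (effectiveness,
# efficiency, satisfaction) from usability-test data. The formulas are
# common conventions, not prescribed by the standard.

def effectiveness(completed_tasks: int, attempted_tasks: int) -> float:
    """Task completion rate: share of attempted tasks finished successfully."""
    return completed_tasks / attempted_tasks

def efficiency(completed_tasks: int, total_time_minutes: float) -> float:
    """Completed tasks per minute of working time."""
    return completed_tasks / total_time_minutes

def satisfaction(ratings: list[float]) -> float:
    """Mean of post-test satisfaction ratings (e.g., 1-5 Likert items)."""
    return sum(ratings) / len(ratings)

# One participant's session: 8 of 10 tasks done in 20 minutes,
# satisfaction items rated 4, 5, and 3.
print(effectiveness(8, 10))     # 0.8
print(efficiency(8, 20.0))      # 0.4 tasks per minute
print(satisfaction([4, 5, 3]))  # 4.0
```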
Nielsen considers usability to be part of system acceptability and defines it as how well users can use the functionality or utility of the system. System acceptability is determined by social acceptability and practical acceptability. The details are shown in Figure 7.2.
Usability Engineering
A process through which usability characteristics are specified, quantitatively and early in the process, and measured throughout the process.
Universal Usability
Universal usability will be met when affordable, useful, and usable technology accommodates the vast majority of the global population. This entails addressing challenges of technology variety, user diversity, and gaps in user knowledge in ways only beginning to be acknowledged by educational, corporate, and government agencies.
Evaluation Methods

There are many evaluation methods and techniques, and there are different ways of clustering them.
Baecker et al. (1995) summarized research and evaluation methods into four types (as depicted in Table 7.5): field studies, respondent studies, experimental studies, and theoretical studies.
These evaluation methods can be grouped into two categories: analytical evaluations and empirical evaluations.
The main difference between the two is that analytical evaluations normally do not collect evidence from users but rely on evaluators using structured approaches for inspections and evaluations, while empirical evaluations draw conclusions from empirical data, which can be qualitative or quantitative in nature.
For each method, the description emphasizes six major aspects:
Method and key ideas
Special framework or model to be used
Context or environment
Whether users are involved
Whether user tasks, work, or activities are considered
The status of the artifact being evaluated (early, late or finished, or in use)
Analytical Method: Heuristic Evaluation

A group of experts, guided by a set of higher-level design principles or heuristics, evaluates whether interface elements conform to the principles.
A heuristic evaluation consists of the following:
A briefing session, in which the evaluators are told what to do. A prepared script is useful as a guide and to ensure that each person receives the same briefing.
The evaluation period, in which each evaluator typically spends one to two hours independently inspecting the product, using the heuristics for guidance.
A debriefing session, in which the experts come together to discuss their findings, remove duplicates, prioritize the problems, and suggest solutions.
There are generic heuristics, such as the ten usability heuristics by Nielsen (Table 7.6) and the eight golden rules by Shneiderman (Table 7.7).
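The debriefing step above can be sketched as a small script that merges findings from independent evaluators, collapses duplicates, and ranks problems by severity. The findings, element names, and 0-4 severity scale are invented for illustration:

```python
# Sketch: the debriefing step of a heuristic evaluation. Findings from
# independent evaluators are merged, duplicate reports collapsed, and
# problems prioritized by severity (scale and data are illustrative).

from collections import defaultdict

# Each finding: (interface element, violated heuristic, severity 0-4).
evaluator_a = [("login form", "error prevention", 3),
               ("search box", "visibility of system status", 2)]
evaluator_b = [("login form", "error prevention", 4),
               ("menu", "consistency and standards", 1)]

merged = defaultdict(list)
for element, heuristic, severity in evaluator_a + evaluator_b:
    merged[(element, heuristic)].append(severity)

# Duplicates collapse to one problem; severities are averaged, then
# problems are ranked worst-first for the design team.
problems = sorted(
    ((sum(s) / len(s), element, heuristic)
     for (element, heuristic), s in merged.items()),
    reverse=True)

for severity, element, heuristic in problems:
    print(f"{severity:.1f}  {element}: violates '{heuristic}'")
```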
Analytical Method: Guidelines Review (GR)

Guidelines reviews occur in the design context or environment. They normally do not require real users. They are conducted by designers or by experts outside the design team.
Some guidelines reviews do consider user tasks or activities. For example, there are specific guidelines for navigation, data entry, and getting the user's attention.
These evaluations can happen at both early and late stages of development. They can also be used for summative evaluation of finished products.

Cognitive Walk-through (CWT)

A CWT involves simulating a user's problem-solving process at each step in the human-computer dialog, checking whether the user's goals and memory for actions can be assumed to lead to the next correct action. CWTs are conducted by evaluation experts and do not need to involve users.
Analytical Method: CWT involves the following steps:
The characteristics of typical users are identified and documented, and sample tasks are developed that focus on the aspects of the design to be evaluated.
A designer and one or more expert evaluators then come together to do the analysis.
The evaluators walk through the action sequences for each task, placing them within the context of a typical scenario, answering questions such as "Will the correct action be sufficiently evident to the user?"
As the walk-through is being done, a record of critical information is compiled.
The design is then revised to fix the problems uncovered.
Analytical Method: Pluralistic Walk-through (PWT)

A PWT involves the following steps:
Scenarios are developed in the form of a series of hard-copy screens representing a single path through the interface. Often just two or a few screens are developed.
The scenarios are presented to a panel of evaluators, and the panelists are asked to write down the sequence of actions they would take to move from one screen to the next.
When everyone has written down their actions, the panelists discuss the actions they suggested for that round of the review.
Then the panel moves on to the next round of screens. This process continues until all the scenarios have been evaluated.
Analytical Method: Inspection with conceptual frameworks such as the TSSL model

This is a structured analytical evaluation method that uses conceptual frameworks as the basis for evaluation and inspection. One such framework is the TSSL model discussed earlier. This framework can be used to evaluate whether a design is an effective one.
From a procedural perspective, this method is similar to heuristic evaluation. However, it emphasizes starting from identifying user tasks and then evaluating the system from the angle of supporting those tasks.
No real users are needed.
Examples: evaluating the option/configuration specification interfaces of two software applications, and evaluating the most popular search results of 2003 on the three top Web portals and search engines.
Analytical Method: User model-based analysis, such as using the GOMS model

This method is used to predict users' behavior and performance during interaction with a computer system.
This method was discussed in detail in a previous chapter (Chapter 5).
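To give a flavor of such model-based prediction, the Keystroke-Level Model, a simplified member of the GOMS family, sums standard operator times to predict expert task time. A sketch using the classic operator estimates from Card, Moran, and Newell; the example action sequence is invented:

```python
# Sketch: predicting expert task time with the Keystroke-Level Model,
# a simplified member of the GOMS family. Operator times (in seconds)
# are the classic estimates from Card, Moran, and Newell.

KLM_TIMES = {
    "K": 0.2,   # keystroke or button press
    "P": 1.1,   # point with mouse to a target on screen
    "H": 0.4,   # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation for an action
}

def predict_time(operators: str) -> float:
    """Sum operator times for a sequence such as 'MHPK'."""
    return round(sum(KLM_TIMES[op] for op in operators), 2)

# Example: think, move hand to mouse, point at a button, click it.
print(predict_time("MHPK"))  # 3.05 seconds
```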
Empirical Methods (EM)

Empirical methods are normally conducted by involving users and collecting facts about users interacting with the system.
Commonly used methods include:
Survey/questionnaire
Interviews
Lab experiment
Observation
Empirical Method: Survey/Questionnaire

Surveys are commonly used to collect quantitative data from a large pool of respondents. A survey may focus on opinions or factual data, depending on its purpose.
A survey can be conducted by telephone, e-mail, or mail; it can be paper-based or online.
Advantages:
Inexpensive
Flexible to conduct
Involves a large number of respondents
Allows anonymity of respondents
Provides unbiased understanding if validated or standard instruments are used
Disadvantages:
Reliability of survey results
Respondents are sometimes unable to answer questions well, especially those related to past actions
Sometimes respondents do not truly represent the intended population because of how they are selected
A Likert scale is used to collect answers to questions about a specific concern, such as opinions, perceptions or beliefs, attitudes, satisfaction, behavior, or specific assessments.
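Summarizing Likert-scale answers is typically the first analysis step for such data. A minimal sketch; the item wording and responses are invented for illustration:

```python
# Sketch: summarizing Likert-scale survey answers for one questionnaire
# item (1 = strongly disagree ... 5 = strongly agree). The item and
# response data are invented for illustration.

from statistics import mean, median

responses = [4, 5, 3, 4, 2, 5, 4, 4]  # "The system was easy to use."

print(f"n={len(responses)} mean={mean(responses):.2f} "
      f"median={median(responses)}")

# The share of respondents who agree (rating 4 or 5) is often reported
# alongside the mean, since Likert data are ordinal.
agree = sum(1 for r in responses if r >= 4) / len(responses)
print(f"agreement rate: {agree:.0%}")
```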
Interviews

An interview is "a conversation with a purpose."
Interviews can be open-ended (unstructured), semi-structured, or structured.
Quick guidelines for developing questions:
Avoid long questions, as they are hard to remember.
Avoid compound sentences/questions by splitting them into two questions; one sentence carries one idea or question.
Avoid jargon words or language.
Avoid imposing or implying any bias when presenting a question to the interviewees.
Lab Experiment

Lab experiments are appropriate if evaluators have a clear focus….
A lab study involves the following steps:
Develop a research question.
Develop theory-driven hypotheses to be tested that outline the specific relationships between dependent and independent variables.
Design the experiment.
Pilot-test the experiment.
Recruit subjects and take care of the requirements for humans as participants.
Conduct the experiment, collect and analyze data, and draw conclusions.
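The analysis step for a two-condition experiment often compares mean performance between the designs. A sketch with invented task-time data using Welch's t statistic; a full analysis would also report a p-value (e.g., via scipy.stats.ttest_ind), omitted here to stay within the standard library:

```python
# Sketch: analyzing a two-condition lab experiment comparing
# task-completion times (seconds) under two interface designs.
# Data are invented for illustration.

from statistics import mean, stdev

design_a = [42.0, 39.5, 45.2, 40.1, 43.3]
design_b = [35.8, 33.9, 37.0, 34.4, 36.2]

def welch_t(x: list[float], y: list[float]) -> float:
    """Welch's t statistic for two independent samples."""
    vx, vy = stdev(x) ** 2, stdev(y) ** 2
    return (mean(x) - mean(y)) / ((vx / len(x) + vy / len(y)) ** 0.5)

print(f"mean A: {mean(design_a):.1f}s  mean B: {mean(design_b):.1f}s")
print(f"Welch t = {welch_t(design_a, design_b):.2f}")
```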
Observation

One way to evaluate a system is to observe and monitor real users actually using the system in a real setting.
A field study is conducted in a normal working environment, or the real field.
Field studies can be valuable in formative evaluations and in use and impact evaluations.
Ethnographic observation attends to actual user experience in real-world situations with minimal interruption or intervention from the evaluators.
Several actual use episodes can be observed if necessary.
Depending on the data collected and the evaluation goals, both qualitative and quantitative data analyses may take place.
Comparison of Methods

Each method has pros and cons, and there is no one-size-fits-all method.
Selecting which method to use is important when planning an evaluation.
Some methods are good for certain HCI concerns (e.g., cognitive), while others might be appropriate for all of them.
Some are good at early stages of development, while others are good for late stages.
Some require the use of real users, while others do not.
Table 7.11 lists the evaluation methods and their characteristics along several dimensions, including the advantages and disadvantages of each method.
Standards
Standards are concerned with prescribed ways of discussing, presenting, or doing things to achieve consistency across the same type of products.
Standards can indicate quality, safety, or known merits. Standardization makes people's lives easier and safer.
Standards for software are being developed to prevent poor-quality software that may bring disasters to businesses and people's lives.
NIST, ISO, IEC, and BSI set standards that are used for software products.
Standards are important for summative evaluations of finished products, as discussed earlier.
Types of Standards and Usability

Standards are categorized based on their purposes. These categories are logically related: the objective is for the product to be effective, efficient, and satisfying when used in the intended contexts.
Please refer to Figure 7.10 for this categorization.
Please refer to Table 7.12 for lists of the sources of some standards.
Common Industry Format (CIF): Background and current status

The CIF is a standard method for reporting summative usability test findings.
The purpose of the CIF is to encourage the incorporation of usability as an element in decision making for software procurement.
The CIF targets two audiences: usability professionals and stakeholders in the organization.
The CIF is designed for usability professionals who generate reports to be used by stakeholders in order to make decisions.
Common Industry Format (CIF): The CIF format

The CIF is designed for summative rather than formative testing.
The CIF format is primarily for reporting the results of formal usability tests in which quantitative measurements were collected, and it is particularly appropriate for summative/comparative testing.
The format includes the following main sections: executive summary, introduction, method, and results.
Please refer to Appendix A (page 170 in the textbook) for details of the CIF report format.
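The four main sections can be captured as a simple report structure. In this sketch the field names paraphrase the CIF section headings; they are illustrative, not the normative CIF template, and the sample contents are invented:

```python
# Sketch: the main sections of a CIF-style summative usability report
# as a simple data structure. Field names paraphrase the section
# headings; they are not the normative CIF schema.

from dataclasses import dataclass, field

@dataclass
class CIFReport:
    executive_summary: str
    introduction: str          # product description, test objectives
    method: str                # participants, tasks, test environment
    results: dict = field(default_factory=dict)  # quantitative metrics

report = CIFReport(
    executive_summary="High-level findings for procurement stakeholders.",
    introduction="Product X v2.1; goal: assess order-entry workflow.",
    method="8 participants, 10 tasks, lab setting.",
    results={"completion_rate": 0.85, "mean_task_time_s": 74.0,
             "mean_satisfaction": 4.1},
)
print(report.results["completion_rate"])  # 0.85
```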
Common Industry Format (CIF): How to use the CIF

According to NIST, the CIF can be used in the following fashion for purchased software:
Require that suppliers provide usability test reports in CIF format.
Analyze them for reliability and applicability.
Replicate within the agency if required.
Use the data to select a product.
Summary
Evaluations are driven by the ultimate concerns of HCI.
Evaluations should occur during the entire system development process, after the system is finished, and during the period the system is actually used.
This chapter introduced several commonly used evaluation methods; their pros and cons were compared and discussed.