Copyright 2006 John Wiley & Sons, Inc
Chapter 7 – Evaluation
HCI: Developing Effective Organizational Information Systems
Dov Te’eni, Jane Carey, Ping Zhang
Evaluation
Chapter 2 Road Map
[Road map figure: book chapters grouped as Context (1 Introduction; 2 Org & Business Context), Foundation (3 Interactive Technologies; 4 Physical Engineering; 5 Cognitive Engineering; 6 Affective Engineering), Application (7, 8 Principles & Guidelines; 9 Organizational Tasks; 10 Componential Design; 11 Methodology; 12 Relationship, Collaboration, & Organization), and Additional Context (13 Social & Global Issues; 14 Changing Needs of IT Development & Use)]
Learning Objectives
Explain what evaluation is and why it is important.
Understand the different types of HCI concerns and their rationales.
Understand the relationships of HCI concerns with various evaluations.
Understand usability, usability engineering, and universal usability.
Understand different evaluation methods and techniques.
Select appropriate evaluation methods for a particular evaluation need.
Carry out effective and efficient evaluations.
Critique HCI designs or evaluations done by others.
Understand the reasons for setting up industry standards.
Introduction

Evaluation is a general term for the determination of the significance, worth, condition, or value of something of interest by careful appraisal and study.
Evaluation of IS for organizational and individual use occurs constantly throughout the entire life cycle of the system. These evaluations can be grouped into two clusters.
The first cluster occurs while the system is being developed, prior to release and actual use, that is, during the development stage. This cluster is also called development evaluation. Its purpose is to test the system for functionality, usability, user experience, and any other aspect, such as bugs, annoyances, and problems, before the official release.
The second cluster happens when the system is released and used by targeted users in a real context, that is, during the use and impact stage. This cluster is also called use and impact evaluation. Its purpose is to better understand how the system affects organizational, group, and individual tasks and activities. Such evaluations can further guide and change the design of future systems.
Many issues and concerns, such as what to evaluate and how to evaluate, are the same or similar for both clusters.
What to Evaluate: Multiple Concerns of HCI
HCI Concern | Description | Sample Measure Items
Physical | System fits our physical strengths and limitations and does not cause harm to our health | Legible; audible; safe to use
Cognitive | System fits our cognitive strengths and limitations and functions as a cognitive extension of our brain | Fewer errors and easy recovery; easy to use; easy to remember how to use; easy to learn
Affective | System satisfies our aesthetic and affective needs and is attractive for its own sake | Aesthetically pleasing; engaging; trustworthy; satisfying; enjoyable; entertaining; fun
Usefulness | Using the system provides rewarding consequences | Supports the individual's task; can do some tasks that would not be possible without the system; extends one's capability; rewarding

Table 7.1 Multiple Concerns of HCI (adapted from Zhang et al., 2005)
Why to Evaluate?
The goal of evaluation is to provide feedback in system development, thus supporting an iterative development process.
System development is a complex process. Different kinds of evaluation feedback are needed to provide insight into whether the development is moving toward the desired values and significance.
It is also important to understand whether the development and the final product achieved the intended values and significance.
Four main reasons for conducting evaluation (Adapted from Preece et al., 1994)
Understanding the real world.
Comparing designs.
Engineering toward a target.
Checking conformance to a standard.
Evaluation as the center for system development (Adapted from Hix & Hartson, 1993)
[Figure: Evaluation at the center of the development activities: Requirement Specification, Task Analysis / Functional Analysis, Conceptual Design, Visual Design, Prototyping, and Implementation]
When to Evaluate ?
A product is evaluated during its entire life cycle; evaluation is basically an ongoing process (refer to the previous slide).
During the development stage, according to purpose and timing, evaluations can be classified as formative or summative.
Formative evaluations take place during the development of the product in order to form or influence design decisions. They answer the question "What and how to redesign?"
Summative evaluations are conducted after the product is finished (pre-release) to ensure that it possesses certain qualities, meets certain standards, or satisfies certain requirements set by the sponsors or other agencies.
Evaluation should continue after release, during actual use by real users in real contexts. This type of evaluation is called use and impact evaluation.
There are several reasons for use and impact evaluations:
To provide a realistic picture of how users actually react to, adopt, accept, and use the product in the real context.
To achieve desired task and organizational performance and productivity.
To know the impact of a software product on users, organizations, society, and culture.
Issues in Evaluation
Concerns about the objectivity and subjectivity of the evaluation.
Human-subject protection procedures if human subjects are involved.
Identifying the determinants of an evaluation plan, which include:
Stage of design
Novelty of product
Number of expected users
Criticality of the interface
Cost of product and finance allocated for testing
Time available
Experience of the design and evaluation team
Usability and usability engineering
The main elements involved:
The origin of usability concerns
Usability definitions
Usability engineering
Universal usability
The origin of usability concerns
Usability concerns started with the software crisis that led to software engineering as a professional discipline.
Through the 1970s, it became clear that an important component of software engineering would be user interface design.
In the 1980s, many nonprofessionals became the primary users of interactive systems, making the demand for easy-to-use interfaces higher than ever.
As end users became more diverse and less technically savvy, interactive systems came to be compared and evaluated with respect to usability: the quality of a system with respect to ease of learning, ease of use, and user satisfaction.
Usability definitions
ISO defines usability as “the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use”
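The three components of the ISO definition are typically operationalized as measurable quantities in a usability test. A minimal sketch in Python; the specific formulas below are common operationalizations, not mandated by the standard itself:

```python
# Sketch: operationalizing the ISO usability components (effectiveness,
# efficiency, satisfaction) from usability-test data. The formulas are
# common conventions, not prescribed by the standard.

def effectiveness(completed_tasks: int, attempted_tasks: int) -> float:
    """Task completion rate: share of attempted tasks finished successfully."""
    return completed_tasks / attempted_tasks

def efficiency(completed_tasks: int, total_time_minutes: float) -> float:
    """Completed tasks per minute of working time."""
    return completed_tasks / total_time_minutes

def satisfaction(ratings: list[float]) -> float:
    """Mean of post-test satisfaction ratings (e.g., 1-5 Likert items)."""
    return sum(ratings) / len(ratings)

# One participant's session: 8 of 10 tasks done in 20 minutes,
# satisfaction items rated 4, 5, and 3.
print(effectiveness(8, 10))     # 0.8
print(efficiency(8, 20.0))      # 0.4 tasks per minute
print(satisfaction([4, 5, 3]))  # 4.0
```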
Nielsen considers usability to be part of system acceptability and defines it as how well users can use the functionality or utility of the system. System acceptability is determined by social acceptability and practical acceptability. The details are shown in Figure 7.2.
Usability Engineering
A process through which usability characteristics are specified, quantitatively and early in the process, and measured throughout the process.
Universal Usability
Universal usability will be met when affordable, useful, and usable technology accommodates the vast majority of the global population. This entails addressing challenges of technology variety, user diversity, and gaps in user knowledge in ways only beginning to be acknowledged by educational, corporate, and government agencies.
Evaluation Methods

There are many evaluation methods and techniques, and there are different ways of clustering them.
Baecker et al. (1995) summarized research and evaluation methods into four types (as depicted in Table 7.5): field studies, respondent studies, experimental studies, and theoretical studies.
These evaluation methods can be grouped into two categories: analytical evaluations and empirical evaluations.
The main difference between the two is that analytical evaluations normally do not collect evidence from users but rely on evaluators using structured approaches for inspections and evaluations, while empirical evaluations draw conclusions from empirical data, which can be qualitative or quantitative in nature.
For each method, the description emphasizes six major aspects:
Method and key ideas
Special framework or model to be used
Context or environment
Whether users are involved
Whether user tasks, work, or activities are considered
The status of the artifact being evaluated (early, late or finished, or in use)
Analytical Method: Heuristic Evaluation

A group of experts, guided by a set of higher-level design principles or heuristics, evaluates whether interface elements conform to the principles.
A heuristic evaluation consists of the following:
A briefing session, in which the evaluators are told what to do. A prepared script is useful as a guide and to ensure that each person receives the same briefing.
The evaluation period, in which each evaluator typically spends one to two hours independently inspecting the product, using the heuristics for guidance.
A debriefing session, in which the experts come together to discuss their findings, remove duplicates, prioritize the problems, and suggest solutions.
There are generic heuristics, such as the ten usability heuristics by Nielsen (Table 7.6) and the eight golden rules by Shneiderman (Table 7.7).
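The debriefing step above can be sketched as a small script that merges findings from independent evaluators, collapses duplicates, and ranks problems by severity. The findings, element names, and 0-4 severity scale are invented for illustration:

```python
# Sketch: the debriefing step of a heuristic evaluation. Findings from
# independent evaluators are merged, duplicate reports collapsed, and
# problems prioritized by severity (scale and data are illustrative).

from collections import defaultdict

# Each finding: (interface element, violated heuristic, severity 0-4).
evaluator_a = [("login form", "error prevention", 3),
               ("search box", "visibility of system status", 2)]
evaluator_b = [("login form", "error prevention", 4),
               ("menu", "consistency and standards", 1)]

merged = defaultdict(list)
for element, heuristic, severity in evaluator_a + evaluator_b:
    merged[(element, heuristic)].append(severity)

# Duplicates collapse to one problem; severities are averaged, then
# problems are ranked worst-first for the design team.
problems = sorted(
    ((sum(s) / len(s), element, heuristic)
     for (element, heuristic), s in merged.items()),
    reverse=True)

for severity, element, heuristic in problems:
    print(f"{severity:.1f}  {element}: violates '{heuristic}'")
```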
Analytical Method: Guidelines Review (GR)

Guidelines reviews occur in the design context or environment. They normally do not require real users. They are conducted by designers or by experts outside the design team.
Some guidelines reviews do consider user tasks or activities. For example, there are specific guidelines for navigation, data entry, and getting the user's attention.
These evaluations can happen at both early and late stages of development. They can also be used for summative evaluation of finished products.

Cognitive Walk-through (CWT)

A CWT involves simulating a user's problem-solving process at each step in the human-computer dialog, checking whether the user's goals and memory for actions can be assumed to lead to the next correct action. CWTs are conducted by evaluation experts and do not need to involve users.
Analytical Method: CWT involves the following steps:
The characteristics of typical users are identified and documented, and sample tasks are developed that focus on the aspects of the design to be evaluated.
A designer and one or more expert evaluators then come together to do the analysis.
The evaluators walk through the action sequences for each task, placing them within the context of a typical scenario, answering questions such as "Will the correct action be sufficiently evident to the user?"
As the walk-through is being done, a record of critical information is compiled.
The design is then revised to fix the problems uncovered.
Analytical Method: Pluralistic Walk-through (PWT)

A PWT involves the following steps:
Scenarios are developed in the form of a series of hard-copy screens representing a single path through the interface. Often just two or a few screens are developed.
The scenarios are presented to a panel of evaluators, and the panelists are asked to write down the sequence of actions they would take to move from one screen to the next.
When everyone has written down their actions, the panelists discuss the actions they suggested for that round of the review.
Then the panel moves on to the next round of screens. This process continues until all the scenarios have been evaluated.
Analytical Method: Inspection with conceptual frameworks such as the TSSL model

This is a structured analytical evaluation method that uses conceptual frameworks as the basis for evaluation and inspection. One such framework is the TSSL model discussed earlier. This framework can be used to evaluate whether a design is an effective one.
From a procedural perspective, this method is similar to heuristic evaluation. However, it emphasizes starting from identifying user tasks and then evaluating the system from the angle of supporting those tasks.
No real users are needed.
Examples: evaluating the option/configuration specification interfaces of two software applications, and evaluating the most popular search results of 2003 on the three top Web portals and search engines.
Analytical Method: User model-based analysis, such as using the GOMS model

This method is used to predict users' behavior and performance during interaction with a computer system.
This method was discussed in detail in a previous chapter (Chapter 5).
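To give a flavor of such model-based prediction, the Keystroke-Level Model, a simplified member of the GOMS family, sums standard operator times to predict expert task time. A sketch using the classic operator estimates from Card, Moran, and Newell; the example action sequence is invented:

```python
# Sketch: predicting expert task time with the Keystroke-Level Model,
# a simplified member of the GOMS family. Operator times (in seconds)
# are the classic estimates from Card, Moran, and Newell.

KLM_TIMES = {
    "K": 0.2,   # keystroke or button press
    "P": 1.1,   # point with mouse to a target on screen
    "H": 0.4,   # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation for an action
}

def predict_time(operators: str) -> float:
    """Sum operator times for a sequence such as 'MHPK'."""
    return round(sum(KLM_TIMES[op] for op in operators), 2)

# Example: think, move hand to mouse, point at a button, click it.
print(predict_time("MHPK"))  # 3.05 seconds
```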
Empirical Methods (EM)

Empirical methods are normally conducted by involving users and collecting facts about users interacting with the system.
Commonly used methods include:
Survey/questionnaire
Interviews
Lab experiment
Observation
Empirical Method: Survey/Questionnaire

Surveys are commonly used to collect quantitative data from a large pool of respondents. A survey may focus on opinions or factual data, depending on its purpose.
A survey can be conducted by telephone, e-mail, or mail; it can be paper-based or online.
Advantages:
Inexpensive
Flexible to conduct
Involves a large number of respondents
Allows anonymity of respondents
Provides unbiased understanding if validated or standard instruments are used
Disadvantages:
Reliability of survey results
Respondents are sometimes unable to answer questions well, especially those related to past actions
Sometimes respondents do not truly represent the intended population because of how they are selected
A Likert scale is used to collect answers to questions about a specific concern, such as opinions, perceptions or beliefs, attitudes, satisfaction, behavior, or specific assessments.
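Summarizing Likert-scale answers is typically the first analysis step for such data. A minimal sketch; the item wording and responses are invented for illustration:

```python
# Sketch: summarizing Likert-scale survey answers for one questionnaire
# item (1 = strongly disagree ... 5 = strongly agree). The item and
# response data are invented for illustration.

from statistics import mean, median

responses = [4, 5, 3, 4, 2, 5, 4, 4]  # "The system was easy to use."

print(f"n={len(responses)} mean={mean(responses):.2f} "
      f"median={median(responses)}")

# The share of respondents who agree (rating 4 or 5) is often reported
# alongside the mean, since Likert data are ordinal.
agree = sum(1 for r in responses if r >= 4) / len(responses)
print(f"agreement rate: {agree:.0%}")
```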
Interviews

An interview is "a conversation with a purpose."
Interviews can be open-ended (unstructured), semi-structured, or structured.
Quick guidelines for developing questions:
Avoid long questions, as they are hard to remember.
Avoid compound sentences/questions by splitting them into two questions; one sentence carries one idea or question.
Avoid jargon words or language.
Avoid imposing or implying any bias when presenting a question to the interviewees.
Lab Experiment

Lab experiments are appropriate if evaluators have a clear focus….
A lab study involves the following steps:
Develop a research question.
Develop theory-driven hypotheses to be tested that outline the specific relationships between dependent and independent variables.
Design the experiment.
Pilot-test the experiment.
Recruit subjects and take care of the requirements for humans as participants.
Conduct the experiment, collect and analyze data, and draw conclusions.
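The analysis step for a two-condition experiment often compares mean performance between the designs. A sketch with invented task-time data using Welch's t statistic; a full analysis would also report a p-value (e.g., via scipy.stats.ttest_ind), omitted here to stay within the standard library:

```python
# Sketch: analyzing a two-condition lab experiment comparing
# task-completion times (seconds) under two interface designs.
# Data are invented for illustration.

from statistics import mean, stdev

design_a = [42.0, 39.5, 45.2, 40.1, 43.3]
design_b = [35.8, 33.9, 37.0, 34.4, 36.2]

def welch_t(x: list[float], y: list[float]) -> float:
    """Welch's t statistic for two independent samples."""
    vx, vy = stdev(x) ** 2, stdev(y) ** 2
    return (mean(x) - mean(y)) / ((vx / len(x) + vy / len(y)) ** 0.5)

print(f"mean A: {mean(design_a):.1f}s  mean B: {mean(design_b):.1f}s")
print(f"Welch t = {welch_t(design_a, design_b):.2f}")
```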
Observation

One way to evaluate a system is to observe and monitor real users actually using the system in a real setting.
A field study is conducted in a normal working environment, or the real field.
Field studies can be valuable in formative evaluations and in use and impact evaluations.
Ethnographic observation attends to actual user experience in real-world situations with minimal interruption or intervention from the evaluators.
Several actual use episodes can be observed if necessary.
Depending on the data collected and the evaluation goals, both qualitative and quantitative data analyses may take place.
Comparison of Methods

Each method has pros and cons, and there is no one-size-fits-all method.
Selecting which method to use is important when planning an evaluation.
Some methods are good for certain HCI concerns (e.g., cognitive), while others might be appropriate for all of them.
Some are good at early stages of development, while others are good for late stages.
Some require the use of real users, while others do not.
Table 7.11 lists the evaluation methods and their characteristics along several dimensions, including the advantages and disadvantages of each method.
Standards
Standards are concerned with prescribed ways of discussing, presenting, or doing things to achieve consistency across the same type of products.
Standards can indicate quality, safety, or known merits. Standardization makes people's lives easier and safer.
Standards for software are being developed to prevent poor-quality software that may bring disasters to businesses and people's lives.
NIST, ISO, IEC, and BSI set standards that are used for software products.
Standards are important for summative evaluations of finished products, as discussed earlier.
Types of Standards and Usability

Standards are categorized based on their purposes. These categories are logically related: the objective is for the product to be effective, efficient, and satisfying when used in the intended contexts.
Please refer to Figure 7.10 for this categorization.
Please refer to Table 7.12 for lists of the sources of some standards.
Common Industry Format (CIF): Background and current status

The CIF is a standard method for reporting summative usability test findings.
The purpose of the CIF is to encourage the incorporation of usability as an element in decision making for software procurement.
The CIF targets two audiences: usability professionals and stakeholders in the organization.
The CIF is designed for usability professionals who generate reports to be used by stakeholders in order to make decisions.
Common Industry Format (CIF): The CIF format

The CIF is designed for summative rather than formative testing.
The CIF format is primarily for reporting the results of formal usability tests in which quantitative measurements were collected, and it is particularly appropriate for summative/comparative testing.
The format includes the following main sections: executive summary, introduction, method, and results.
Please refer to Appendix A (page 170 in the textbook) for details of the CIF report format.
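The four main sections can be captured as a simple report structure. In this sketch the field names paraphrase the CIF section headings; they are illustrative, not the normative CIF template, and the sample contents are invented:

```python
# Sketch: the main sections of a CIF-style summative usability report
# as a simple data structure. Field names paraphrase the section
# headings; they are not the normative CIF schema.

from dataclasses import dataclass, field

@dataclass
class CIFReport:
    executive_summary: str
    introduction: str          # product description, test objectives
    method: str                # participants, tasks, test environment
    results: dict = field(default_factory=dict)  # quantitative metrics

report = CIFReport(
    executive_summary="High-level findings for procurement stakeholders.",
    introduction="Product X v2.1; goal: assess order-entry workflow.",
    method="8 participants, 10 tasks, lab setting.",
    results={"completion_rate": 0.85, "mean_task_time_s": 74.0,
             "mean_satisfaction": 4.1},
)
print(report.results["completion_rate"])  # 0.85
```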
Common Industry Format (CIF): How to use the CIF

According to NIST, the CIF can be used in the following fashion for purchased software:
Require that suppliers provide usability test reports in CIF format.
Analyze them for reliability and applicability.
Replicate within the agency if required.
Use the data to select a product.
Summary
Evaluations are driven by the ultimate concerns of HCI.
Evaluations should occur during the entire system development process, after the system is finished, and during the period the system is actually used.
This chapter introduced several commonly used evaluation methods; their pros and cons were compared and discussed.