
Empirically Assessing End User Software Engineering Techniques

Gregg Rothermel

Department of Computer Science and Engineering
University of Nebraska -- Lincoln

Questions Addressed

• How can we use empirical studies to better understand issues/approaches in end user SE?

• What are some of the problems empiricists working on end-user SE face?

• What are some of the opportunities for software engineering researchers working in this area?

Outline

• Background on empirical approaches
• Empiricism in the end-user SE context
• Problems for empiricism in end-user SE
• Conclusion

Empirical Approaches: Types

• Survey - interviews or questionnaires
• Controlled Experiment - in the laboratory, involves manipulation of variables
• Case Study - observational, often in situ

Empirical Approaches: Surveys

• Pose questions via interviews or questionnaires

• Process: select variables and choose sample, frame questions that relate to variables, collect data, analyze and generalize from data

• Uses: descriptive (assert characteristics), explanatory (assess why), exploratory (pre-study)

Resource: E. Babbie, Survey Research Methods, Wadsworth, 1990

Empirical Approaches: Controlled Experiments

• Manipulate independent variables and measure effects on dependent variables

• Requires randomization over subjects and objects (partial exception: quasi-experiments)

• Relies on controlled environment (fix or sample over factors not being manipulated)

• Often involves a baseline (control group)
• Supports use of statistical analyses

Resource: Wohlin et al., Experimentation in Software Engineering, Kluwer, 2000
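
The randomization requirement above can be illustrated with a minimal sketch of random assignment of subjects to groups. The participant IDs, group names, and the `assign_groups` helper are hypothetical and used only for illustration; a fixed seed is assumed so that the assignment can be reproduced when a study is replicated.

```python
import random

def assign_groups(participants, treatments, seed=None):
    """Randomly assign participants to treatments in near-equal group sizes."""
    rng = random.Random(seed)      # private generator so the seed is reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)          # random order removes selection bias
    groups = {t: [] for t in treatments}
    for i, person in enumerate(shuffled):
        # Deal shuffled participants round-robin across the treatments.
        groups[treatments[i % len(treatments)]].append(person)
    return groups

# Hypothetical participant IDs P01..P10.
participants = [f"P{i:02d}" for i in range(1, 11)]
groups = assign_groups(participants, ["treatment", "control"], seed=42)
print(groups)
```

Randomizing over objects (for example, which spreadsheet each participant works on first) could follow the same pattern.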

Empirical Approaches: Case Studies

• Study a phenomenon (process, technique, device) in a specific setting

• Can involve comparisons between projects
• Less control, randomization, and replicability
• Easier to plan than controlled experiments
• Uses include
  – larger investigations, such as longitudinal or industrial ones

Resource: R. K. Yin, Case Study Research: Design and Methods, Sage Publications, 1994

Empirical Approaches: Comparison

Factor                Survey   Experiment   Case Study
Execution Control     Low      High         Low
Measurement Control   Low      High         High
Investigation Cost    Low      High         Medium
Ease of Replication   High     High         Low

Outline

• Background on empirical studies
• Empiricism in the end-user SE context
• Problems for empiricism
• Conclusion

Three Aspects of Empiricism

1. Studies of EUSE (and SE) have two focal points:
  – The ability of end users to use devices/processes
  – The devices and processes themselves

2. Evaluation and design of devices and processes are intertwined:
  – Summative evaluation helps us assess them
  – Formative evaluation helps us design them

3. We need families of empirical studies:
  – To generalize results
  – Studies inform and motivate further studies

Building Empirical Knowledge through Families of Studies

[Figure labels: Domain Analyses; Think-Aloud, Formative Case Studies, Surveys; Controlled Experiments; Summative Case Studies; Exploratory, Theory Dev.; Hypothesis Testing; Generalization; user, environment, device]

Empirical Studies in WEUSE Papers

• Surveys
  – Scaffidi et al.: usage of abstraction, programming practices
  – Miller et al.: how users generate names for form fields
  – Segal: needs/characteristics of professional end user developers
  – Sutcliffe: costs/benefits perceived by users of a web-based content mgmt. system
• Domain analysis
  – Elbaum et al.: fault types in Matlab programs
• Controlled experiments
  – Fisher et al.: infrastructure support for spreadsheet studies

Example: What You See is What You Test (WYSIWYT)

• At any time, user can check off correct value.
• Cell turns more blue (more “tested”).
• Testing also flows upstream, marking other affected cells too.

Study 1: Effectiveness of DU-adequate test suites (TOSEM 1/01)

• RQ: Can DU-adequate test suites detect faults more effectively than other types of test suites?

• Compared DU-adequate vs randomly generated suites of the same size, for ability to detect various seeded faults, across 8 spreadsheets

• Result: DU-adequate suites were significantly better than random at detecting faults
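
The kind of measurement behind such a comparison can be sketched as follows. This is not the study's procedure or data: the payroll formula, the three seeded faults, and both test suites are hypothetical, and a suite that exercises both branches of the formula stands in, loosely, for a coverage-adequate suite. A fault counts as detected when any test input makes the faulty version's output differ from the original's.

```python
def original(hours, rate):
    # Hypothetical payroll formula: hours above 40 are paid at 1.5x.
    if hours > 40:
        return 40 * rate + (hours - 40) * rate * 1.5
    return hours * rate

def fault_threshold(hours, rate):    # seeded fault: wrong overtime threshold
    if hours > 60:
        return 40 * rate + (hours - 40) * rate * 1.5
    return hours * rate

def fault_multiplier(hours, rate):   # seeded fault: wrong overtime factor
    if hours > 40:
        return 40 * rate + (hours - 40) * rate * 2.0
    return hours * rate

def fault_operator(hours, rate):     # seeded fault: '+' instead of '*'
    if hours > 40:
        return 40 * rate + (hours - 40) * rate * 1.5
    return hours + rate

faults = [fault_threshold, fault_multiplier, fault_operator]

def detected(suite, faulty):
    """A fault is detected if any test input exposes a differing output."""
    return any(faulty(*test) != original(*test) for test in suite)

def detection_rate(suite):
    """Fraction of seeded faults that the suite detects."""
    return sum(detected(suite, f) for f in faults) / len(faults)

suite_both_branches = [(30, 10), (50, 10)]   # exercises regular and overtime pay
suite_one_branch = [(10, 10), (30, 10)]      # same size, never reaches overtime

print(detection_rate(suite_both_branches), detection_rate(suite_one_branch))
```

A study would repeat this measurement over many suites, faults, and spreadsheets before drawing any statistical conclusion.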

Study 2: Usefulness of WYSIWYT (ICSE 6/00)

• RQs: Are WYSIWYT users more effective and more efficient than ad hoc testers?

• Compared two groups of users, one using WYSIWYT, one not, each on two spreadsheet validation tasks

• Participants drawn from undergraduate computer science classes

• Result: participants using WYSIWYT were significantly better at creating DU-adequate suites, with less redundancy in testing

Study 3: Usefulness of WYSIWYT with End Users (ICSM 11/01)

• RQs: Are WYSIWYT users more accurate and more active at testing than ad hoc testers?

• Compared two groups of users, one using WYSIWYT, one not, each on two spreadsheet modification tasks

• Participants drawn from undergraduate business classes

• Result: participants using WYSIWYT were more accurate in making modifications, and did more testing

Study 4: Using Assertions (ICSE 5/03)

• User can enter assertions
• System can figure out more assertions
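
One way a system could "figure out more assertions" is to propagate user-entered ranges through a formula. The sketch below assumes assertions are closed numeric ranges and that formulas use only addition and multiplication; the `Range` class, the cell names, and the formula are hypothetical and are not taken from Forms/3.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Range:
    """A range assertion: the cell's value must lie in [lo, hi]."""
    lo: int
    hi: int

    def __add__(self, other):
        # Sum of two ranges: add the lower bounds and add the upper bounds.
        return Range(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # Product of two ranges: the extremes occur at endpoint combinations.
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Range(min(products), max(products))

    def contains(self, value):
        return self.lo <= value <= self.hi

# Assertions entered by the user on input cells (hypothetical values).
price = Range(1, 100)
qty = Range(0, 10)
shipping = Range(5, 5)

# Assertion derived by the system for the formula: total = price * qty + shipping
total = price * qty + shipping
print(total)                  # Range(lo=5, hi=1005)

# A computed value outside the derived range signals a conflict to the user.
print(total.contains(1200))   # False
```

A conflict between a derived range and a computed value is the kind of cue that could prompt a user to suspect the formula.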

Study 4: Using Assertions (ICSE 5/03)

• RQs: Will end users use assertions, and do they understand the devices?

• Observed participants as they worked with Forms/3 spreadsheets with assertion facilities provided

Study 4: Using Assertions (ICSE 5/03)

There’s got to be something wrong with the formula!

Outline

• Background on empirical studies
• Empiricism in the end-user SE context
• Problems for empiricism in end-user SE
• Conclusion

Problems for Empiricism in EUSE

• Threats to validity: factors that limit our ability to draw valid conclusions
  – External: ability to generalize
  – Internal: ability to correctly infer connections between dependent and independent variables
  – Construct: ability of dependent variable to capture the effect being measured
  – Conclusion: ability to apply statistical tests

External Validity

• Subjects (participants) aren’t representative
• Programs (objects) aren’t representative
• Environments aren’t representative
• Problems are trivial or atypical

Internal Validity

• Learning effects, expectation bias, …
• Non-homogeneity among groups (different in experience, training, motivation)
• Devices or measurement tools faulty
• Timings are affected by external events
• The act of observing can change behavior (of users, certainly, but also of artifacts)

Construct Validity

• Lines of code may not adequately represent amount of work done

• Test coverage may not be a valid surrogate for fault detection ability

• Successful generation of values doesn’t guarantee successful use of values

• Self-grading may not provide an accurate measure of confidence

Conclusion Validity

• Small sample sizes
• Populations don’t meet requirements for use of statistical tests
• Data distributions don’t meet requirements for use of statistical tests
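
When samples are small and distributional requirements are doubtful, one commonly used option is an exact permutation test, which makes no normality assumption. The sketch below is a generic illustration, not an analysis from any of the studies; the scores and the `permutation_test` helper are hypothetical.

```python
from itertools import combinations

def permutation_test(a, b):
    """Exact two-sided permutation test on the difference of group means."""
    pooled = list(a) + list(b)
    n_a = len(a)
    observed = abs(sum(a) / n_a - sum(b) / len(b))
    extreme = total = 0
    # Enumerate every way of relabeling n_a of the pooled scores as group a.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        grp_a = [pooled[i] for i in chosen]
        grp_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = abs(sum(grp_a) / len(grp_a) - sum(grp_b) / len(grp_b))
        if diff >= observed - 1e-12:   # small tolerance for float rounding
            extreme += 1
        total += 1
    # p-value: share of relabelings at least as extreme as the observed split.
    return extreme / total

# Hypothetical task scores for two small groups of four participants each.
with_tool = [8, 9, 7, 10]
ad_hoc = [5, 6, 4, 7]
p_value = permutation_test(with_tool, ad_hoc)
print(p_value)
```

Enumeration is feasible only for small groups (70 relabelings here); larger samples would need random sampling of relabelings instead.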

Other Problems

• Cost of experimentation
• Difficulty of finding suitable subjects
• Difficulty of finding suitable objects
• Difficulty of getting the design right

Outline

• Background on empirical studies
• Empiricism in the end-user SE context
• Problems for empiricism in end-user SE
• Conclusion

Questions Addressed

• How can we use empirical studies to better understand issues/approaches in end user SE?
  – Via families of appropriate studies, using feedback and replication

• What are some of the problems empiricists working on end-user SE face?
  – Threats to validity, many particular to this area
  – Costs, and issues for experiment design/setup

• What are some of the opportunities for software engineering researchers working in this area?
  – Myriad, given the range of study types applicable
  – Better still with collaboration
