Empirically Assessing End User Software Engineering Techniques


  • Empirically Assessing End User Software Engineering Techniques
    Gregg Rothermel

    Department of Computer Science and Engineering
    University of Nebraska -- Lincoln

  • Questions Addressed
    How can we use empirical studies to better understand issues/approaches in end-user SE?

    What are some of the problems empiricists working on end-user SE face?

    What are some of the opportunities for software engineering researchers working in this area?

  • Outline
    Background on empirical approaches
    Empiricism in the end-user SE context
    Problems for empiricism in end-user SE
    Conclusion

  • Empirical Approaches: Types
    Survey - interviews or questionnaires
    Controlled Experiment - in the laboratory, involves manipulation of variables
    Case Study - observational, often in-situ

  • Empirical Approaches: Surveys
    Pose questions via interviews or questionnaires
    Process: select variables and choose sample, frame questions that relate to variables, collect data, analyze and generalize from data
    Uses: descriptive (assert characteristics), explanatory (assess why), exploratory (pre-study)
    Resource: E. Babbie, Survey Research Methods, Wadsworth, 1990

  • Empirical Approaches: Controlled Experiments
    Manipulate independent variables and measure effects on dependent variables
    Requires randomization over subjects and objects (partial exception: quasi-experiments)
    Relies on a controlled environment (fix or sample over factors not being manipulated)
    Often involves a baseline (control group)
    Supports use of statistical analyses
    Resource: Wohlin et al., Experimentation in Software Engineering, Kluwer, 2000
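
    The mechanics of random assignment and a simple statistical analysis can be sketched in a few lines of Python. This is a generic illustration, not code from any study discussed here: the participant scores are invented, and Welch's t statistic is computed directly from its textbook formula (judging significance would still require comparing against the t distribution).

    ```python
    import random
    import statistics

    def assign_groups(subjects, seed=0):
        """Randomly assign subjects to two groups (between-subjects design)."""
        rng = random.Random(seed)      # fixed seed so the assignment is replicable
        shuffled = subjects[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        return shuffled[:half], shuffled[half:]

    def welch_t(a, b):
        """Welch's t statistic for two independent samples with unequal variances."""
        va, vb = statistics.variance(a), statistics.variance(b)
        return (statistics.mean(a) - statistics.mean(b)) / (va / len(a) + vb / len(b)) ** 0.5

    # Invented fault-detection scores for a treatment group (used the tool)
    # and a control group (ad-hoc testing).
    treatment = [8, 9, 7, 9, 8, 10]
    control = [6, 7, 5, 8, 6, 7]
    t = welch_t(treatment, control)    # positive t favors the treatment group
    ```
    
    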

  • Empirical Approaches: Case Studies
    Study a phenomenon (process, technique, device) in a specific setting
    Can involve comparisons between projects
    Less control, randomization, and replicability
    Easier to plan than controlled experiments
    Uses include larger investigations such as longitudinal or industrial studies
    Resource: R. K. Yin, Case Study Research: Design and Methods, Sage Publications, 1994

  • Empirical Approaches: Comparison

    Factor                Survey   Experiment   Case Study
    Execution Control     Low      High         Low
    Measurement Control   Low      High         High
    Investigation Cost    Low      High         Medium
    Ease of Replication   High     High         Low

  • Outline
    Background on empirical studies
    Empiricism in the end-user SE context
    Problems for empiricism
    Conclusion

  • Three Aspects of Empiricism
    Studies of EUSE (and SE) have two focal points:
      the ability of end users to use devices/processes
      the devices and processes themselves
    Evaluation and design of devices and processes are intertwined:
      summative evaluation helps us assess them
      formative evaluation helps us design them
    We need families of empirical studies:
      to generalize results
      studies inform and motivate further studies

  • Building Empirical Knowledge through Families of Studies
    [Diagram: a feedback loop spanning the user, the environment, and the device. Domain analyses, think-aloud/formative case studies, and surveys (exploratory, theory development) feed controlled experiments (hypothesis testing), which feed summative case studies (generalization), which in turn inform further exploratory work.]

  • Empirical Studies in WEUSE Papers
    Surveys
      Scaffidi et al.: usage of abstraction, programming practices
      Miller et al.: how users generate names for form fields
      Segal: needs/characteristics of professional end-user developers
      Sutcliffe: costs/benefits perceived by users of a web-based content mgmt. system
    Domain analysis
      Elbaum et al.: fault types in Matlab programs
    Controlled experiments
      Fisher et al.: infrastructure support for spreadsheet studies

  • Example: What You See Is What You Test (WYSIWYT)
    A cell turns more blue as it becomes more tested.
    Testing also flows upstream, marking other affected cells too.
    At any time, the user can check off a correct value.
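
    The "flows upstream" behavior can be sketched as a graph walk over cell references. This is a drastic simplification for illustration only: the real WYSIWYT methodology tracks coverage of definition-use associations rather than a per-cell boolean, and the cell names below are invented.

    ```python
    # refs[c] lists the cells that formula cell c reads (a hypothetical grade sheet).
    refs = {"grade": ["total"], "total": ["hw", "exam"], "hw": [], "exam": []}

    def upstream(cell):
        """All cells that contribute, directly or transitively, to `cell`."""
        seen, stack = set(), [cell]
        while stack:
            for r in refs.get(stack.pop(), []):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    checked = set()        # cells whose displayed value the user has checked off

    def check_off(cell):
        checked.add(cell)  # the user confirms this cell's value is correct

    def covered(cell):
        """A cell counts as tested if it, or any cell computed from it, was checked off."""
        return cell in checked or any(cell in upstream(c) for c in checked)

    check_off("grade")     # one check-off covers grade and everything upstream of it
    ```
    
    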

  • Study 1: Effectiveness of DU-Adequate Test Suites (TOSEM 1/01)
    RQ: Can DU-adequate test suites detect faults more effectively than other types of test suites?
    Compared DU-adequate vs. randomly generated suites of the same size, for ability to detect various seeded faults, across 8 spreadsheets
    Result: DU-adequate suites were significantly better than random at detecting faults
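
    DU (definition-use) adequacy can be sketched with the same kind of cell-reference model: treat each (defining cell, using formula) pair as a coverage target, and count a pair as exercised when a validated output depends on it. This is a toy model with invented cell names, not the paper's actual test-adequacy criterion for spreadsheet data flow.

    ```python
    # refs[c] lists the cells whose values formula cell c uses.
    refs = {"total": ["hw", "exam"], "grade": ["total", "curve"]}

    # One def-use pair per (defining cell, using formula) edge.
    du_pairs = {(src, cell) for cell, srcs in refs.items() for src in srcs}

    def exercised(validated):
        """DU pairs exercised by validating the given cells' outputs:
        every reference edge on some path into a validated cell."""
        seen, stack = set(), list(validated)
        while stack:
            cell = stack.pop()
            for src in refs.get(cell, []):
                if (src, cell) not in seen:
                    seen.add((src, cell))
                    stack.append(src)
        return seen

    def du_adequate(validated):
        return exercised(validated) == du_pairs

    # Validating only "total" misses the pairs feeding "grade";
    # validating "grade" reaches every pair in this sheet.
    ```
    
    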

  • Study 2: Usefulness of WYSIWYT (ICSE 6/00)
    RQs: Are WYSIWYT users more effective and efficient than ad-hoc testers?
    Compared two groups of users, one using WYSIWYT and one not, each on two spreadsheet validation tasks
    Participants drawn from undergraduate computer science classes
    Result: Participants using WYSIWYT were significantly better at creating DU-adequate suites, with less redundancy in testing

  • Study 3: Usefulness of WYSIWYT with End Users (ICSM 11/01)
    RQs: Are WYSIWYT users more accurate and more active at testing than ad-hoc users?
    Compared two groups of users, one using WYSIWYT and one not, each on two spreadsheet modification tasks
    Participants drawn from undergraduate business classes
    Result: Participants using WYSIWYT were more accurate in making modifications, and did more testing

  • Study 4: Using Assertions (ICSE 5/03)
    System can figure out more assertions
    User can enter assertions
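
    One common way a system can "figure out" assertions is interval propagation: given user-entered range assertions on input cells, ranges for computed cells follow from interval arithmetic, and a value outside the inferred range flags a likely formula error. The grading formula and weights below are invented for illustration and are not necessarily the inference mechanism of the Forms/3 work.

    ```python
    class Interval:
        """A range assertion on a cell: its value must lie in [lo, hi]."""
        def __init__(self, lo, hi):
            self.lo, self.hi = lo, hi
        def __add__(self, other):
            return Interval(self.lo + other.lo, self.hi + other.hi)
        def scale(self, k):
            # Multiply by a nonnegative constant weight.
            return Interval(self.lo * k, self.hi * k)
        def contains(self, value):
            return self.lo <= value <= self.hi

    # User-entered assertions on input cells of a hypothetical grade sheet.
    hw = Interval(0, 100)
    exam = Interval(0, 100)

    # System-inferred assertion on total = 0.4*hw + 0.5*exam.
    # The weights (wrongly) sum to 90%, so the inferred range tops out at 90:
    # a legitimate total of 95 violates the assertion, exposing the formula bug.
    total = hw.scale(0.4) + exam.scale(0.5)
    ```
    
    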

  • Study 4: Using Assertions (ICSE 5/03)
    RQs: Will end users use assertions, and do they understand the devices?
    Observed participants as they worked with Forms/3 spreadsheets with assertion facilities provided

  • Study 4: Using Assertions (ICSE 5/03)
    "There's got to be something wrong with the formula!"

  • Outline
    Background on empirical studies
    Empiricism in the end-user SE context
    Problems for empiricism in end-user SE
    Conclusion

  • Problems for Empiricism in EUSE
    Threats to validity: factors that limit our ability to draw valid conclusions
      External: ability to generalize
      Internal: ability to correctly infer connections between dependent and independent variables
      Construct: ability of the dependent variable to capture the effect being measured
      Conclusion: ability to apply statistical tests

  • External Validity
    Subjects (participants) aren't representative
    Programs (objects) aren't representative
    Environments aren't representative
    Problems are trivial or atypical

  • Internal Validity
    Learning effects, expectation bias
    Non-homogeneity among groups (differences in experience, training, motivation)
    Faulty devices or measurement tools
    Timings affected by external events
    The act of observing can change behavior (of users, certainly, but also of artifacts)

  • Construct Validity
    Lines of code may not adequately represent amount of work done
    Test coverage may not be a valid surrogate for fault-detection ability
    Successful generation of values doesn't guarantee successful use of values
    Self-grading may not provide an accurate measure of confidence

  • Conclusion Validity
    Small sample sizes
    Populations don't meet requirements for use of statistical tests
    Data distributions don't meet requirements for use of statistical tests
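
    When samples are small or non-normal, an exact permutation test avoids the distributional requirements of parametric tests. The sketch below, with invented scores, enumerates every regrouping of the pooled data and reports the fraction with a mean difference at least as extreme as the one observed.

    ```python
    from itertools import combinations
    from statistics import mean

    def permutation_p(a, b):
        """Exact two-sided permutation test on the difference of group means."""
        observed = abs(mean(a) - mean(b))
        pooled = a + b
        n, total = len(a), len(pooled)
        hits = count = 0
        for idx in combinations(range(total), n):   # every way to relabel the groups
            chosen = set(idx)
            ga = [pooled[i] for i in chosen]
            gb = [pooled[i] for i in range(total) if i not in chosen]
            count += 1
            if abs(mean(ga) - mean(gb)) >= observed - 1e-12:
                hits += 1
        return hits / count                         # fraction at least as extreme

    # With six participants per group there would already be C(12, 6) = 924
    # relabelings; with four per group the test is tiny but still exact.
    p = permutation_p([9, 8, 10, 9], [5, 6, 4, 5])
    ```
    
    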

  • Other Problems
    Cost of experimentation
    Difficulty of finding suitable subjects
    Difficulty of finding suitable objects
    Difficulty of getting the design right

  • Outline
    Background on empirical studies
    Empiricism in the end-user SE context
    Problems for empiricism in end-user SE
    Conclusion

  • Questions Addressed
    How can we use empirical studies to better understand issues/approaches in end-user SE?
      Via families of appropriate studies, using feedback and replication
    What are some of the problems empiricists working on end-user SE face?
      Threats to validity, many particular to this area
      Costs, and issues for experiment design/setup
    What are some of the opportunities for software engineering researchers working in this area?
      Myriad, given the range of study types applicable
      Better still with collaboration

  • Empirically Assessing End User Software Engineering Techniques
    Gregg Rothermel

    Department of Computer Science and Engineering
    University of Nebraska -- Lincoln

    Can use questionnaires (cheaper) or interviews (allow removal of ambiguity, improve response rate). Less control means it is harder to assert causality. Execution control: how much control the researcher has over the study (a survey might not be returned; a company might stop a case study). Measurement control: the degree to which the researcher can decide on the measures to be collected (availability of measures in that paradigm). Factors in selecting a study type: type of research question, level of control required, time and space of the target of study, resources available.

    Looking at experiments done in this area to date, we find the following points useful. A device is useful only if end users can understand it, are motivated to use it, and find it productive. A device can be productive only if its reasoning techniques are adequate, i.e., if it meets requirements in terms of effectiveness and efficiency. The goal of EUSE research is to help end-user programmers create more dependable software. There are all sorts of roles for empirical work in this; it is not just, as some might think, a matter of evaluating tools.

    These form a feedback loop. And indeed there is something wrong with the formula: the weights do not add up to 100% (wrong constants).