Empirically Assessing End User Software Engineering Techniques
Gregg Rothermel
Department of Computer Science and Engineering
University of Nebraska -- Lincoln
Questions Addressed
How can we use empirical studies to better understand issues/approaches in end-user SE?
What are some of the problems empiricists working on end-user SE face?
What are some of the opportunities for software engineering researchers working in this area?
Outline
Background on empirical approaches
Empiricism in the end-user SE context
Problems for empiricism in end-user SE
Conclusion
Empirical Approaches: Types
Survey: interviews or questionnaires
Controlled Experiment: in the laboratory, involves manipulation of variables
Case Study: observational, often in situ
Empirical Approaches: Surveys
Pose questions via interviews or questionnaires
Process: select variables and choose a sample, frame questions that relate to the variables, collect data, analyze and generalize from the data
Uses: descriptive (assert characteristics), explanatory (assess why), exploratory (pre-study)
Resource: E. Babbie, Survey Research Methods, Wadsworth, 1990
Empirical Approaches: Controlled Experiments
Manipulate independent variables and measure effects on dependent variables
Requires randomization over subjects and objects (partial exception: quasi-experiments)
Relies on a controlled environment (fix or sample over factors not being manipulated)
Often involves a baseline (control group)
Supports use of statistical analyses
Resource: Wohlin et al., Experimentation in Software Engineering, Kluwer, 2000
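The statistical analyses such an experiment supports typically compare a treatment group against the control group. As a minimal sketch (all scores below are invented for illustration, and the function name is hypothetical), a permutation test on the difference of group means avoids distributional assumptions, which matters for the small samples common in end-user studies:

```python
import random

def permutation_test(treatment, control, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Repeatedly reshuffles the pooled scores into two groups and counts
    how often a difference at least as extreme as the observed one arises.
    """
    rng = random.Random(seed)
    observed = sum(treatment) / len(treatment) - sum(control) / len(control)
    pooled = list(treatment) + list(control)
    n_t = len(treatment)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_t]) / n_t - sum(pooled[n_t:]) / (len(pooled) - n_t)
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_perm

# Invented scores: seeded faults detected per participant in each group
with_tool    = [8, 9, 7, 9, 8, 10, 7, 9]
without_tool = [6, 7, 5, 6, 8, 5, 6, 7]
diff, p = permutation_test(with_tool, without_tool)
```

A small p here would let the experimenter reject the null hypothesis that the two groups detect faults equally well; the same comparison could instead use a t-test when its distributional assumptions hold.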
Empirical Approaches: Case Studies
Study a phenomenon (process, technique, device) in a specific setting
Can involve comparisons between projects
Less control, randomization, and replicability
Easier to plan than controlled experiments
Uses include larger investigations such as longitudinal or industrial studies
Resource: R. K. Yin, Case Study Research: Design and Methods, Sage Publications, 1994
Empirical Approaches: Comparison

Factor                 Survey   Experiment   Case Study
Execution Control      Low      High         Low
Measurement Control    Low      High         High
Investigation Cost     Low      High         Medium
Ease of Replication    High     High         Low
Outline
Background on empirical approaches
Empiricism in the end-user SE context
Problems for empiricism in end-user SE
Conclusion
Three Aspects of Empiricism
Studies of EUSE (and SE) have two focal points:
- the ability of end users to use devices/processes
- the devices and processes themselves
Evaluation and design of devices and processes are intertwined:
- summative evaluation helps us assess them
- formative evaluation helps us design them
We need families of empirical studies:
- to generalize results
- studies inform and motivate further studies
Building Empirical Knowledge through Families of Studies
[Figure: a feedback loop of study types spanning user, environment, and device. Domain analyses and think-aloud/formative case studies and surveys (exploratory, theory development) feed controlled experiments (hypothesis testing), which feed summative case studies (generalization); results loop back to inform further studies.]
Empirical Studies in WEUSE Papers
Surveys:
- Scaffidi et al.: usage of abstraction, programming practices
- Miller et al.: how users generate names for form fields
- Segal: needs/characteristics of professional end-user developers
- Sutcliffe: costs/benefits perceived by users of a web-based content management system
Domain analysis:
- Elbaum et al.: fault types in Matlab programs
Controlled experiments:
- Fisher et al.: infrastructure support for spreadsheet studies
Example: What You See Is What You Test (WYSIWYT)
A cell turns more blue as it becomes more tested.
Testing also flows upstream, marking other affected cells too.
At any time, the user can check off a correct value.
Study 1: Effectiveness of DU-Adequate Test Suites (TOSEM 1/01)
RQ: Can DU-adequate test suites detect faults more effectively than other types of test suites?
Compared DU-adequate vs. randomly generated suites of the same size, for ability to detect various seeded faults, across 8 spreadsheets
Result: DU-adequate suites were significantly better than random suites at detecting faults
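In the spreadsheet setting, a definition-use (du) pair links a cell's formula (the definition of its value) to a cell whose formula references it (a use); a suite is DU-adequate when every pair has been exercised by a validated execution. A minimal sketch of how such coverage could be computed, with invented cell names:

```python
def du_pairs(formulas):
    """Enumerate definition-use pairs: each cell's definition paired
    with every cell whose formula references (uses) it."""
    pairs = set()
    for use_cell, refs in formulas.items():
        for def_cell in refs:
            pairs.add((def_cell, use_cell))
    return pairs

def du_adequacy(pairs, exercised):
    """Fraction of du-pairs exercised by validated test executions."""
    return len(pairs & exercised) / len(pairs)

# Hypothetical grade-book spreadsheet: each cell maps to the cells
# its formula references
formulas = {
    "Total":  {"HW", "Exam"},
    "Curve":  {"Total"},
    "Letter": {"Curve"},
}
pairs = du_pairs(formulas)  # 4 du-pairs in this sheet
cov = du_adequacy(pairs, {("HW", "Total"), ("Total", "Curve")})  # 0.5
```

A suite is DU-adequate when this ratio reaches 1.0; in WYSIWYT, the same per-cell ratio is what drives how blue a cell is painted.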
Study 2: Usefulness of WYSIWYT (ICSE 6/00)
RQs: Are WYSIWYT users more effective and more efficient than ad-hoc users?
Compared two groups of users, one using WYSIWYT and one not, each on two spreadsheet validation tasks
Participants were drawn from undergraduate computer science classes
Participants using WYSIWYT were significantly better at creating DU-adequate suites, with less redundancy in testing
Study 3: Usefulness of WYSIWYT with End Users (ICSM 11/01)
RQs: Are WYSIWYT users more accurate and more active at testing than ad-hoc users?
Compared two groups of users, one using WYSIWYT and one not, each on two spreadsheet modification tasks
Participants were drawn from undergraduate business classes
Participants using WYSIWYT were more accurate in making modifications and did more testing
Study 4: Using Assertions (ICSE 5/03)
The system can figure out more assertions.
The user can enter assertions.
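One way a system can "figure out" assertions is by propagating user-entered range assertions through formulas with interval arithmetic. A minimal sketch under that assumption (cell names and helper functions are hypothetical, not the Forms/3 implementation):

```python
def propagate_add(a, b):
    """Inferred range assertion for a cell computing a + b,
    given (lo, hi) range assertions on the referenced cells."""
    return (a[0] + b[0], a[1] + b[1])

def propagate_mul_const(a, k):
    """Inferred range assertion for a cell computing k * a,
    for a positive constant k."""
    return (k * a[0], k * a[1])

# User-entered assertions on two input cells of an invented grade sheet
hw, exam = (0, 100), (0, 100)

# System-inferred assertion on Total = 0.4 * HW + 0.6 * Exam
total = propagate_add(propagate_mul_const(hw, 0.4),
                      propagate_mul_const(exam, 0.6))
```

If the cell's actual value ever falls outside the inferred range, or the inferred and user-entered assertions on a cell conflict, the system can flag a likely fault.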
Study 4: Using Assertions (ICSE 5/03)
RQs: Will end users use assertions, and do they understand the devices?
Observed participants as they worked with Forms/3 spreadsheets with assertion facilities provided
Study 4: Using Assertions (ICSE 5/03)
"There's got to be something wrong with the formula!"
Outline
Background on empirical approaches
Empiricism in the end-user SE context
Problems for empiricism in end-user SE
Conclusion
Problems for Empiricism in EUSE
Threats to validity: factors that limit our ability to draw valid conclusions
External: ability to generalize
Internal: ability to correctly infer connections between dependent and independent variables
Construct: ability of the dependent variable to capture the effect being measured
Conclusion: ability to apply statistical tests
External Validity
Subjects (participants) aren't representative
Programs (objects) aren't representative
Environments aren't representative
Problems are trivial or atypical
Internal Validity
Learning effects, expectation bias
Non-homogeneity among groups (differences in experience, training, motivation)
Faulty devices or measurement tools
Timings affected by external events
The act of observing can change behavior (of users, certainly, but also of artifacts)
Construct Validity
Lines of code may not adequately represent the amount of work done
Test coverage may not be a valid surrogate for fault-detection ability
Successful generation of values doesn't guarantee successful use of values
Self-grading may not provide an accurate measure of confidence
Conclusion Validity
Small sample sizes
Populations don't meet requirements for use of statistical tests
Data distributions don't meet requirements for use of statistical tests
Other Problems
Cost of experimentation
Difficulty of finding suitable subjects
Difficulty of finding suitable objects
Difficulty of getting the design right
Outline
Background on empirical approaches
Empiricism in the end-user SE context
Problems for empiricism in end-user SE
Conclusion
Questions Addressed
How can we use empirical studies to better understand issues/approaches in end-user SE?
- Via families of appropriate studies, using feedback and replication
What are some of the problems empiricists working on end-user SE face?
- Threats to validity, many particular to this area
- Costs, and issues for experiment design/setup
What are some of the opportunities for software engineering researchers working in this area?
- Myriad, given the range of study types applicable
- Better still with collaboration
Surveys can use questionnaires (cheaper) or interviews (allow removal of ambiguity, improve response rate). Less control means it is harder to assert causality.
Execution control: how much control the researcher has over the study (a survey might not be returned; a company might stop a case study).
Measurement control: the degree to which the researcher can decide on the measures to be collected (availability of measures in that paradigm).
Factors in selecting a study type:
- type of research question
- level of control required
- time and space of the target of study
- resources available
Looking at experiments done in this area to date, we find the following points useful:
A device is useful only if end users can understand it, are motivated to use it, and find it productive.
A device can be productive only if its reasoning techniques are adequate, i.e., if it meets requirements in terms of effectiveness and efficiency.
The goal of EUSE research is to help end-user programmers create more dependable software. There are all sorts of roles for empirical work in this; it is not just, as some might think, a matter of evaluating tools.
These form a feedback loop.
And indeed there is something wrong with the formula: the weights do not add up to 100% (wrong constants).
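A fault like this, weight constants that should sum to 100% but do not, is exactly the kind of error a range assertion on a derived cell can expose. A minimal sketch with invented weights (the cell names and the 0.90 sum are made up for illustration):

```python
def violates(value, low, high):
    """True if a cell's value falls outside its asserted range."""
    return not (low <= value <= high)

# Invented grade weights containing the fault described above:
# the constants should sum to 1.0 but do not.
weights = {"HW": 0.25, "Midterm": 0.30, "Final": 0.35}
weight_sum = sum(weights.values())            # 0.90
fault_found = violates(weight_sum, 1.0, 1.0)  # the assertion on the sum fires
```

Flagging the violated assertion on the sum points the user at the wrong constants directly, rather than leaving them to puzzle over a plausible-looking but incorrect final grade.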