Empirically Assessing End User Software Engineering Techniques


  • Empirically Assessing End User Software Engineering Techniques
    Gregg Rothermel

    Department of Computer Science and Engineering
    University of Nebraska -- Lincoln

  • Questions Addressed
    How can we use empirical studies to better understand issues/approaches in end-user SE?

    What are some of the problems empiricists working on end-user SE face?

    What are some of the opportunities for software engineering researchers working in this area?

  • Outline
    Background on empirical approaches
    Empiricism in the end-user SE context
    Problems for empiricism in end-user SE
    Conclusion

  • Empirical Approaches: Types
    Survey - interviews or questionnaires
    Controlled Experiment - in the laboratory, involves manipulation of variables
    Case Study - observational, often in-situ

  • Empirical Approaches: Surveys
    Pose questions via interviews or questionnaires
    Process: select variables and choose sample, frame questions that relate to variables, collect data, analyze and generalize from data
    Uses: descriptive (assert characteristics), explanatory (assess why), exploratory (pre-study)
    Resource: E. Babbie, Survey Research Methods, Wadsworth, 1990

  • Empirical Approaches: Controlled Experiments
    Manipulate independent variables and measure effects on dependent variables
    Requires randomization over subjects and objects (partial exception: quasi-experiments)
    Relies on a controlled environment (fix or sample over factors not being manipulated)
    Often involves a baseline (control group)
    Supports use of statistical analyses
    Resource: Wohlin et al., Experimentation in Software Engineering, Kluwer, 2000
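
    The mechanics of random assignment and a simple statistical analysis can be sketched in a few lines of Python. This is a generic illustration, not code from any study discussed here: the participant scores are invented, and Welch's t statistic is computed directly from its textbook formula (judging significance would still require comparing against the t distribution).

    ```python
    import random
    import statistics

    def assign_groups(subjects, seed=0):
        """Randomly assign subjects to two groups (between-subjects design)."""
        rng = random.Random(seed)      # fixed seed so the assignment is replicable
        shuffled = subjects[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        return shuffled[:half], shuffled[half:]

    def welch_t(a, b):
        """Welch's t statistic for two independent samples with unequal variances."""
        va, vb = statistics.variance(a), statistics.variance(b)
        return (statistics.mean(a) - statistics.mean(b)) / (va / len(a) + vb / len(b)) ** 0.5

    # Invented fault-detection scores for a treatment group (used the tool)
    # and a control group (ad-hoc testing).
    treatment = [8, 9, 7, 9, 8, 10]
    control = [6, 7, 5, 8, 6, 7]
    t = welch_t(treatment, control)    # positive t favors the treatment group
    ```
    
    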

  • Empirical Approaches: Case Studies
    Study a phenomenon (process, technique, device) in a specific setting
    Can involve comparisons between projects
    Less control, randomization, and replicability
    Easier to plan than controlled experiments
    Uses include larger investigations such as longitudinal or industrial studies
    Resource: R. K. Yin, Case Study Research: Design and Methods, Sage Publications, 1994

  • Empirical Approaches: Comparison

    Factor                Survey   Experiment   Case Study
    Execution Control     Low      High         Low
    Measurement Control   Low      High         High
    Investigation Cost    Low      High         Medium
    Ease of Replication   High     High         Low

  • Outline
    Background on empirical studies
    Empiricism in the end-user SE context
    Problems for empiricism
    Conclusion

  • Three Aspects of Empiricism
    Studies of EUSE (and SE) have two focal points:
      the ability of end users to use devices/processes
      the devices and processes themselves
    Evaluation and design of devices and processes are intertwined:
      summative evaluation helps us assess them
      formative evaluation helps us design them
    We need families of empirical studies:
      to generalize results
      studies inform and motivate further studies

  • Building Empirical Knowledge through Families of Studies
    [Diagram: a feedback loop spanning the user, the environment, and the device. Domain analyses, think-aloud/formative case studies, and surveys (exploratory, theory development) feed controlled experiments (hypothesis testing), which feed summative case studies (generalization), which in turn inform further exploratory work.]

  • Empirical Studies in WEUSE Papers
    Surveys
      Scaffidi et al.: usage of abstraction, programming practices
      Miller et al.: how users generate names for form fields
      Segal: needs/characteristics of professional end-user developers
      Sutcliffe: costs/benefits perceived by users of a web-based content mgmt. system
    Domain analysis
      Elbaum et al.: fault types in Matlab programs
    Controlled experiments
      Fisher et al.: infrastructure support for spreadsheet studies

  • Example: What You See Is What You Test (WYSIWYT)
    A cell turns more blue as it becomes more tested.
    Testing also flows upstream, marking other affected cells too.
    At any time, the user can check off a correct value.
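
    The "flows upstream" behavior can be sketched as a graph walk over cell references. This is a drastic simplification for illustration only: the real WYSIWYT methodology tracks coverage of definition-use associations rather than a per-cell boolean, and the cell names below are invented.

    ```python
    # refs[c] lists the cells that formula cell c reads (a hypothetical grade sheet).
    refs = {"grade": ["total"], "total": ["hw", "exam"], "hw": [], "exam": []}

    def upstream(cell):
        """All cells that contribute, directly or transitively, to `cell`."""
        seen, stack = set(), [cell]
        while stack:
            for r in refs.get(stack.pop(), []):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    checked = set()        # cells whose displayed value the user has checked off

    def check_off(cell):
        checked.add(cell)  # the user confirms this cell's value is correct

    def covered(cell):
        """A cell counts as tested if it, or any cell computed from it, was checked off."""
        return cell in checked or any(cell in upstream(c) for c in checked)

    check_off("grade")     # one check-off covers grade and everything upstream of it
    ```
    
    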

  • Study 1: Effectiveness of DU-Adequate Test Suites (TOSEM 1/01)
    RQ: Can DU-adequate test suites detect faults more effectively than other types of test suites?
    Compared DU-adequate vs. randomly generated suites of the same size, for ability to detect various seeded faults, across 8 spreadsheets
    Result: DU-adequate suites were significantly better than random at detecting faults
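
    DU (definition-use) adequacy can be sketched with the same kind of cell-reference model: treat each (defining cell, using formula) pair as a coverage target, and count a pair as exercised when a validated output depends on it. This is a toy model with invented cell names, not the paper's actual test-adequacy criterion for spreadsheet data flow.

    ```python
    # refs[c] lists the cells whose values formula cell c uses.
    refs = {"total": ["hw", "exam"], "grade": ["total", "curve"]}

    # One def-use pair per (defining cell, using formula) edge.
    du_pairs = {(src, cell) for cell, srcs in refs.items() for src in srcs}

    def exercised(validated):
        """DU pairs exercised by validating the given cells' outputs:
        every reference edge on some path into a validated cell."""
        seen, stack = set(), list(validated)
        while stack:
            cell = stack.pop()
            for src in refs.get(cell, []):
                if (src, cell) not in seen:
                    seen.add((src, cell))
                    stack.append(src)
        return seen

    def du_adequate(validated):
        return exercised(validated) == du_pairs

    # Validating only "total" misses the pairs feeding "grade";
    # validating "grade" reaches every pair in this sheet.
    ```
    
    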

  • Study 2: Usefulness of WYSIWYT (ICSE 6/00)
    RQs: Are WYSIWYT users more effective and efficient than ad-hoc testers?
    Compared two groups of users, one using WYSIWYT and one not, each on two spreadsheet validation tasks
    Participants drawn from undergraduate computer science classes
    Result: Participants using WYSIWYT were significantly better at creating DU-adequate suites, with less redundancy in testing

  • Study 3: Usefulness of WYSIWYT with End Users (ICSM 11/01)
    RQs: Are WYSIWYT users more accurate and more active at testing than ad-hoc users?
    Compared two groups of users, one using WYSIWYT and one not, each on two spreadsheet modification tasks
    Participants drawn from undergraduate business classes
    Result: Participants using WYSIWYT were more accurate in making modifications, and did more testing

  • Study 4: Using Assertions (ICSE 5/03)
    System can figure out more assertions
    User can enter assertions
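
    One common way a system can "figure out" assertions is interval propagation: given user-entered range assertions on input cells, ranges for computed cells follow from interval arithmetic, and a value outside the inferred range flags a likely formula error. The grading formula and weights below are invented for illustration and are not necessarily the inference mechanism of the Forms/3 work.

    ```python
    class Interval:
        """A range assertion on a cell: its value must lie in [lo, hi]."""
        def __init__(self, lo, hi):
            self.lo, self.hi = lo, hi
        def __add__(self, other):
            return Interval(self.lo + other.lo, self.hi + other.hi)
        def scale(self, k):
            # Multiply by a nonnegative constant weight.
            return Interval(self.lo * k, self.hi * k)
        def contains(self, value):
            return self.lo <= value <= self.hi

    # User-entered assertions on input cells of a hypothetical grade sheet.
    hw = Interval(0, 100)
    exam = Interval(0, 100)

    # System-inferred assertion on total = 0.4*hw + 0.5*exam.
    # The weights (wrongly) sum to 90%, so the inferred range tops out at 90:
    # a legitimate total of 95 violates the assertion, exposing the formula bug.
    total = hw.scale(0.4) + exam.scale(0.5)
    ```
    
    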

  • Study 4: Using Assertions (ICSE 5/03)
    RQs: Will end users use assertions, and do they understand the devices?
    Observed participants as they worked with Forms/3 spreadsheets with assertion facilities provided

  • Study 4: Using Assertions (ICSE 5/03)
    "There's got to be something wrong with the formula!"

  • Outline
    Background on empirical studies
    Empiricism in the end-user SE context
    Problems for empiricism in end-user SE
    Conclusion

  • Problems for Empiricism in EUSE
    Threats to validity: factors that limit our ability to draw valid conclusions
      External: ability to generalize
      Internal: ability to correctly infer connections between dependent and independent variables
      Construct: ability of the dependent variable to capture the effect being measured
      Conclusion: ability to apply statistical tests

  • External Validity
    Subjects (participants) aren't representative
    Programs (objects) aren't representative
    Environments aren't representative
    Problems are trivial or atypical

  • Internal Validity
    Learning effects, expectation bias
    Non-homogeneity among groups (differences in experience, training, motivation)
    Faulty devices or measurement tools
    Timings affected by external events
    The act of observing can change behavior (of users, certainly, but also of artifacts)

  • Construct Validity
    Lines of code may not adequately represent amount of work done
    Test coverage may not be a valid surrogate for fault-detection ability
    Successful generation of values doesn't guarantee successful use of values
    Self-grading may not provide an accurate measure of confidence

  • Conclusion Validity
    Small sample sizes
    Populations don't meet requirements for use of statistical tests
    Data distributions don't meet requirements for use of statistical tests
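
    When samples are small or non-normal, an exact permutation test avoids the distributional requirements of parametric tests. The sketch below, with invented scores, enumerates every regrouping of the pooled data and reports the fraction with a mean difference at least as extreme as the one observed.

    ```python
    from itertools import combinations
    from statistics import mean

    def permutation_p(a, b):
        """Exact two-sided permutation test on the difference of group means."""
        observed = abs(mean(a) - mean(b))
        pooled = a + b
        n, total = len(a), len(pooled)
        hits = count = 0
        for idx in combinations(range(total), n):   # every way to relabel the groups
            chosen = set(idx)
            ga = [pooled[i] for i in chosen]
            gb = [pooled[i] for i in range(total) if i not in chosen]
            count += 1
            if abs(mean(ga) - mean(gb)) >= observed - 1e-12:
                hits += 1
        return hits / count                         # fraction at least as extreme

    # With six participants per group there would already be C(12, 6) = 924
    # relabelings; with four per group the test is tiny but still exact.
    p = permutation_p([9, 8, 10, 9], [5, 6, 4, 5])
    ```
    
    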

  • Other Problems
    Cost of experimentation
    Difficulty of finding suitable subjects
    Difficulty of finding suitable objects
    Difficulty of getting the design right

  • Outline
    Background on empirical studies
    Empiricism in the end-user SE context
    Problems for empiricism in end-user SE
    Conclusion

  • Questions Addressed
    How can we use empirical studies to better understand issues/approaches in end-user SE?
      Via families of appropriate studies, using feedback and replication
    What are some of the problems empiricists working on end-user SE face?
      Threats to validity, many particular to this area
      Costs, and issues for experiment design/setup
    What are some of the opportunities for software engineering researchers working in this area?
      Myriad, given the range of study types applicable
      Better still with collaboration

  • Empirically Assessing End User Software Engineering Techniques
    Gregg Rothermel

    Department of Computer Science and Engineering
    University of Nebraska -- Lincoln

    Can use questionnaires (cheaper) or interviews (allow removal of ambiguity, improve response rate). Less control means it is harder to assert causality. Execution control: how much control the researcher has over the study (a survey might not be returned; a company might stop a case study). Measurement control: the degree to which the researcher can decide on the measures to be collected (availability of measures in that paradigm). Factors in selecting a study type: type of research question, level of control required, time and space of the target of study, resources available.

    Looking at experiments done in this area to date, we find the following points useful. A device is useful only if end users can understand it, are motivated to use it, and find it productive. A device can be productive only if its reasoning techniques are adequate, i.e., if it meets requirements in terms of effectiveness and efficiency. The goal of EUSE research is to help end-user programmers create more dependable software. There are all sorts of roles for empirical work in this; it is not just, as some might think, a matter of evaluating tools.

    These form a feedback loop. And indeed there is something wrong with the formula: the weights do not add up to 100% (wrong constants).