UCI Large-Scale Collection of Application Usage Data to Inform Software Development David M. Hilbert Information and Computer Science University of California, Irvine Irvine, California 92697-3425 [email protected] http://www.ics.uci.edu/~dhilbert/


Large-Scale Collection of Application Usage Data to Inform Software Development

David M. Hilbert

Information and Computer Science

University of California, Irvine

Irvine, California 92697-3425

[email protected]

http://www.ics.uci.edu/~dhilbert/


Overview

• Background and Motivation

• Dissertation and Evaluation

• Insights and Hypotheses

• Progress and Schedule

• Dissertation Outline

• Future Research


Background and Motivation

• Expectations influence designs, designs embody expectations

• Mismatches between expectations and how applications are actually used can lead to breakdowns

• Identification and resolution of mismatches can help improve fit between design and use

• Behavior of applications, users, and usage environments is complex and unpredictable enough that observation is required

• Research area: theories, methods, techniques to enable large-scale incorporation of application usage data in development


Impact of the Internet

• On the positive side

– cheap, rapid, large-scale distribution of software for evaluation

– simple transport mechanism for usage information and feedback

– use and development becoming increasingly concurrent

– should make incorporating usage information easier

• On the negative side

– reduces opportunities for traditional user testing

– increases variety and distribution of users and usage situations

– lack of scalable techniques and methods for incorporating usage information on a large scale


Current Approaches

• Current approaches suffer from significant limitations

– usability testing => scale (size, scope, location, duration)

– beta testing => data quality (incentives, knowledge, detail)

• The user feedback paradox

– users not having problems => provide feedback, negative reactions

– users having problems => withhold feedback, positive reactions

• The impact assessment problem

– impact on user population of suspected or reported problems and potential changes


Research Goals

• Address issues of scale

– enable larger scale evaluations (size, scope, location, duration) than currently possible with existing usability testing techniques

• Address issues of data quality

– enable higher quality data to be collected than currently possible with beta testers alone or existing automated techniques

• Provide a complementary source of information

– help address the feedback paradox and impact assessment problem in making design and effort allocation decisions


Research Direction

• Explore the use of automated software monitoring techniques

– capture information about user interactions on a large scale

– compare actual use against developers’ expectations

– help automate mismatch identification and resolution process

– make incorporating information about users more palatable to developers


Dissertation

• Technical issues

– Abstraction Problem (data quality)

– Selection Problem (data quality/scale)

– Context Problem (data quality)

– Reduction Problem (scale)

– Evolution Problem (scale)

• Hypothesis

– all these problems can be addressed by embedding the right kinds of data collection mechanisms within an appropriate data collection architecture


Dissertation (cont’d)

• Theoretical/methodological issues

– aside from “technical issues”, it isn’t clear what data to collect and why, and how to incorporate results in development

– since data collection and analysis can be expensive, guidance can increase the chances that the cost/benefit ratio will be favorable

• Hypothesis

– a theory and method based on usage expectations can be elaborated to provide motivation and guidance for incorporating data collection and analysis in development


Contributions

• Identification of key issues limiting scalability and data quality inherent in current techniques

• Solutions to the abstraction, selection, context, reduction, and evolution problems within a single data collection architecture

• A reference architecture to provide design guidance regarding key components and relationships

• Theory to motivate the significance of usage expectations in development and importance of collecting usage information

• Methodological guidance regarding collection, analysis, interpretation, and incorporation of results in development


Evaluation

• Prototype

– demonstrate solutions to the abstraction, selection, context, reduction, and evolution problems within a single data collection architecture

• Informal empirical evaluation

– assess usability and utility of approach based on feedback from independent developers who integrated the prototype in a research demonstration scenario

• Participant observation of an industrial project

– foundation for an analytical evaluation of the techniques, reference architecture, theory, and method


The Abstraction Problem

• Observation

– questions about usage typically occur in terms of concepts at higher levels of abstraction than represented in data provided by application components

– questions of usage can occur at multiple levels of abstraction

• Hypothesis

– simple “data abstraction” mechanisms (based on grammatical techniques) can be constructed to allow low-level data to be related to higher-level concepts such as UI and application features as well as users’ tasks and goals

– this can impact the results of human and automated analyses
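As an illustration of the abstraction idea, the sketch below rewrites sequences of low-level UI events as higher-level feature events using simple grammar-like rules. It is a minimal sketch, not the prototype's mechanism: the event names, the RULES table, and the abstract_events function are all hypothetical.

```python
# Hypothetical rules: each maps a sequence of low-level events
# to one higher-level event (a feature-level concept).
RULES = {
    ("MENU_OPEN:File", "MENU_SELECT:Print", "BUTTON_CLICK:OK"): "FEATURE:PrintDocument",
    ("KEY:Ctrl+P", "BUTTON_CLICK:OK"): "FEATURE:PrintDocument",
    ("MENU_OPEN:File", "MENU_SELECT:Save"): "FEATURE:SaveDocument",
}


def abstract_events(events, rules=RULES):
    """Rewrite matching low-level subsequences as higher-level events.

    Longer patterns are tried first; events that match no rule pass through.
    """
    patterns = sorted(rules, key=len, reverse=True)
    out, i = [], 0
    while i < len(events):
        for pat in patterns:
            if tuple(events[i:i + len(pat)]) == pat:
                out.append(rules[pat])
                i += len(pat)
                break
        else:
            out.append(events[i])  # no rule applies at this position
            i += 1
    return out
```

Two different low-level paths (menu or keyboard shortcut) map to the same feature-level event, which is the kind of relation a usage question about "printing" would need.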


The Selection Problem

• Observation

– the amount of data necessary to answer usage questions will typically be a relatively small subset of the much larger set of data that might be recorded at any given time

– collecting too much data can make it difficult to separate events and patterns of interest from the “noise”

• Hypothesis

– simple “data selection” mechanisms (based on events, event sequences, values, and value vectors) can be constructed to allow important data to be captured - and unimportant data filtered - prior to reporting

– this can impact the results of human and automated analyses, not to mention scalability
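The selection idea can be sketched as a filter applied before reporting. This is a minimal sketch covering only selection by event name and by value; the make_selector and select helpers and the event dictionaries are assumptions made for illustration, not part of the prototype described here.

```python
def make_selector(event_names=(), value_tests=None):
    """Build a predicate that keeps an event if its name is of interest,
    or if a value test registered for its name passes."""
    value_tests = value_tests or {}

    def selected(event):
        if event["name"] in event_names:
            return True
        test = value_tests.get(event["name"])
        return bool(test and test(event.get("value")))

    return selected


def select(events, selector):
    """Keep only the events the selector accepts; everything else is 'noise'."""
    return [e for e in events if selector(e)]
```

For example, a selector might keep every dialog cancellation plus only those field edits that leave the field empty, and drop mouse movement entirely.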


The Context Problem

• Observation

– information required to interpret the significance of events may not be available in the events produced by application components

– contextual information may be spread across multiple events or missing altogether, but is frequently available “for the asking” from the application, artifacts, or user

• Hypothesis

– simple “context-capture” mechanisms (that provide access to application, artifact, and user state information) can be exploited to allow context to be used in interpreting the significance of events

– this can also help in capturing important information not available in events
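A context-capture mechanism can be sketched as a state query issued when an event of interest occurs. The FakeApp class, its property names, and capture_with_context are hypothetical stand-ins; the sketch only assumes that application state is available "for the asking", as noted above.

```python
class FakeApp:
    """Stand-in for an application whose state can be queried on demand."""

    def __init__(self):
        self.state = {"active_dialog": "Print", "document_dirty": True, "printer": "none"}

    def query(self, prop):
        # Returns None for properties the application does not expose.
        return self.state.get(prop)


def capture_with_context(event, app, props):
    """Return a copy of the event annotated with the requested state properties."""
    record = dict(event)
    record["context"] = {p: app.query(p) for p in props}
    return record
```

An OK click in a print dialog means something different when no printer is configured; attaching that state to the event lets an analyst tell the two cases apart.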


The Reduction Problem

• Observation

– much of the analysis that will ultimately be performed to answer usage questions can actually be performed during data collection resulting in greatly reduced data reporting and post-hoc analysis needs

– when analysis is left as the last step, it is often not performed

• Hypothesis

– simple “data reduction” mechanisms (e.g., for performing counts and other simple analyses during collection) can be constructed to reduce the amount of data that must ultimately be reported and analyzed

– this can impact scalability and likelihood that data will be analyzed
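Counting during collection, the example given above, can be sketched as follows. CountingReducer is a hypothetical name; the point is only that a small summary is reported instead of the raw event stream.

```python
from collections import Counter


class CountingReducer:
    """Accumulates per-event counts during collection instead of storing raw events."""

    def __init__(self):
        self.counts = Counter()

    def observe(self, event_name):
        self.counts[event_name] += 1

    def report(self):
        # The report size depends on the number of distinct events,
        # not on how many events were observed.
        return dict(self.counts)
```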


The Evolution Problem

• Observation

– data collection needs will typically evolve over time (perhaps due to results of earlier data collection) more rapidly than the application

– unnecessary coupling of data collection and application code can increase cost and even cripple evolution of data collection

• Hypothesis

– “evolvable” data collection mechanisms (based on encapsulating abstraction, selection, context-capture, and reduction decisions) can be constructed to allow data collection to evolve over time without impacting application deployment or use

– this can impact the practicality of performing data collection
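One way to picture the decoupling is to keep collection decisions in a separately loaded specification, so they can change without touching application code. The sketch below assumes a JSON specification with a single "watch" list; the Monitor class and the spec format are hypothetical and much simpler than a real agent specification.

```python
import json


class Monitor:
    """Collects only the events named in a specification loaded at run time."""

    def __init__(self, spec_json):
        self.load(spec_json)

    def load(self, spec_json):
        # Replacing the spec changes what is collected; the application,
        # which only calls on_event, is not modified or redeployed.
        spec = json.loads(spec_json)
        self.watch = set(spec["watch"])
        self.reports = []

    def on_event(self, name):
        if name in self.watch:
            self.reports.append(name)
```

The application's only coupling to data collection is the on_event call; what is watched lives entirely in the loaded specification.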


Approach

• Expectation-Driven Event Monitoring (EDEM)


EDEM Architecture

[Figure: EDEM architecture diagram. Labels: a Development Computer and a User Computer, each with a Java Virtual Machine containing EDEM Active Agents and Application UI Components, linked by Top Level Window & UI Events, Property Queries, and Property Values; a second Development Computer box with HTTP Server, Agent Specs, EDEM Server, and Collected Data; flows labeled “Agent Specs saved w/ URL”, “Agent Specs loaded via URL”, and “Agent Reports sent via E-mail”.]


Reference Architecture

[Figure: reference architecture diagram. Labels: System Model of UI & App (Components, Events, Properties, Methods), drawn as a set of objects; Data Capture; Abstraction, Selection, Context, Reduction; Data Packaging; Data Transport; Data Prep; Data Analysis; Mapping; Analyst Model of UI & App (Features, Dialogs, Controls, User-Supplied Values, User Tasks).]


Reference Architecture (Word IV)

[Figure: the reference architecture diagram (System Model of UI & App; Data Capture; Abstraction, Selection, Context, Reduction; Data Packaging; Data Transport; Data Prep; Data Analysis; Mapping; Analyst Model of UI & App), annotated “Instrumentation intertwined w/ app”.]


Reference Architecture (Office IV)

[Figure: the reference architecture diagram (System Model of UI & App; Data Capture; Abstraction, Selection, Context, Reduction; Data Packaging; Data Transport; Data Prep; Data Analysis; Mapping; Analyst Model of UI & App), annotated “Event monitoring infrastructure” and “TestWizard Database of Office UI”.]


Reference Architecture (EDEM)

[Figure: the reference architecture diagram (System Model of UI & App; Data Capture; Abstraction, Selection, Context, Reduction; Data Packaging; Data Transport; Data Prep; Data Analysis; Mapping; Analyst Model of UI & App), annotated “Event monitoring infrastructure”, “Expectation Agents”, and “‘Pluggable’ Data Abstraction, Selection, Context-Capture, and Reduction”.]


Dissertation Progress

Product                  Status      Comments
Survey                   Done        N/A
Theory and Method        Needs Work  Theory and method require further elaboration
Reference Architecture   Near Done   Design guidance requires further elaboration
Informal evaluation      Done        N/A
Prototype                Needs Work  Prototype requires porting and other extensions
Participant observation  Near Done   Further analysis of observations required


Dissemination Progress

Description   Venue        Status
Conf. Demo    ICS97        Accepted
Conf. Demo    IUI98        Accepted
Conf. Paper   ICSE98       Accepted
Work. Paper   CSCW98       Accepted
Conf. Paper   Agents98     Accepted
Journ. Paper  IEEE TSE     In Review
Journ. Paper  ACM Surveys  In Review

Topic columns and the number of X marks in each (row positions of the marks are not recoverable): Prototype 6, Theory/Method 5, Reference Arch. 2, Techniques 2, Survey 1


Schedule for Work Remaining

Product                  Schedule     Comments
Prototype extension      Dec-Jan ’99  port; update event model; explicit support for 5 techniques
Theoretical elaboration  Jan-Feb ’99  elaborate theory/method based on “participant observation”
Document results         Feb ’99      should already be well into writing
Buffer period            May-Jul ’99  wrap up any loose ends
Final defense            May ’99      schedule ahead of time w/ Grudin


Dissertation Outline

• Introduction (General Introduction)

– Expectations in Software Development (highlight theory)

– Impact of the Internet (problems and opportunities)

– Problems with Current Practice (usability and beta testing)

– Proposed Solution (foreshadow insights, approach, contributions)

• Extracting Usage Data from User Interaction Events (State of the Art)

– Synch and Search

– Abstraction, Filtering, and Recoding

– Counts and Summary Statistics

– Sequence Detection

– Sequence Comparison

– Sequence Characterization

– Visualization

– Integrated Support


Dissertation Outline (cont’d)

• Key Problems and Insights (Problem Statement)

– The Abstraction Problem (meaningfulness)

– The Selection Problem (meaningfulness)

– The Context Problem (meaningfulness)

– The Reduction Problem (scalability/practicality)

– The Evolution Problem (scalability/practicality)

– Interdependencies and Interactions

– Need for Theoretical and Methodological Guidance


Dissertation Outline (cont’d)

• Expectation-Driven Event Monitoring (Solution Statement)

– Theory and Method (based on research and Microsoft experience)

• Expectations in development

• Identifying expectations

• Integrating data collection in the development process

• Analyzing data and interpreting results

• A sample usage data collection process

– Techniques for Addressing Current Limitations (description of prototype)

• Data Abstraction

• Data Selection

• Context Capture

• Data Reduction

• Evolution

– Reference Architecture (based on prototype and Microsoft experience)

• Architectural components and relationships

• Supporting large-scale data collection


Dissertation Outline (cont’d)

• Experience and Evaluation (Evaluation of Solution)

– The GTN scenario

• Study Goals

• Description

• Results

– Participant observation of an industrial project

• Study Goals

• Description

• Results

– Collection, analysis, and reporting goals

– Challenges and limitations (addressed by this research)

– Lessons learned (informing this research)


Dissertation Outline (cont’d)

• Conclusions

– Conclusions

– Summary of Contributions

– Future Research

• References

• Appendices


Future Research

• Large-scale evaluation of research in practice

– nature of usage information

– issues in interpretation and incorporation of results

– evolution and maintenance issues

• Other possible extensions

– exploit relationships between expectations and other requirements-related artifacts, e.g. use cases, cognitive walkthroughs, task analysis

– explore issues of adaptability and reuse of infrastructure and default analyses

– analysis of changes in usage over time

– analysis of usage involving multiple cooperating users


Other Possible Applications

• Support for adaptive UI/application behavior based on long-term information about user (or users’) actions

• Support for "smarter" delivery of help/suggestions/assistance based on long-term information about user (or users’) actions

• Support for monitoring of other component-based software systems

– low-level data must be related to higher level concepts of interest

– available information exceeds that which can practically be collected

– data collection needs evolve over time more quickly than application


Research Process

[Figure: research process diagram. Labels: Microsoft Experience, GTN Scenario, Survey, Theory/Method, Prototype, Reference Architecture; connections repeatedly labeled Motivation, Insight, and Evaluation.]