Skoll: A System for Distributed Continuous Quality Assurance
Atif Memon & Adam Porter
University of Maryland
{atif,aporter}@cs.umd.edu
http://www.cs.umd.edu/{projects/skoll,~atif/GUITARWeb/}
Quality Assurance for Large-Scale Systems
• Modern systems increasingly complex
– Run on numerous platform, compiler & library combinations
– Have 10s, 100s, even 1000s of configuration options
– Are evolved incrementally by geographically-distributed teams
– Run atop other frequently changing systems
– Have multi-faceted quality objectives
• How do you QA systems like this?
Distributed Continuous Quality Assurance
• QA processes conducted around-the-world, around-the-clock on powerful, virtual computing grids
– Grids can be made up of end-user machines, project-wide resources or dedicated computing clusters
• General Approach
– Divide QA processes into numerous tasks
– Intelligently distribute tasks to clients who then execute them
– Merge and analyze incremental results to efficiently complete desired QA process
• Expected benefits
– Massive parallelization allows more, better & faster QA
– Improved access to resources/environs. not readily found in-house
– Carefully coordinated QA efforts enable more sophisticated analyses
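The general approach can be sketched as a divide/distribute/merge loop. This is a minimal illustration, not Skoll's actual API: the task shapes and the pass/fail rule are purely made up.

```python
# Hypothetical sketch of the DCQA divide / distribute / merge loop.
# Task contents and the pass/fail rule are illustrative only.

def divide(configs, n_tasks):
    """Split the QA process (a pool of configurations) into tasks."""
    return [configs[i::n_tasks] for i in range(n_tasks)]

def execute_on_client(task):
    """Stand-in for remote execution on a grid client: run each
    configuration's tests and report a verdict."""
    return {cfg: ("PASS" if cfg % 2 == 0 else "FAIL") for cfg in task}

def dcqa_run(configs, n_clients=4):
    results = {}
    for task in divide(configs, n_clients):
        # in Skoll, clients run concurrently; here we merge sequentially
        results.update(execute_on_client(task))
    return results

results = dcqa_run(list(range(10)))
print(sorted(cfg for cfg, r in results.items() if r == "FAIL"))
```

The point is only the shape: tasks are distributed, and incremental results are merged into one picture of the QA process.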
Collaborators
Sandro Fouché, Alan Sussman, Cemal Yilmaz (now at IBM TJ Watson) & Il-Chul Yoon
Alex Orso & Myra Cohen
Murali Haran, Alan Karr, Mike Last, & Ashish Sanil
Doug Schmidt & Andy Gokhale
Skoll DCQA Infrastructure & Approach
See: A. Porter, C. Yilmaz, A. Memon, A. Nagarajan, D. C. Schmidt, and B. Natarajan, Skoll: A Process and Infrastructure for Distributed Continuous Quality Assurance. IEEE Transactions on Software Engineering, August 2007, 33(8), pp. 510-525.
Clients execute the distributed QA tasks; the server-side process proceeds in five steps:
1. Model
2. Reduce Model
3. Distribution (clients send test requests, receive test resources)
4. Feedback (clients return test results)
5. Steering
The ACE+TAO+CIAO (ATC) System
• ATC characteristics
– 2M+ line open-source CORBA implementation
– maintained by 40+, geographically-distributed developers
– 20,000+ users worldwide
– Product Line Architecture with 500+ configuration options
• runs on dozens of OS and compiler combinations
– Continuously evolving – 200+ CVS commits per week
– Quality concerns include correctness, QoS, footprint, compilation time & more
Define QA Space

Options                       Type               Settings
Operating System              compile-time       {Linux, Windows XP, …}
TAO_HAS_MINIMUM_CORBA         compile-time       {True, False}
ORBCollocation                runtime            {global, per-orb, no}
ORBConnectionPurgingStrategy  runtime            {lru, lfu, fifo, null}
ACE_version                   component version  {v5.4.3, v5.4.4, …}
TAO_version                   component version  {v1.4.3, v1.4.4, …}
run(ORT/run_test.pl)          test case          {True, False}

Constraints
(TAO_HAS_AMI) → (¬TAO_HAS_MINIMUM_CORBA)
run(ORT/run_test.pl) → (¬TAO_HAS_MINIMUM_CORBA)
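The constraints can be checked mechanically against candidate configurations. A minimal sketch, treating each option as boolean and using "not A or B" for implication; `run_ORT_test` is a hypothetical stand-in for run(ORT/run_test.pl):

```python
from itertools import product

# Sketch: encode a fragment of the QA space and its constraints.
# Option names follow the slide; settings reduced to booleans.
OPTIONS = {
    "TAO_HAS_MINIMUM_CORBA": [True, False],
    "TAO_HAS_AMI": [True, False],
    "run_ORT_test": [True, False],   # stand-in for run(ORT/run_test.pl)
}

def implies(a, b):
    return (not a) or b

def valid(cfg):
    # (TAO_HAS_AMI) -> (not TAO_HAS_MINIMUM_CORBA)
    # run(ORT/run_test.pl) -> (not TAO_HAS_MINIMUM_CORBA)
    return (implies(cfg["TAO_HAS_AMI"], not cfg["TAO_HAS_MINIMUM_CORBA"])
            and implies(cfg["run_ORT_test"], not cfg["TAO_HAS_MINIMUM_CORBA"]))

cfgs = [dict(zip(OPTIONS, vals)) for vals in product(*OPTIONS.values())]
valid_cfgs = [c for c in cfgs if valid(c)]
print(len(cfgs), len(valid_cfgs))  # constraints prune the raw space
```

Even this toy space shrinks from 8 raw configurations to 5 valid ones; the real 500+-option space shrinks far more dramatically.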
Nearest Neighbor Search
[Figure: classification tree over options CORBA_MESSAGING, AMI, AMI_CALLBACK & AMI_POLLER (settings 0/1), with leaf verdicts OK, ERR-1, ERR-2, ERR-3]
Fault Characterization
• We used machine learning techniques (classification trees) to model option & setting patterns that predict test failures
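As a much-simplified stand-in for the classification-tree analysis (the actual work builds trees; this toy merely intersects failing configurations), with made-up data echoing the figure's option names:

```python
# Toy fault characterization: find (option, setting) pairs present in
# every failing configuration and in no passing one. Data is illustrative,
# not from the ATC study.

runs = [
    ({"AMI": 1, "CORBA_MESSAGING": 1}, "FAIL"),
    ({"AMI": 1, "CORBA_MESSAGING": 0}, "FAIL"),
    ({"AMI": 0, "CORBA_MESSAGING": 1}, "PASS"),
    ({"AMI": 0, "CORBA_MESSAGING": 0}, "PASS"),
]

def characterize(runs):
    fails = [cfg for cfg, r in runs if r == "FAIL"]
    passes = [cfg for cfg, r in runs if r == "PASS"]
    candidates = set(fails[0].items())
    for cfg in fails[1:]:
        candidates &= set(cfg.items())           # present in all failures
    return {c for c in candidates
            if all(c not in cfg.items() for cfg in passes)}  # absent from passes

print(characterize(runs))
```

Here the failing subspace is characterized by AMI = 1, which is exactly the kind of compact, developer-readable pattern the tree models produce.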
Applications & Feasibility Studies
• Compatibility testing of component-based systems
• Configuration-level fault characterization
• Test case generation & input space exploration
See: I. Yoon, A. Sussman, A. Memon & A. Porter, Direct-Dependency-based Software Compatibility Testing. International Conference on Automated Software Engineering, Nov. 2007 (to appear).
Compatibility Testing of Comp-Based Systems
Goal
• Given a component-based system, identify components & their specific versions that fail to build
Solution Approach
• Sample the configuration space, efficiently test this sample & identify subspaces in which compilation & installation fails
– Initial focus on building & installing components. Later work will add functional and performance testing
The InterComm (IC) Framework
• Middleware for coupling large scientific simulations
– Built from up to 14 other components (e.g., PVM, MPI, GCC, OS)
– Each comp can have several actively maintained versions
– There are complex constraints between components, e.g.,
• Requires GCC version 2.96 or later
• When configured with multiple GNU compilers, all must have the same version number
• When configured with multiple comps that use MPI, all must use the same implementation & version
– http://www.cs.umd.edu/projects/hpsl/chaos/ResearchAreas/ic
• Developers need help to
– Identify working/broken configurations
– Broaden working set (to increase potential user base)
– Rationally manage support activities
Annotated Comp. Dependency Graph
• ACDG = (CDG, Ann)
– CDG: DAG capturing inter-comp deps
– Ann: comp. versions & constraints
• Constraints for each cfg, e.g.,
– ver(gf) = x → ver(gcr) = x
– ver(gf) = 4.1.1 → ver(gmp) ≥ 4.0
• Can generate cfgs from ACDG
– 3552 total cfgs. Takes up to ~10,700 CPU hrs to build all
Improving Test Execution
• Cfgs often share common build subsequences. This build effort should be reusable across cfgs
• Combine all cfgs into a data structure called a prefix tree
• Execute implied test plan across grid by (1) assigning subpaths to clients, (2) building each subcfg in a VM & caching the VMs to enable reuse
• Example: with 8 machines, each able to cache up to 8 VMs, exhaustive testing takes up to 355 hours
[Figure: prefix tree of component builds, labeled with fc 4.0, gcr v4.0.3, gmp v4.2.1 & pf/6.2]
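The prefix-tree idea can be sketched in a few lines. The component names echo the figure labels, and the build counts are illustrative; caching of VM snapshots is abstracted into "one tree node = one build performed":

```python
# Sketch: configurations that share a build prefix share a tree node,
# so the shared prefix is built (and its VM cached) only once.

def build_prefix_tree(cfgs):
    root = {}
    for cfg in cfgs:              # each cfg is an ordered build sequence
        node = root
        for comp in cfg:
            node = node.setdefault(comp, {})
    return root

def count_builds(tree):
    """Nodes in the tree = distinct (sub)builds actually performed."""
    return sum(1 + count_builds(child) for child in tree.values())

cfgs = [
    ("fc-4.0", "gcr-4.0.3", "gmp-4.2.1"),
    ("fc-4.0", "gcr-4.0.3", "gmp-4.1.4"),
    ("fc-4.0", "gcr-4.1.1", "gmp-4.2.1"),
]
naive = sum(len(c) for c in cfgs)     # builds without any sharing
shared = count_builds(build_prefix_tree(cfgs))
print(naive, shared)                  # sharing prefixes saves builds
```

Three configurations naively cost 9 component builds; sharing the common prefixes reduces that to 6, and the savings grow with the number of configurations.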
Direct-Dependency (DD) Coverage
• Hypothesis: A comp’s build process is most likely to be affected by the comps on which it directly depends
– A directly depends on B iff there is a path (in the CDG) from A to B containing no intermediate component nodes
• Sampling approach
– Identify all DDs between every pair of components
– Identify all valid instantiations of these DDs (ver. combs that violate no constraints)
– Select a (small) set of cfgs that cover all valid instantiations of the DDs
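A minimal sketch of the sampling approach, with a hypothetical two-dependency graph and constraint (not InterComm's real ACDG):

```python
from itertools import product

# Sketch of DD-coverage sampling. Components, versions & the constraint
# are illustrative, not InterComm's actual dependency graph.

deps = [("IC", "gcc"), ("IC", "mpi")]           # direct dependencies (A, B)
versions = {"IC": ["1.5"], "gcc": ["3.4", "4.1"], "mpi": ["lam", "open"]}

def ok(cfg):
    # hypothetical constraint: gcc 3.4 incompatible with "open" MPI
    return not (cfg["gcc"] == "3.4" and cfg["mpi"] == "open")

all_cfgs = [dict(zip(versions, vs)) for vs in product(*versions.values())]
valid = [c for c in all_cfgs if ok(c)]

def dd_instances(c):
    """Valid DD instantiations exercised by configuration c."""
    return {(a, c[a], b, c[b]) for a, b in deps}

targets = set().union(*(dd_instances(c) for c in valid))

chosen, covered = [], set()                     # greedy set cover
while covered != targets:
    best = max(valid, key=lambda c: len(dd_instances(c) - covered))
    chosen.append(best)
    covered |= dd_instances(best)

print(len(valid), len(chosen))                  # DD suite smaller than exhaustive
```

Even in this toy space the DD-covering suite (2 configurations) is smaller than the valid exhaustive set (3); at InterComm's scale the same idea cuts 3552 configurations down to 211.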
Executing the DD Coverage Test Suite
• DD test suite much smaller than exhaustive
– 211 cfgs with 649 comps vs 3552 cfgs with 9919 comps
– For IC, no loss of test effectiveness (same build failures exposed)
• Speedups achieved using 8 machines w/ 8 VM cache
– Actual case: 2.54 (18 vs 43 hrs)
– Best case: 14.69 (52 vs 355 hrs)
Summary
• Infrastructure in place & working
– Complete client/server implementation using VMware
– Simulator for large scale tests on limited resources
• Initial results promising, but lots of work remains
• Ongoing activities
– Alternative algorithms & test execution policies
– More theoretical study of sampling & test exec approaches
– Apply to more software systems
See: C. Yilmaz, M. Cohen, A. Porter, Covering Arrays for Efficient Fault Characterization in Complex Configuration Spaces, ISSTA’04, TSE v32 (1)
Configuration-Level Fault Characterization
Goal
• Help developers localize configuration-related faults
Current Solution Approach
• Use covering arrays to sample the cfg space to test for subspaces in which (1) compilation fails or (2) reg. tests fail
• Build models that characterize the configuration options and specific settings that define the failing subspace
Covering Arrays
• Compute test schedule from t-way covering arrays
– a set of configurations in which all ordered t-tuples of option settings appear at least once
• 2-way covering array example:

      C1 C2 C3 C4 C5 C6 C7 C8 C9
O1     0  0  0  1  1  1  2  2  2
O2     0  1  2  0  1  2  0  1  2
O3     0  1  2  1  2  0  2  0  1
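The 2-way property of the example array can be verified mechanically: every pair of options must exhibit all 9 ordered setting pairs across the 9 configurations (exhaustive testing would need 27).

```python
from itertools import combinations, product

# The example array: option -> setting per configuration C1..C9.
array = {
    "O1": [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "O2": [0, 1, 2, 0, 1, 2, 0, 1, 2],
    "O3": [0, 1, 2, 1, 2, 0, 2, 0, 1],
}

def covers_pairs(array, settings=(0, 1, 2)):
    """Check the 2-way covering property: for every pair of options,
    every ordered pair of settings appears in some configuration."""
    for o1, o2 in combinations(array, 2):
        seen = set(zip(array[o1], array[o2]))
        if seen != set(product(settings, settings)):
            return False
    return True

print(covers_pairs(array))
```

Nine configurations thus cover all 3 × 9 = 27 option-setting pairs of the full 27-configuration space.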
Limitations
• Must choose the covering array's strength before computing it
– No way to know, a priori, what the right value is
– Our experience suggests failure patterns can change over time
• Choose too high:
– Run more tests than necessary
– Testing might not finish before next release
– Non-uniform sample negatively affects classification performance
• Choose too low:
– Non-uniform sample negatively affects classification techniques
– Must repeat process at higher strength
Incremental Covering Arrays
• Start with traditional covering array(s) of low strength (usually 2)
• Execute test schedule & classify observed failures
• If resources allow or classification performance requires
– Increment strength
– Build new covering array using previously run array(s) as seeds
See: S. Fouche, M. Cohen and A. Porter. Towards Incremental Adaptive Covering Arrays, ESEC/FSE 2007, (to appear)
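A toy sketch of the incremental idea, using a greedy cover over four hypothetical binary options. The published algorithm is more sophisticated; this only shows the seeding step: configurations already run at strength t are reused when building the strength-(t+1) array.

```python
from itertools import combinations, product

# Four binary options (illustrative); the universe is the full factorial.
options = {f"O{i}": [0, 1] for i in range(4)}
names = sorted(options)
universe = [dict(zip(names, vs)) for vs in product([0, 1], repeat=4)]

def tuples_covered(cfgs, t):
    return {(opts, tuple(c[o] for o in opts))
            for c in cfgs for opts in combinations(names, t)}

def all_tuples(t):
    return {(opts, vals) for opts in combinations(names, t)
            for vals in product(*(options[o] for o in opts))}

def extend(seed, t):
    """Greedily add configurations to the seed until all t-tuples are
    covered (a simple stand-in for real covering-array construction)."""
    cfgs = list(seed)
    while tuples_covered(cfgs, t) != all_tuples(t):
        best = max(universe, key=lambda c: len(tuples_covered(cfgs + [c], t)))
        cfgs.append(best)
    return cfgs

ca2 = extend([], 2)     # strength-2 array built from scratch
ca3 = extend(ca2, 3)    # strength-3 array seeded with the cfgs already run
print(len(ca2), len(ca3))
```

Because `ca3` is seeded with `ca2`, every configuration executed at strength 2 counts toward strength 3; only the difference needs new test runs.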
Incremental Covering Arrays (cont.)
Multiple CAs at each level of t
• Use t₁ as a seed for the first (t+1)-way array, (t+1)₁
• To create the i-th t-way array tᵢ, build a seed of size |(t−1)₁| using non-seeded cfgs from (t−1)ᵢ
• If |seed| < |(t−1)₁|, complete the seed with cfgs from (t+1)₁
MySQL Case Study
• Project background
– Widely-used, 2M+ line open-source database project
– Cont. evolving & maint. by geographically-distributed developers
– Dozens of cfg opts & runs on dozens of OS/compiler combos
• Case study using release 5.0.24
– Used 13 cfg opts with 2-12 settings each (> 110k unique cfgs)
– 460 tests per config across a grid of 50 machines
– Executed ~50M tests total using ~25 CPU years
Results
• Built 3 trad and incr covering arrays for 2 ≤ t ≤ 4
– Traditional sizes : 108, 324, 870
– Incremental sizes: 113, 336 (223), 932 (596)
• Incr appr exposed & classified the same failures as trad appr
• Costs depend on t & failure patterns
– Failures at level t : Inc > Trad (4-9%)
– Failures at level < t : Inc < Trad (65-87%)
– Failures at level > t : Inc < Trad (28-38%)
Summary
• New application driving infrastructure improvements
• Initial results encouraging
– Applied process to a configuration space with over 110K cfgs
– Found many test failures corresponding to real bugs
– Incremental approach more flexible than traditional approach. Appears to offer substantial savings in best case, while incurring minimal cost in worst case
• Ongoing extensions
– MySQL continuous build process
– Community involvement starting
– Want to volunteer? Go to http://www.cs.umd.edu
GUI Test Case – Executable By A “Robot”
• JFCUnit, Capture/replay – tedious
• Other interactions – exponential with length
• Test "common" sequences – bad idea
• Model-based techniques – GUITAR (guitar.cs.umd.edu)
• Test case generation
– Cover all edges
See: Atif M. Memon and Qing Xie, Studying the Fault-Detection Effectiveness of GUI Test Cases for Rapidly Evolving Software. IEEE Transactions on Software Engineering, vol. 31, no. 10, 2005, pp. 884-896.
Modeling The Event-Interaction Space
• Event flow graph (EFG)
– Nodes: all GUI events (starting events marked)
– Edges: the "follows" relationship
• Reverse engineering
– Obtained automatically
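Edge-covering test generation from an EFG can be sketched in a few lines. Event names here are illustrative, and a real GUI test case must first reach e1 from a starting event; that prefix step is elided.

```python
# Sketch: an EFG as adjacency lists (edges = the "follows" relationship);
# one length-2 test case per edge covers all edges. Events are illustrative.

efg = {
    "File":  ["Open", "Save"],
    "Open":  ["File", "Edit"],
    "Save":  ["File"],
    "Edit":  ["Copy", "Paste"],
    "Copy":  ["Paste"],
    "Paste": ["Edit"],
}

# Each edge (e1, e2) yields one test case executing e2 right after e1.
tests = [(e1, e2) for e1, follows in efg.items() for e2 in follows]
n_edges = sum(len(v) for v in efg.values())
print(len(tests), n_edges)
```

Edge coverage is the adequacy criterion: the suite size is linear in the number of "follows" edges rather than exponential in sequence length.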
Let's See How It Works!
• Point to the CVS head
– Push the button
– Read error report
• What happens
– Gets code from CVS head
– Builds
– Reverse engineers the event-flow graph
– Generates test cases to cover all the edges
• 2-way covering
– Runs them
• SourceForge.net
– Four applications
[Bar chart: number of faults detected (0–9 scale) for CrosswordSage, GanttProject, FreeMind & JMSN]
Digging Deeper!
• Intuition
– Non-interacting events (e.g., Save, Find)
– Interacting events (e.g., Copy, Paste)
• Key Idea
– Identify interacting events
– Mark the EFG edges (annotated graph)
– Generate 3-way, 4-way, … covering test cases for interacting events only
Identifying Interacting Events
• High-level overview of approach
– Observe how events execute on the GUI
– Events interact if they influence one another’s execution
• Execute event e2; execute event sequence <e1, e2>
• Did e1 influence e2’s execution?
• If YES, then they must be tested further; annotate the <e1, e2> edge in graph
• Use feedback
– Generate seed suite
• 2-way covering test cases
– Run test cases
• Need to obtain sets of GUI states
– Collect GUI run-time states as feedback
– Analyze feedback and obtain interacting event sets
– Generate new test cases
• 3-way, 4-way, … covering test cases
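The interaction test at the heart of this loop can be sketched against a toy GUI state. The "GUI" below is a hand-written state machine standing in for a real application under test:

```python
# Sketch: e1 and e2 interact if running <e1, e2> leaves the GUI in a
# different state than running e2 alone. The events and state are toys.

def run(events, state=None):
    state = dict(state or {"clipboard": "", "text": "abc"})
    for e in events:
        if e == "Copy":
            state["clipboard"] = state["text"]
        elif e == "Paste":
            state["text"] += state["clipboard"]
        elif e == "Find":
            pass                       # read-only: influences nothing
    return state

def interacts(e1, e2):
    """Did e1 influence e2's execution? Compare the resulting states."""
    return run([e2]) != run([e1, e2])

print(interacts("Copy", "Paste"), interacts("Find", "Paste"))
```

Copy changes what Paste does, so that pair is marked for higher-strength testing; Find does not, so Find/Paste sequences need no further coverage.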
Did We Do Better?
• Compare feedback-based approach to 2-way coverage
[Bar chart: number of faults detected (0–9 scale) by the feedback-based approach vs. 2-way coverage for CrosswordSage, GanttProject, FreeMind & JMSN]
Summary
• Manually developed test cases
– JFCUnit, Capture/replay
– Can be deployed and executed by a “robot”
– Too many interactions to test
• Exponential
• The GUITAR Approach
– Develop a model of all possible interactions
– Use abstraction techniques to “sample” the model
• Develop adequacy criteria
– Generate an “initial test suite”; Develop an “Execute tests – collect feedback – annotate model – generate tests” cycle
– Feasibility study & results
Future Work
• Need volunteers for MySQL Build Farm Project
– http://skoll.cs.umd.edu
• Looking for more example systems (help!)
• Continue improving Skoll system
• New problem classes
– Performance and robustness optimization
– Improved use of test data
• Test case ROI analysis
• Configuration advice
• Cost-aware testing (e.g., minimize power, network, disk)
• Use source code analysis to further reduce state spaces
• Extend test generation technology outside GUI applications
• QA for distributed systems
The End