CGCExplorer: A Semi-Automated Search Procedure for Provably Correct Concurrent Collectors
Martin Vechev (University of Cambridge)
Eran Yahav, David Bacon (IBM T.J. Watson Research Center)
Noam Rinetzky (Tel Aviv University)
Synthesizing Concurrent Algorithms
Designing practical and efficient concurrent algorithms is hard
• Trading off simplicity for performance
• Fine-grained coordination
Result: sub-optimal, buggy algorithms
Need a more structured approach to synthesize correct and optimal implementations out of coarse-grained specifications
“Some tasks are best done by machine, while others are best done by human insight; and a properly designed system will find the right balance.” -- D. Knuth
Synthesizing Concurrent Collectors
Concurrent garbage collectors
• Widely used
• Must be correct, but also fast and scalable
• Many algorithms, not many formal proofs
A challenge problem for verification and synthesis
• Concurrency
• Heap with no a priori bound
Focus on a specific family of collection algorithms
• A generalization of Dijkstra’s algorithm
• Concurrent, tracing, non-moving
• Single mutator, single collector (non-parallel)
Contributions
Unifying framework – collection algorithms as common skeleton with parametric functions
[Figure: common skeleton with the parametric functions Mutator Step, Trace Step, and Expose, run by the mutator and the collector]
• Specified various sets of blocks in 10 cycles
• Explored 1,600,000 collection algorithms
• Found 6 correct algorithms
• Hundreds of variations
Overview
Generation
• High-level design: find algorithm outline, find building blocks
• Low-level search: explore algorithm space
Verification
• High-level design: find a sufficient local invariant, find a sufficient abstraction
• Low-level search: verify local invariant
Algorithm Space - Counting Algorithms
Track collector’s progress (wavefront)
Count pointer installations from behind the wavefront
• Increment on install, decrement on delete
• Up to a predetermined counting threshold
Expose objects with count > 0 when finished tracing
[Figure: heap with root, scanned field, object header holding count 1, and the collector wavefront]
Mutator step
• Update source field to target object
• Check wavefront
• If source field is behind the wavefront: update new target object count, update old target object count
Trace step
• Read field value
• Update wavefront (collector progress)
• Mark target object
Expose
• Select objects with count > 0
• Produce new roots
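The three steps above can be sketched as a small sequential simulation. This is only an illustration, not the authors' implementation: each step is assumed to run atomically, the counting threshold is ignored, and the `Obj` class, the per-field `scanned` set (standing in for the wavefront bit), and the `demo` scenario are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Obj:
    name: str
    fields: dict = field(default_factory=dict)  # field name -> Obj or None
    scanned: set = field(default_factory=set)   # fields behind the wavefront
    count: int = 0                              # installs from behind the wavefront
    marked: bool = False


def mutator_step(source, fname, new, log):
    """Install `new` into source.fname (assumed to run as one atomic step)."""
    old = source.fields.get(fname)
    if fname in source.scanned:        # source field is behind the wavefront
        if new is not None:
            new.count += 1             # count the installation
            log.append(new)            # remember it for the expose step
        if old is not None:
            old.count -= 1             # count the deletion
    source.fields[fname] = new


def trace_step(source, fname, worklist):
    """Collector scans one field: read it, advance the wavefront, mark the target."""
    dst = source.fields.get(fname)
    source.scanned.add(fname)
    if dst is not None and not dst.marked:
        dst.marked = True
        worklist.append(dst)


def expose(log, worklist):
    """Select logged objects with count > 0 and return them as new roots."""
    new_roots = []
    while log:
        o = log.pop()
        if o.count > 0 and not o.marked:
            o.marked = True
            worklist.append(o)
            new_roots.append(o)
    return new_roots


def demo():
    """The mutator moves the only pointer to x behind the wavefront."""
    r, x = Obj("r"), Obj("x")
    r.fields = {"a": x, "b": None}
    log, worklist = [], []
    trace_step(r, "b", worklist)       # collector scans r.b (empty)
    mutator_step(r, "b", x, log)       # install x behind the wavefront
    mutator_step(r, "a", None, log)    # delete the unscanned pointer to x
    trace_step(r, "a", worklist)       # collector scans r.a, finds nothing
    missed_by_tracing = not x.marked
    roots = expose(log, worklist)      # x has count > 0, so it is exposed
    return missed_by_tracing, x.marked, [o.name for o in roots]
```

In `demo`, tracing alone never sees `x`, because the pointer to it moves from an unscanned field to a scanned one; the count recorded by the mutator is what lets the expose step retain it.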
Counting Algorithms: High Level View
Mutator:
Mutator Step (source, field, new)
{ M1: old = source.field
  M2: w = source.field.WF
  M3: w → new.MC++
  M4: w → log = log U {new}
  M5: w → old.MC--
  M6: source.field = new
}
Collector:
Trace Step (source, field)
{ C1: dst = source.field
  C2: source.field.WF = true
  C3: mark dst
}
Set Expose (log)
{ E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V U {o}
  return V
}
Coarse-Grained to Fine-Grained Synchronization
What now? Can we remove atomics?
atomic Mutator Step (source, field, new)
{ M1: old = source.field
  M2: w = source.field.WF
  M3: w → new.MC++
  M4: w → log = log U {new}
  M5: w → old.MC--
  M6: source.field = new
}
atomic Trace Step (source, field)
{ C1: dst = source.field
  C2: source.field.WF = true
  C3: mark dst
}
atomic Set Expose (log)
{ E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V U {o}
  return V
}
Result is incorrect, may lose objects!
Mutator Step (source, field, new)
{ M1: old = source.field
  M2: w = source.field.WF
  M5: w → old.MC--
  M3: w → new.MC++
  M4: w → log = log U {new}
  M6: source.field = new
}
Trace Step (source, field)
{ C1: dst = source.field
  C2: source.field.WF = true
  C3: mark dst
}
Set Expose (log)
{ E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V U {o}
  return V
}
“When in doubt, use brute force.” -- Ken Thompson
System Input – Building Blocks
Mutator Building Blocks
  M1: old = source.field
  M2: w = source.field.WF
  M3: w → new.MC++
  M4: w → log = log U {new}
  M5: w → old.MC--
  M6: source.field = new
Tracing Step Building Blocks
  C1: dst = source.field
  C3: mark dst
  C2: source.field.WF = true
Expose Building Blocks
  E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V U {o}
Input Constraints
• Mutator blocks: [M3, M4]
• Tracing blocks: [C1, C3]
• Expose blocks: [E1, E2, E3, E4]
• Dataflow, e.g. M2 < M3
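To give a feel for how such constraints prune the search space, here is a minimal sketch that enumerates orderings of the six mutator blocks. It is not the CGCExplorer tool: it only varies order (not atomicity), it reads the bracket notation [M3, M4] as "these blocks stay adjacent", and the `BEFORE` list is an assumed set of dataflow constraints (a variable must be assigned before it is used) rather than the exact input the authors supplied.

```python
from itertools import permutations

BLOCKS = ["M1", "M2", "M3", "M4", "M5", "M6"]

# Assumed dataflow constraints: M5 uses `old` (set by M1);
# M3, M4, and M5 are guarded by `w` (set by M2).
BEFORE = [("M1", "M5"), ("M2", "M3"), ("M2", "M4"), ("M2", "M5")]

# Assumed reading of "[M3, M4]": M4 immediately follows M3.
ADJACENT = [("M3", "M4")]


def satisfies(order):
    """True if `order` respects every dataflow and adjacency constraint."""
    pos = {block: i for i, block in enumerate(order)}
    dataflow_ok = all(pos[a] < pos[b] for a, b in BEFORE)
    adjacent_ok = all(pos[b] == pos[a] + 1 for a, b in ADJACENT)
    return dataflow_ok and adjacent_ok


def candidates():
    """All orderings of the mutator blocks that pass the constraints."""
    return [order for order in permutations(BLOCKS) if satisfies(order)]


if __name__ == "__main__":
    for order in candidates():
        print(" ".join(order))
```

Each surviving ordering would still have to be combined with trace and expose variants and handed to a verifier; the constraints only remove candidates that cannot be meaningful.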
System Output – (Verified) Algorithms
Mutator Step (source, field, new)
{ M1: old = source.field
  M6: source.field = new
  M2: w = source.field.WF
  M3: w → new.MC++
  M4: w → log = log U {new}
  M5: w → old.MC--
}
Trace Step (source, field)
{ C1: dst = source.field
  C3: mark dst
  C2: source.field.WF = true
}
Set Expose (log)
{ E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V U {o}
}
Explored 306 variations in around 2 mins
Least atomic (verified) algorithm with given blocks
But What Now ?
How do we get further improvement?
• Need more insights
• Need new building blocks
Example: start and end of collector reading a field
[Figure: coordination meta-data, atomicity, ordering]
Continuing the Search…
We derived a non-atomic algorithm (at the granularity of blocks)
• Non-atomic write barrier, collector step, and expose
• System explored over 1,600,000 algorithms (took ~34 hours)
All experiments took ~41 machine hours and ~3 human hours
CGC: Challenge for Automatic Verification
Unbounded heap and sequence of mutations
Checking a global invariant is hard
• State space too big even for partial checking
• 3 nodes can quickly consume several GB in the SPIN model checker
Solution
• Manually boil down to a local invariant
• Automatically prove local invariant
• Use abstraction: an unbounded number of concrete nodes is conservatively represented by a small, bounded number of abstract nodes
What Do We Prove?
Want to prove collector safety
• Retaining all live objects
Local invariant: for every object
• If an object is referenced from a scanned field at time of expose, it is either marked, or its count > 0
Show for any arbitrary object, under any arbitrary sequence of mutations
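The local invariant can be written as a simple predicate over a heap snapshot taken at the time of expose. The sketch below checks one concrete snapshot only; it says nothing about how the authors prove the invariant for all heaps and mutation sequences. The dictionary-based heap encoding is a hypothetical choice made for this example.

```python
def local_invariant_holds(heap):
    """Check the local invariant on one concrete heap snapshot.

    `heap` is a list of objects; each object is a dict with keys
      'fields'  : field name -> index of the target object, or None
      'scanned' : set of field names already scanned by the collector
      'marked'  : bool
      'count'   : int
    """
    for source in heap:
        for fname, target in source["fields"].items():
            if target is None or fname not in source["scanned"]:
                continue  # only pointers stored in scanned fields matter
            obj = heap[target]
            if not (obj["marked"] or obj["count"] > 0):
                return False  # referenced from a scanned field, yet unprotected
    return True


if __name__ == "__main__":
    protected = [
        {"fields": {"f": 1}, "scanned": {"f"}, "marked": True, "count": 0},
        {"fields": {}, "scanned": set(), "marked": False, "count": 1},
    ]
    lost = [
        {"fields": {"f": 1}, "scanned": {"f"}, "marked": True, "count": 0},
        {"fields": {}, "scanned": set(), "marked": False, "count": 0},
    ]
    print(local_invariant_holds(protected), local_invariant_holds(lost))
```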
Abstraction Intuition
Select a tracked representative object
Track the reference count only for the selected object
Only up to a fixed number of pointers matter (up to the counting threshold)
• Track these precisely
• Forget the rest
[Figure: heap with root, scanned field, object header, wavefront, and hidden objects; the tracked object has count 2]
Recap
Generation
• High-level design: find algorithm outline, find building blocks
• Low-level search: explore algorithm space
Verification
• High-level design: find a sufficient local invariant, find a sufficient abstraction (find proof outline, find proof building blocks)
• Low-level search: verify local invariant
What’s next?
Concurrent collector synthesis
• Get real algorithms
• Mapping to real machine instructions
• Yet another level of search
Synthesis of other concurrent algorithms
• In the pipeline: concurrent set algorithms
Local abstractions for concurrent programs
Invited Questions
1) Are your algorithms practical?
2) What are the limitations of this approach? Would it work for my problem?
3) How do you prove that your algorithms terminate?
4) Can you show another algorithm?
5) How do you reduce the number of calls to the model-checker?
6) You didn’t mention any related work
7) Can you give more details on experimental results?
Where Do Building Blocks Come From?
Read/write of heap location, and collector coordination meta-data
• e.g., collector progress, state flags
[Figure: Progress Coordination Metadata. Object layouts labeled header, count, marked, fld_1, fld_2, start_1 to start_3, and end_1 to end_3; the layouts are annotated 6 bits, 5 bits, …, 1 bit, 0 bits]
Refined Input – Finer Building Blocks
Mutator Building Blocks
  M1: old = source.field
  M2s: ws = source.field.WFs
  M2e: we = source.field.WFe
  M3s: ws → new.MC++
  M4s: ws → log = log U {new}
  M5e: we → old.MC--
  M6: source.field = new
Collector Building Blocks
  C1: dst = source.field
  C3: mark dst
  C2s: source.field.WFs = true
  C2e: source.field.WFe = true
Expose Building Blocks
  E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V U {o}
Input Constraints
• Mutator: [M3s, M4s]
• Tracing: [C1, C3], C2s < [C1, C3] < C2e
• Expose: [E1, E2, E3, E4]
• Dataflow: e.g. M2s < M3s
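A candidate built from the finer blocks can be checked against these constraints mechanically. The sketch below is one possible encoding, not the tool's input format: a candidate is a list of steps, a tuple denotes blocks executed together atomically, and the bracket notation [C1, C3] is read here as "C1 and C3 belong to the same atomic step". The names `TRACE_BEFORE`, `TRACE_ATOMIC`, and `respects` are hypothetical.

```python
# C2s < [C1, C3] < C2e, expanded into pairwise ordering constraints.
TRACE_BEFORE = [("C2s", "C1"), ("C2s", "C3"), ("C1", "C2e"), ("C3", "C2e")]
# [C1, C3]: assumed to mean both blocks run in one atomic step.
TRACE_ATOMIC = [("C1", "C3")]


def step_positions(order):
    """Map each block name to the index of the step that contains it."""
    pos = {}
    for i, step in enumerate(order):
        names = step if isinstance(step, tuple) else (step,)
        for name in names:
            pos[name] = i
    return pos


def respects(order, before, atomic):
    """True if `order` satisfies all ordering and atomicity constraints."""
    pos = step_positions(order)
    if any(pos[a] >= pos[b] for a, b in before):
        return False
    return all(len({pos[name] for name in group}) == 1 for group in atomic)


if __name__ == "__main__":
    good = ["C2s", ("C1", "C3"), "C2e"]        # start flag, read+mark, end flag
    split = ["C2s", "C1", "C3", "C2e"]         # C1 and C3 no longer atomic
    early_read = ["C1", "C2s", "C3", "C2e"]    # field read before the start flag
    for candidate in (good, split, early_read):
        print(candidate, respects(candidate, TRACE_BEFORE, TRACE_ATOMIC))
```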
System Output
Mutator Step (source, field, new)
{ M1: old = source.field
  M2e: we = source.field.WFe
  M6: source.field = new
  M2s: ws = source.field.WFs
  M3s: ws → new.MC++
  M4s: ws → log = log U {new}
  M5e: we → old.MC--
}
Trace Step (source, field)
{ C2s: source.field.WFs = true
  C1: dst = source.field
  C3: mark dst
  C2e: source.field.WFe = true
}
Set Expose (log)
{ E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V U {o}
}
• Constraints = Insights, e.g.:
  M2e < M6 < M2s
  C2s < [C1, C3] < C2e
(Some) Related Work
Superoptimizer: a look at the smallest program, Massalin, ASPLOS’87
• Finite state, limited length of instruction sequences
Programming by Sketching, Solar-Lezama et al., PLDI’05
• Finite state
Sketching with Stencils, Solar-Lezama et al., PLDI’07
Automatic discovery of mutual exclusion algorithms, Bar-David and Taubenfeld, PODC’03
• Finite state
Correctness-Preserving Derivation of Concurrent Garbage Collection Algorithms, Vechev, Yahav, and Bacon, PLDI’06
CheckFence, Burckhardt, Alur, and Martin, PLDI’07
…
Algorithm Exploration
[Figure: search space over Mutator Step, Trace Step, and Expose; each can be varied to be less atomic, more atomic, or differently ordered]
Limitations
Need algorithm designer insights
• Designer needs to understand results of each phase
Abstraction is tailor-made
• Designing an abstraction for the next collector?
Pushing the limits of current model-checkers
• Multiple mutators? Unbounded number of mutators?
• Better partial-order reduction may help
Are Your Algorithms Practical?
Are your algorithms practical?
Honest answer: not yet
So far focused on correctness more than on performance
However, counting algorithms are of practical interest
“The moral is that for the design of multiprocessor installations we cannot rely on the traditional approach of the optimistic engineer, who, when the design looks reasonable, puts it together to see if it works.” -- Edsger W. Dijkstra
Experimental Results
Run    Total     Checked  Correct  Time (min)
1      306       45       1        2
2      2744      162      2        34
3      12        7        2        1
4      592       146      14       56
5      32        26       1        1
6      3024      550      80       212
7      Timed out
8      6144      127      10       39
9      1624320   1833     6        2072
10     364032    288      0        39
TOTAL  2001206   3184     116      2456
+ About 180 minutes of human work with the system
(3.8 GHz Xeon processor and 8 GB memory, running version 4 of RedHat Linux.)
Why Does it Work?
Ingredients
• Relentless optimism
• Limited setting
Limited setting
• Single collector, single mutator
• Counting threshold is known
• Algorithm skeleton is fixed
• Algorithm uses a barrier before moving to the sweep phase
• … (see paper)
Concurrent: single mutator, single collector (not parallel)
Tracing: computes transitive reachability from roots
Non-moving: collector does not relocate objects
“it seems unavoidable that multiprocessor installations will be built… it seems equally unavoidable that many of them will be put together by aforementioned optimistic engineer. I shudder at the thought of all the new bugs: they will only delight the Devil. Am I too pessimistic? Nobody knows the trouble I have seen...” -- Edsger W. Dijkstra