What is the Cost of Determinism?
Cedomir Segulja, Tarek S. Abdelrahman
University of Toronto
Slide 2
Image sources: [Intel], [YouTube]
Slide 3
Non-Determinism
Same program + same input ≠ same output. This is bad for:
- Testing: too many interleavings to test
- Debugging: hard to debug when behavior is not repeatable
- Selling CAD tools: users expect each run to produce the same circuit
Slide 4
Determinism is good, but costly
1. What is the fundamental cost of determinism?
2. What is this cost across various execution environments?

Determinism in the field:
Deterministic scheduler                       Maximum slowdown*
DMP [Devietti et al. 2009]                    1.7x
Kendo [Olszewski et al. 2009]                 1.6x
Grace [Berger et al. 2009]                    3.6x
CoreDet [Bergan et al. 2010]                  10x
Calvin [Hower et al. 2011]                    1.7x
RCDC [Devietti et al. 2011]                   1.7x
Dthreads [Liu et al. 2011]                    4x
Conversion [Merrifield and Eriksson 2013]     5x
Parrot [Cui et al. 2013]                      3.8x
RFDet [Lu et al. 2014]                        2.6x
Source: [Bergan et al. 2011] and the respective papers.
* Shown only to illustrate that determinism comes at a cost; not for direct comparison (different features, benchmarks, number of threads, etc.).
Slide 5
What is Determinism?
- A property that requires observing the same output whenever a program runs with the same input
- SyncOrder determinism [Lu and Scott 2011]
  - Requires the same program result and the same order of synchronization operations
  - More flexible than internal determinism
  - Still greatly eases testing [Cui et al. 2013]
- We assume data-race-freedom
  - All data races are bugs [Boehm 2008, S. Adve 2010, Marino et al. 2010, Lucia et al. 2010, …]
  - Data races in general do not help performance [Boehm 2012]
- Determinism is needed during debugging, but the cost of determinism matters most in production
Figure: spectrum of determinism definitions, from least to most strict: External, SyncOrder, Internal
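The difference between external and SyncOrder determinism can be phrased as a check over repeated runs on the same input. The following is a minimal sketch that assumes each run is summarized by its output and its recorded sequence of synchronization events. The `Run` record, the event tuple format, and both function names are hypothetical and do not come from any of the cited systems.

```python
from dataclasses import dataclass
from typing import List, Tuple

# One synchronization event: (thread id, operation, synchronization object).
SyncEvent = Tuple[int, str, str]


@dataclass
class Run:
    """What one execution on a fixed input exposes (hypothetical record)."""
    output: str
    sync_order: List[SyncEvent]


def externally_deterministic(runs: List[Run]) -> bool:
    # External determinism: every run produces the same output.
    return all(r.output == runs[0].output for r in runs)


def syncorder_deterministic(runs: List[Run]) -> bool:
    # SyncOrder determinism: the same output AND the same global order
    # of synchronization operations in every run.
    return externally_deterministic(runs) and all(
        r.sync_order == runs[0].sync_order for r in runs
    )


# Two runs with the same output, but the lock is acquired in a different order.
a = Run("42", [(0, "lock", "m"), (0, "unlock", "m"), (1, "lock", "m"), (1, "unlock", "m")])
b = Run("42", [(1, "lock", "m"), (1, "unlock", "m"), (0, "lock", "m"), (0, "unlock", "m")])
print(externally_deterministic([a, b]))  # True
print(syncorder_deterministic([a, b]))   # False
```

The two runs agree on the result, so they satisfy external determinism, but they would violate SyncOrder determinism because the lock changes hands in a different order.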
Slide 6
What is the impact of enforcing a fixed synchronization order
on program execution time?
Slide 7
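Schedule-Record-Replay Framework
Figure: a scheduler (serial, round-robin, dynamic-A, dynamic-S, or hybrid) runs the application while a recorder captures its schedule; a replayer then re-executes the application under that schedule (timeline of thread 1 and thread 2, with idle periods). A perturber (small perturbations, background processes, DVFS, NUMA, architectures) is also shown.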
Slide 8
Replayer
- Forces threads to wait only when absolutely necessary under the schedule
- Does so with as little overhead as possible
Figure: non-deterministic execution vs. non-deterministic execution with the replayer's overhead
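The replayer's rule (wait only when the schedule says another thread goes next) can be sketched with a turn counter and a condition variable. This minimal Python sketch assumes the recorded schedule is just a list of thread ids, one per synchronization operation; `Replayer`, `sync`, and that schedule format are hypothetical illustrations, not the authors' implementation. Work between synchronization operations runs unconstrained, and a thread blocks only when its next operation is not next in the schedule.

```python
import threading
from typing import Callable, List


class Replayer:
    """Enforce a recorded global order of synchronization operations.

    Hypothetical format: `schedule` lists the id of the thread that performed
    each synchronization operation, in the recorded order.
    """

    def __init__(self, schedule: List[int]) -> None:
        self.schedule = schedule
        self.pos = 0                     # index of the next operation allowed to proceed
        self.cv = threading.Condition()

    def _my_turn(self, tid: int) -> bool:
        return self.pos < len(self.schedule) and self.schedule[self.pos] == tid

    def sync(self, tid: int, op: Callable[[], None]) -> None:
        with self.cv:
            # Wait only while another thread owns the next slot in the schedule.
            self.cv.wait_for(lambda: self._my_turn(tid))
            op()                         # perform the operation in its recorded slot
            self.pos += 1
            self.cv.notify_all()         # the new slot may belong to a waiting thread


log: List[str] = []
replayer = Replayer(schedule=[1, 0, 0, 1])


def worker(tid: int, n_ops: int) -> None:
    for _ in range(n_ops):
        # ... non-synchronizing work would run here without waiting ...
        replayer.sync(tid, lambda: log.append(f"T{tid}"))


threads = [threading.Thread(target=worker, args=(tid, 2)) for tid in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(log)  # ['T1', 'T0', 'T0', 'T1']
```

Because a valid schedule is recorded from a real execution, an operation granted its slot (for example, a lock acquire) should not block inside `op()`. A schedule that omits some of a thread's operations would, however, leave that thread waiting forever. In CPython the GIL serializes the threads anyway, so this sketch illustrates only the ordering logic, not the replayer's overhead.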
Slide 9
Schedules
Deterministic scheduler                       Schedule
Grace [Berger et al. 2009]                    serial
Dthreads [Liu et al. 2011]                    round-robin
Conversion [Merrifield and Eriksson 2013]     round-robin
Parrot [Cui et al. 2013]                      round-robin
Kendo [Olszewski et al. 2009]                 dynamic
RCDC [Devietti et al. 2011]                   dynamic
RFDet [Lu et al. 2014]                        dynamic
DMP [Devietti et al. 2009]                    hybrid
CoreDet [Bergan et al. 2010]                  hybrid
Calvin [Hower et al. 2011]                    hybrid

When does a thread pass its turn?
- serial: at the end
- round-robin: after each synchronization operation
- dynamic-A / dynamic-S: after each instruction / store
- hybrid: after N instructions (N = 100,000; no reduced serial mode)
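One way to see how these policies differ is to compute the global order in which each one grants synchronization operations. The sketch below assumes a hypothetical, simplified trace in which each thread's logical-clock value at each of its synchronization operations is fixed in advance. For dynamic-A the clock counts instructions; for dynamic-S it counts stores. Real systems run threads in parallel, and those clock values can themselves depend on the schedule. Under that assumption each policy reduces to a sort key. The hybrid key models only "pass the turn every N ticks" and ignores any serial-mode refinements. `sync_order` and the trace format are illustrative names.

```python
from typing import List

# Hypothetical trace: trace[tid] lists the logical-clock values (instruction or
# store counts) at which thread `tid` reaches each of its synchronization operations.
Trace = List[List[int]]


def sync_order(trace: Trace, policy: str, n: int = 100_000) -> List[int]:
    """Return the thread ids in the order their sync operations are granted."""
    # Each event: (thread id, index of the op within its thread, clock value).
    events = [(tid, k, clock)
              for tid, clocks in enumerate(trace)
              for k, clock in enumerate(clocks)]
    keys = {
        # serial: a thread passes its turn only at the end.
        "serial": lambda e: (e[0], e[1]),
        # round-robin: a thread passes its turn after each sync operation.
        "round-robin": lambda e: (e[1], e[0]),
        # dynamic: the smallest logical clock goes next; ties go to the lower id.
        "dynamic": lambda e: (e[2], e[0]),
        # hybrid: the turn passes every n ticks, so order by quantum, then thread.
        "hybrid": lambda e: (e[2] // n, e[0], e[2]),
    }
    return [tid for tid, _, _ in sorted(events, key=keys[policy])]


trace = [[5, 20], [3, 30]]                     # hypothetical clock values
print(sync_order(trace, "serial"))             # [0, 0, 1, 1]
print(sync_order(trace, "round-robin"))        # [0, 1, 0, 1]
print(sync_order(trace, "dynamic"))            # [1, 0, 0, 1]
print(sync_order(trace, "hybrid", n=10))       # [0, 1, 0, 1]
```

With `n=1` the hybrid key orders events exactly as the dynamic key does. This matches the view of dynamic scheduling as passing the turn after every instruction (or store).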
For this set of benchmarks and our platform, with implementation overhead set aside, the fundamental cost of determinism is small.
Slide 13
What is the performance cost of insisting on the same schedule
across different environments?
Slide 14
Schedule-Record-Perturb-Replay Framework
Figure: the same pipeline as the Schedule-Record-Replay framework (scheduler: serial, round-robin, dynamic-A, dynamic-S, hybrid; recorder; replayer running thread 1 and thread 2), with the perturber changing the execution environment through small perturbations, background processes, DVFS, NUMA, and architectures.
Slide 15
Perturber
- Small perturbations (context switches, thread migrations, page faults): simulate first-order effects by inserting small delays (µs and ms)
- Background processes: spawn additional threads and control their work-to-sleep ratio
- Dynamic voltage and frequency scaling (DVFS): use Linux's cpufreq system to explore different DVFS policies
- Non-uniform memory access (NUMA): spread threads over two NUMA nodes
- Asymmetric architectures: use DVFS to create asymmetry [Shelepov et al. 2009]
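The first two perturbations can be sketched as a random delay injector and a duty-cycled background load. This is a minimal Python sketch, not the authors' perturber. `inject_delay`, `background_load`, and their parameters (`prob`, `max_delay_s`, `work_ratio`, `period_s`) are hypothetical. In CPython a busy-looping thread contends for the GIL rather than for a separate core, so a faithful background load would run in a separate process or a native thread.

```python
import random
import threading
import time


def inject_delay(rng: random.Random, prob: float = 0.01, max_delay_s: float = 1e-3) -> float:
    """With probability `prob`, sleep for a random delay below `max_delay_s`
    seconds, mimicking a context switch, migration, or page fault.
    Returns the injected delay (0.0 if none)."""
    if rng.random() < prob:
        delay = rng.uniform(0.0, max_delay_s)
        time.sleep(delay)
        return delay
    return 0.0


def background_load(stop: threading.Event, work_ratio: float, period_s: float = 0.01) -> None:
    """Busy-wait for `work_ratio` of every period and sleep for the remainder,
    until `stop` is set."""
    if not 0.0 <= work_ratio <= 1.0:
        raise ValueError("work_ratio must be in [0, 1]")
    while not stop.is_set():
        busy_until = time.perf_counter() + work_ratio * period_s
        while time.perf_counter() < busy_until:
            pass                                   # burn CPU for the "work" part
        time.sleep((1.0 - work_ratio) * period_s)  # idle for the "sleep" part


rng = random.Random(0)          # fixed seed: the same delay decisions on every run
stop = threading.Event()
bg = threading.Thread(target=background_load, args=(stop, 0.5), daemon=True)
bg.start()
total = sum(inject_delay(rng, prob=0.1) for _ in range(100))
stop.set()
bg.join()
print(f"injected {total * 1e3:.2f} ms of delay")
```

Seeding the generator keeps the perturbation pattern itself reproducible, so differences between runs can be attributed to the environment rather than to the perturber.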
Insisting on the same schedule under skewed conditions can slow execution down by almost 2x.
Slide 19
Conclusions
- Employed the schedule-record-replay framework to divorce implementation overhead from the fundamental cost of enforcing deterministic execution
  - The fundamental cost of determinism is small (4% on avg., 33% max.)
  - There is room for lowering overheads in current deterministic systems
- Measured this fundamental cost across a range of execution environments
  - The cost rises to almost 2x when threads face skewed conditions
  - Do we need a more relaxed definition of determinism?
- Quantified various sources of non-determinism
  - Deterministic logical clocks are not deterministic (and not only because of performance counter imperfections [Weaver et al. 2013])