DESCRIPTION
Herb Sutter (GotW.ca) says that the concept of concurrency is easier to understand if split into three sub-concepts: scalability, responsiveness and consistency. This presentation is the first of three covering these concepts, starting with everyone's favorite: scalability – i.e. splitting a CPU-bound problem across several cores in order to solve it faster. I will show what tools .NET offers, but also the performance pitfalls that arise from an escalating problem that has plagued computer architecture for the last 20 years.
Mårten Rånge, WCOM AB
@marten_range
Concurrency – Examples for .NET
Responsive
Performance
Scalable algorithms
Three pillars of Concurrency
Scalability (CPU): Parallel.For
Responsiveness: Task/Future, async/await
Consistency: lock/synchronized, Interlocked.*, Mutex/Event/Semaphore, Monitor
Scalability
Which is fastest?
var ints = new int[InnerLoop];
var random = new Random();
for (var inner = 0; inner < InnerLoop; ++inner)
{
  ints[inner] = random.Next();
}
// ------------------------------------------------
var ints = new int[InnerLoop];
var random = new Random();
Parallel.For(
  0,
  InnerLoop,
  i => ints[i] = random.Next());
SHARED STATE – Race condition
SHARED STATE – Poor performance
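The race exists because Random.Next mutates the generator's internal state on every call; unsynchronized concurrent calls are a data race (on .NET Framework the generator could even degrade to returning 0 forever). A minimal sketch of a fix using ThreadLocal<Random>, so no mutable state is shared; the Guid-based seed is an assumption to avoid identical time-based seeds when several generators are created at once:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    const int InnerLoop = 1_000_000;

    static void Main()
    {
        var ints = new int[InnerLoop];
        // Each thread lazily gets its own Random – no shared mutable state.
        var random = new ThreadLocal<Random>(
            () => new Random(Guid.NewGuid().GetHashCode()));

        Parallel.For(0, InnerLoop, i => ints[i] = random.Value.Next(1, int.MaxValue));

        // Every slot was written exactly once, by exactly one thread.
        Console.WriteLine(Array.TrueForAll(ints, v => v > 0)); // True
    }
}
```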
Then and now

Metric           VAX-11/750 ('80)   Today     Improvement
MHz              6                  3300      550x
Memory MB        2                  16384     8192x
Memory MB/s (R)  13                 ~10000    770x
Memory MB/s (W)  13                 ~2500     190x
Memory nsec      225                70        3x
Memory cycles    1.4                210       -150x
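The last row follows from the clock and latency rows: latency in cycles is latency in seconds times clock frequency. A quick sanity check of the slide's figures (the 210-cycle value suggests a ~3 GHz clock was assumed rather than 3.3 GHz):

```csharp
using System;

class Program
{
    static void Main()
    {
        // VAX-11/750: 225 ns memory latency at a 6 MHz clock
        var vaxCycles = 225e-9 * 6e6;    // ≈ 1.35 cycles
        // Today: 70 ns memory latency at a ~3 GHz clock
        var todayCycles = 70e-9 * 3e9;   // ≈ 210 cycles
        Console.WriteLine($"{vaxCycles:0.00} cycles then, {todayCycles:0} cycles now");
    }
}
```

So although latency fell 3x in absolute terms, it grew ~150x when measured in the cycles the CPU has to wait.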
299,792,458 m/s
Speed of light is too slow
At 3.3 GHz a signal travels at most ~0.09 m per clock cycle (299,792,458 / 3.3e9), so a modern CPU spends most of its die hiding memory latency:
99% – latency mitigation
1% – computation
2 Core CPU

[Diagram: two cores, each with a private L1 and L2 cache; L3 and RAM are shared]
2 Core CPU – L1 Cache

[Animation: `new Random()` and `new int[InnerLoop]` are allocated; as CPU1 and CPU2 both call Next() on the shared Random object, the cache line holding its state bounces back and forth between the two L1 caches, each write invalidating the other core's copy]
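The ping-pong in the animation also happens when two threads write to *different* variables that merely share a cache line ("false sharing"). A sketch measuring the effect, assuming 64-byte cache lines (typical on x86); the exact timings vary by machine, so none are asserted:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

class Program
{
    const long Iterations = 10_000_000;

    // Two threads each bump their own counter; only the distance
    // between the counters in memory differs between the runs.
    static long TimeIt(long[] counters, int indexA, int indexB)
    {
        var sw = Stopwatch.StartNew();
        Parallel.Invoke(
            () => { for (long i = 0; i < Iterations; ++i) counters[indexA]++; },
            () => { for (long i = 0; i < Iterations; ++i) counters[indexB]++; });
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        var shared = TimeIt(new long[16], 0, 1);  // adjacent longs: same cache line
        var padded = TimeIt(new long[16], 0, 8);  // 64 bytes apart: separate lines
        Console.WriteLine($"same line: {shared} ms, padded: {padded} ms");
    }
}
```

On a multi-core machine the "same line" run is typically several times slower, even though neither thread ever reads the other's counter.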
4 Core CPU – L1 Cache

[Diagram: the same shared Random object and int[InnerLoop] array now ping-pong between four L1 caches – the contention grows with the core count]
2x4 Core CPU

[Diagram: two 4-core CPUs (CPU1–CPU8) – each core has a private L1 and L2; each socket has its own L3 above shared RAM; cache lines shared across sockets must travel over the interconnect]
Solution 1 – Locks
var ints = new int[InnerLoop];
var random = new Random();
Parallel.For(
  0,
  InnerLoop,
  i =>
  {
    lock (ints)
    {
      ints[i] = random.Next();
    }
  });
Solution 2 – No sharing
var ints = new int[InnerLoop];
Parallel.For(
  0,
  InnerLoop,
  () => new Random(),
  (i, pls, random) =>
  {
    ints[i] = random.Next();
    return random;
  },
  random => {});
Parallel.For adds overhead

[Diagram: Parallel.For partitions the range as a binary tree – Level0 forks two Level1 nodes, each forks two Level2 nodes handling ints[0], ints[1], … ints[7]]
Solution 3 – Less overhead
var ints = new int[InnerLoop];
Parallel.For(
  0,
  InnerLoop / Modulus,
  () => new Random(),
  (i, pls, random) =>
  {
    var begin = i * Modulus;
    var end = begin + Modulus;
    for (var iter = begin; iter < end; ++iter)
    {
      ints[iter] = random.Next();
    }
    return random;
  },
  random => {});
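The manual Modulus chunking can also be expressed with the BCL's range partitioner, which yields contiguous Tuple<int,int> ranges so each task runs a tight sequential loop. A sketch of that alternative; the ThreadLocal<Random> (with a Guid-derived seed) is an assumption carried over from avoiding the shared generator:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    const int InnerLoop = 1_000_000;

    static void Main()
    {
        var ints = new int[InnerLoop];
        var random = new ThreadLocal<Random>(
            () => new Random(Guid.NewGuid().GetHashCode()));

        // Partitioner.Create splits [0, InnerLoop) into contiguous chunks –
        // the same idea as the manual Modulus chunking, but sized by the library.
        Parallel.ForEach(
            Partitioner.Create(0, InnerLoop),
            range =>
            {
                for (var i = range.Item1; i < range.Item2; ++i)
                {
                    ints[i] = random.Value.Next(1, int.MaxValue);
                }
            });

        Console.WriteLine(Array.TrueForAll(ints, v => v > 0)); // True
    }
}
```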
For comparison – the sequential baseline:

var ints = new int[InnerLoop];
var random = new Random();
for (var inner = 0; inner < InnerLoop; ++inner)
{
  ints[inner] = random.Next();
}
Solution 4 – Independent runs
var tasks = Enumerable.Range(0, 8)
  .Select(i => Task.Factory.StartNew(
    () =>
    {
      var ints = new int[InnerLoop];
      var random = new Random();
      while (counter.CountDown())
      {
        for (var inner = 0; inner < InnerLoop; ++inner)
        {
          ints[inner] = random.Next();
        }
      }
    },
    TaskCreationOptions.LongRunning))
  .ToArray();
Task.WaitAll(tasks);
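`counter.CountDown()` above is the deck's own benchmark helper, not a BCL API. A hypothetical sketch of how such a helper could work – Interlocked.Decrement lets the eight tasks claim remaining iterations without any locks:

```csharp
using System;
using System.Threading;

// Hypothetical reconstruction of the deck's CountDown helper (not a BCL type):
// a thread-safe "run N more iterations in total" counter.
class CountDownCounter
{
    int remaining;
    public CountDownCounter(int count) { remaining = count; }

    // Returns true while there is still work left to claim.
    public bool CountDown() => Interlocked.Decrement(ref remaining) >= 0;
}

class Program
{
    static void Main()
    {
        var counter = new CountDownCounter(3);
        Console.WriteLine(counter.CountDown()); // True  (2 left)
        Console.WriteLine(counter.CountDown()); // True  (1 left)
        Console.WriteLine(counter.CountDown()); // True  (0 left)
        Console.WriteLine(counter.CountDown()); // False (exhausted)
    }
}
```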
Parallel.For
- Only for CPU-bound problems

Sharing is bad
- Kills performance
- Race conditions
- Dead-locks

Cache locality
- RAM is a misnomer
- Class design
- Avoid GC

Natural concurrency
- Avoid Parallel.For

Act like an engineer
- Measure before and after
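"Measure before and after" can start as small as a Stopwatch harness. A minimal sketch (the warm-up call is there so JIT compilation cost is not included in the measurement; the printed sum keeps the loop from being optimized away):

```csharp
using System;
using System.Diagnostics;

class Program
{
    static long Measure(Action action)
    {
        action(); // warm-up: JIT-compile before timing
        var sw = Stopwatch.StartNew();
        action();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        long sum = 0;
        var ms = Measure(() =>
        {
            sum = 0;
            for (var i = 0; i < 10_000_000; ++i) sum += i;
        });
        Console.WriteLine($"sum={sum}, elapsed={ms} ms"); // sum=49999995000000
    }
}
```

For anything more serious than a quick check, a benchmarking harness that handles statistical noise and multiple runs is the better tool.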
One more thing…
http://tinyurl.com/wcom-cpuscalability
Mårten Rånge, WCOM AB
@marten_range