Concurrency scalability

DESCRIPTION

Herb Sutter (GotW.ca) says that the concept of concurrency is easier to understand if it is split into three sub-concepts: scalability, responsiveness and consistency. This presentation is the first of three covering these concepts, starting with everyone's favorite: scalability, i.e. splitting a CPU-bound problem across several cores in order to solve it faster. I will show which tools .NET offers, but also the performance pitfalls that arise from an escalating problem that has plagued computer architecture for the last 20 years.


Mårten Rånge, WCOM AB

@marten_range

Concurrency – Examples for .NET

Responsive

Performance

Scalable algorithms

Three pillars of Concurrency

Scalability (CPU): Parallel.For

Responsiveness: Task/Future, async/await

Consistency: lock/synchronized, Interlocked.*, Mutex/Event/Semaphore, Monitor
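To make the mapping concrete, here is a minimal sketch (not from the slides; all names and numbers are illustrative) that touches one API from each pillar:

using System;
using System.Threading;
using System.Threading.Tasks;

class ThreePillars
{
    static int _counter;    // shared state

    static async Task Main()
    {
        // Scalability: spread CPU-bound work over the available cores.
        Parallel.For(0, 1_000_000, i => Interlocked.Increment(ref _counter));

        // Responsiveness: keep the caller free while the work runs elsewhere.
        await Task.Run(() => Thread.Sleep(100));

        // Consistency: serialize access to shared state.
        var gate = new object();
        lock (gate)
        {
            Console.WriteLine($"counter = {_counter}");
        }
    }
}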

Scalability

Which is fastest?

var ints = new int[InnerLoop];
var random = new Random();
for (var inner = 0; inner < InnerLoop; ++inner)
{
    ints[inner] = random.Next();
}

// ------------------------------------------------

var ints = new int[InnerLoop];
var random = new Random();
Parallel.For(
    0,
    InnerLoop,
    i => ints[i] = random.Next()
    );

SHARED STATE – Race condition

SHARED STATE – Poor performance

(Both slides show the same code as above: the single Random instance is shared by every iteration of the Parallel.For – a race condition, since Random is not thread-safe, and a performance problem, since every core fights over the cache lines that hold it.)
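The honest way to answer "which is fastest?" is to measure; a minimal sketch using Stopwatch (InnerLoop and the surrounding harness are illustrative assumptions, not from the slides):

using System;
using System.Diagnostics;
using System.Threading.Tasks;

const int InnerLoop = 10_000_000;   // illustrative size

var ints = new int[InnerLoop];

// Sequential baseline.
var sw = Stopwatch.StartNew();
var random = new Random();
for (var inner = 0; inner < InnerLoop; ++inner)
{
    ints[inner] = random.Next();
}
Console.WriteLine($"Sequential:   {sw.ElapsedMilliseconds} ms");

// Naive Parallel.For with a shared Random – the version the slides warn about.
sw = Stopwatch.StartNew();
var shared = new Random();
Parallel.For(0, InnerLoop, i => ints[i] = shared.Next());
Console.WriteLine($"Parallel.For: {sw.ElapsedMilliseconds} ms");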

Then and now

Metric         VAX-11/750 ('80)   Today                Improvement
MHz            6                  3300                 550x
Memory MB      2                  16384                8192x
Memory MB/s    13                 R ~10000 / W ~2500   770x / 190x
Memory nsec    225                70                   3x
Memory cycles  1.4                210                  -150x
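The "Memory cycles" row follows directly from the two rows above it; a quick sanity check using the table's rounded numbers:

// 1980: 225 ns latency at 6 MHz    -> 225e-9 * 6e6   ≈ 1.4 cycles per memory access
// Today: 70 ns latency at 3.3 GHz  -> 70e-9 * 3.3e9  ≈ 230 cycles per memory access
var vaxCycles   = 225e-9 * 6e6;
var todayCycles = 70e-9 * 3.3e9;
// A memory access that used to cost about one cycle now stalls the core for
// a couple of hundred cycles – the "-150x" in the table.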

299,792,458 m/s

Speed of light is too slow

0.09 m per clock cycle

99% - latency mitigation

1% - computation
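The 0.09 m figure is simply how far light travels during one cycle of a ~3.3 GHz clock:

var metersPerCycle = 299_792_458.0 / 3.3e9;   // ≈ 0.09 m – about 9 cm per clock cycle
// Signals in silicon and copper travel slower still, so a round trip to RAM
// necessarily costs many cycles; most of a modern CPU exists to hide that latency.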

2 Core CPU

[Diagram: RAM feeding a shared L3 cache; each of CPU1 and CPU2 has its own L2 and L1 cache]

2 Core CPU – L1 Cache

[Diagram sequence: the new Random() object and the new int[InnerLoop] array are pulled into the L1 caches of both CPU1 and CPU2; as both cores keep writing, the cache lines holding the Random object bounce back and forth between the two L1 caches]

4 Core CPU – L1 Cache

[Diagram: four cores, each with its own L1 cache, all competing for the same new Random() object and new int[InnerLoop] array]

2x4 Core CPU

[Diagram: two sockets with four cores each (CPU1–CPU8); every core has its own L1 and L2 cache, each socket has its own L3 cache, and RAM is shared by both sockets]

Solution 1 – Locks

var ints = new int[InnerLoop];
var random = new Random();
Parallel.For(
    0,
    InnerLoop,
    i =>
    {
        lock (ints)
        {
            ints[i] = random.Next();
        }
    }
    );

Solution 2 – No sharing

var ints = new int[InnerLoop];
Parallel.For(
    0,
    InnerLoop,
    () => new Random(),
    (i, pls, random) =>
    {
        ints[i] = random.Next();
        return random;
    },
    random => {}
    );
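A side note, not from the slides: the same no-sharing effect can be had with ThreadLocal<Random>, and on .NET 6 or later with the built-in thread-safe Random.Shared; a minimal sketch:

var ints = new int[InnerLoop];

// .NET 6+: Random.Shared is documented as thread-safe.
Parallel.For(0, InnerLoop, i => ints[i] = Random.Shared.Next());

// Earlier versions: one Random per thread via ThreadLocal<T>.
var perThread = new System.Threading.ThreadLocal<Random>(() => new Random());
Parallel.For(0, InnerLoop, i => ints[i] = perThread.Value.Next());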

Parallel.For adds overhead

[Diagram: the index range is split through a tree of partitions (Level0 → Level1 → Level2) before each single element ints[0]..ints[7] is written, illustrating the scheduling overhead per element]

Solution 3 – Less overhead

var ints = new int[InnerLoop];
Parallel.For(
    0,
    InnerLoop / Modulus,
    () => new Random(),
    (i, pls, random) =>
    {
        var begin = i * Modulus;
        var end   = begin + Modulus;
        for (var iter = begin; iter < end; ++iter)
        {
            ints[iter] = random.Next();
        }
        return random;
    },
    random => {}
    );
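The same chunking can also be expressed with range partitioning from System.Collections.Concurrent, which avoids the hand-rolled Modulus arithmetic; a sketch, assuming the same InnerLoop constant:

var ints = new int[InnerLoop];
Parallel.ForEach(
    Partitioner.Create(0, InnerLoop),   // hands out (from, to) ranges instead of single indices
    () => new Random(),
    (range, pls, random) =>
    {
        for (var iter = range.Item1; iter < range.Item2; ++iter)
        {
            ints[iter] = random.Next();
        }
        return random;
    },
    random => {}
    );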

var ints = new int[InnerLoop];
var random = new Random();
for (var inner = 0; inner < InnerLoop; ++inner)
{
    ints[inner] = random.Next();
}

Solution 4 – Independent runs

var tasks = Enumerable.Range(0, 8)
    .Select(i => Task.Factory.StartNew(
        () =>
        {
            // Each task owns its own array and its own Random – nothing is shared.
            var ints = new int[InnerLoop];
            var random = new Random();
            // counter.CountDown() is the talk's benchmark helper; it presumably keeps
            // the tasks running until the shared run budget is exhausted.
            while (counter.CountDown())
            {
                for (var inner = 0; inner < InnerLoop; ++inner)
                {
                    ints[inner] = random.Next();
                }
            }
        },
        TaskCreationOptions.LongRunning))
    .ToArray();
Task.WaitAll(tasks);
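For readers who want to run it outside the talk's harness, a self-contained variant with the helper replaced by a fixed number of repetitions (InnerLoop, Repeats and the task count are illustrative assumptions):

using System;
using System.Linq;
using System.Threading.Tasks;

const int InnerLoop = 1_000_000;   // illustrative
const int Repeats   = 100;         // illustrative run budget per task

var tasks = Enumerable.Range(0, 8)
    .Select(_ => Task.Factory.StartNew(
        () =>
        {
            var ints = new int[InnerLoop];
            var random = new Random();
            for (var repeat = 0; repeat < Repeats; ++repeat)
            {
                for (var inner = 0; inner < InnerLoop; ++inner)
                {
                    ints[inner] = random.Next();
                }
            }
        },
        TaskCreationOptions.LongRunning))
    .ToArray();
Task.WaitAll(tasks);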

Parallel.For
  Only for CPU-bound problems

Sharing is bad
  Kills performance
  Race conditions
  Dead-locks

Cache locality
  RAM is a misnomer
  Class design
  Avoid GC

Natural concurrency
  Avoid Parallel.For

Act like an engineer
  Measure before and after

One more thing…

Mårten Rånge, WCOM AB

@marten_range
