Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System

Ben Gamsa, Orran Krieger, Jonathan Appavoo, Michael Stumm


Locality

• What do they mean by locality?
  – locality of reference?
  – temporal locality?
  – spatial locality?


Temporal Locality

• Recently accessed data and instructions are likely to be accessed in the near future


Spatial Locality

• Data and instructions close to recently accessed data and instructions are likely to be accessed in the near future


Locality of Reference

• If we have good locality of reference, is that a good thing for multiprocessors?


Locality in Multiprocessors

• Good performance depends on data being local to a CPU
  – Each CPU uses data from its own cache
    • cache hit rate is high
    • each CPU has good locality of reference
  – Once data is brought into cache it stays there
    • cache contents not invalidated by other CPUs
    • different CPUs have different locality of reference

Example: Shared Counter

[Figure sequence: two CPUs, each with its own cache, share one counter in memory, initially 0. One CPU caches the counter and increments it to 1; the other CPU then reads it into its own cache (Read: OK). When a CPU next increments the counter to 2, the copy in the other cache is invalidated (Invalidate).]
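The shared counter in this example might look roughly like the following C++ sketch. The `SharedCounter` name and the use of `std::atomic` are assumptions made for illustration; the slides do not show code. Every increment writes the same word, so the cache line holding it has to move to whichever CPU writes next.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared counter: a single word that every CPU updates.
// Each write forces the cache line out of the other CPUs' caches.
struct SharedCounter {
    std::atomic<long> value{0};

    // Adds `delta` (default 1) to the one shared word.
    void increment(long delta = 1) {
        assert(delta > 0);
        value.fetch_add(delta, std::memory_order_relaxed);
    }

    // Reads the shared word; reads alone can be cached by several CPUs.
    long read() const { return value.load(std::memory_order_relaxed); }
};
```

Calling `increment()` from several CPUs is correct, but each call contends for the same cache line.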


Performance


Problems

• Counter bounces between CPU caches
  – cache miss rate is high
• Why not give each CPU its own piece of the counter to increment?
  – take advantage of commutativity of addition
  – counter updates can be local
  – reads require all counters

Array-based Counter

[Figure sequence: the counter is an array in memory with one element per CPU, initially 0 0. Each CPU increments only its own element (1 0, then 1 1). To read the counter, a CPU adds all elements: 1 + 1 = 2.]
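A minimal sketch of the array-based counter, assuming four CPUs and a caller that passes its own CPU index (`ArrayCounter`, `kNumCpus`, and the `cpu` parameter are hypothetical names, not from the slides). Note that the slots are adjacent in memory, so several of them fall in one cache line.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

constexpr std::size_t kNumCpus = 4;  // assumed CPU count for this sketch

// Hypothetical per-CPU counter: each CPU increments only its own slot.
class ArrayCounter {
public:
    void increment(std::size_t cpu) {
        assert(cpu < kNumCpus);
        slots_[cpu].fetch_add(1, std::memory_order_relaxed);
    }

    // Addition is commutative, so the total is simply the sum of all slots.
    long read() const {
        long total = 0;
        for (const auto& slot : slots_) {
            total += slot.load(std::memory_order_relaxed);
        }
        return total;
    }

private:
    std::array<std::atomic<long>, kNumCpus> slots_{};  // adjacent in memory
};
```

Updates touch one slot each, while a read has to visit every slot.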

Performance

Performs no better than ‘shared counter’!


Problem: False Sharing

• Caches operate at the granularity of cache lines
  – if two pieces of the counter are in the same cache line, they cannot be cached (for writing) on more than one CPU at a time

False Sharing

[Figure sequence: both counter elements (0,0) lie in one cache line, which both CPUs cache (Sharing). When one CPU increments its element (1,0), the line is invalidated in the other CPU's cache (Invalidate). The other CPU fetches the line again (Sharing), and its own increment (1,1) invalidates the first CPU's copy in turn (Invalidate).]


Solution?

• Spread the counter components out in memory: pad the array

Padded Array

[Figure sequence: the counter elements are spread out so that each lies in its own cache line. Each CPU caches and increments only its own element (0 to 1). Updates are independent of each other.]
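One way to pad the array in C++ is to align each slot to the cache-line size, as sketched below. The 64-byte line size, the CPU count, and the names `PaddedSlot` and `PaddedCounter` are assumptions for illustration; the real line size depends on the hardware.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

constexpr std::size_t kNumCpus = 4;         // assumed CPU count
constexpr std::size_t kCacheLineSize = 64;  // assumed cache-line size in bytes

// Aligning the slot to a full line also pads it to a full line,
// so no two slots can share a cache line.
struct alignas(kCacheLineSize) PaddedSlot {
    std::atomic<long> value{0};
};
static_assert(sizeof(PaddedSlot) == kCacheLineSize, "one slot per cache line");

class PaddedCounter {
public:
    void increment(std::size_t cpu) {
        assert(cpu < kNumCpus);
        slots_[cpu].value.fetch_add(1, std::memory_order_relaxed);
    }

    // Reads still have to add up every slot.
    long read() const {
        long total = 0;
        for (const auto& slot : slots_) {
            total += slot.value.load(std::memory_order_relaxed);
        }
        return total;
    }

private:
    std::array<PaddedSlot, kNumCpus> slots_{};
};
```

The cost is memory: each counter slot now occupies a whole cache line.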

Performance

Works better


Locality in OS

• Serious performance impact
• Difficult to retrofit
• Tornado
  – Ground up design
  – Object Oriented approach (natural locality)


Tornado

• Object oriented approach
• Clustered objects
• Protected procedure call
• Semi-automatic garbage collection
  – Simplifies locking protocols


Object Oriented Structure

• Each resource is represented by an object
• Requests to virtual resources handled independently
  – No shared data structure access
  – No shared locks

Why Object Oriented?

[Figure sequence: Process 1 and Process 2 both access a shared Process Table. With coarse-grain locking, one lock guards the whole table: Process 1 holds the lock while Process 2 also needs the table.]


Object Oriented Approach

Class ProcessTableEntry {
  data
  lock
  code
}

[Figure: fine-grain, instance locking. Each entry in the Process Table has its own lock, so Process 1 and Process 2 each lock only the entry they use.]
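The instance-locking idea can be sketched in C++ as follows. Only the class name `ProcessTableEntry` and its data/lock/code structure come from the slides; the `priority` field, the `ProcessTable` wrapper, and the use of `std::mutex` are assumptions chosen to keep the example small.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical entry: the lock lives inside the object it protects.
class ProcessTableEntry {
public:
    void set_priority(int p) {
        std::lock_guard<std::mutex> guard(lock_);  // locks this entry only
        priority_ = p;
    }
    int priority() const {
        std::lock_guard<std::mutex> guard(lock_);
        return priority_;
    }

private:
    mutable std::mutex lock_;  // per-instance lock
    int priority_ = 0;         // stand-in for the entry's data
};

// The table itself holds no table-wide lock in this sketch.
class ProcessTable {
public:
    explicit ProcessTable(std::size_t n) : entries_(n) {}

    ProcessTableEntry& entry(std::size_t pid) {
        assert(pid < entries_.size());
        return entries_[pid];
    }

private:
    std::vector<ProcessTableEntry> entries_;
};
```

Two threads working on different entries never touch the same lock, which is the contrast with the coarse-grain version.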


Clustered Objects

• Problem: how to improve locality for widely shared objects?

• A single logical object can be composed of multiple local representatives
  – the reps coordinate with each other to manage the object’s state
  – they share the object’s reference


Clustered Objects


Clustered Object References


Clustered Objects : Implementation

• A translation table per processor
  – Located at same virtual address
  – Pointer to rep
• Clustered object reference is just a pointer into the table
  – reps created on demand when first accessed
  – global miss handling object
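The lookup path described above can be modeled with the following single-threaded C++ sketch. It is not Tornado's code: `TranslationTables`, `Rep`, `ObjectRef`, and `handle_miss` are hypothetical names, the reference is modeled as a table index, and an explicit `cpu` argument stands in for each processor seeing its own table at the same virtual address.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>

constexpr std::size_t kNumCpus = 4;     // assumed CPU count
constexpr std::size_t kMaxObjects = 8;  // assumed table size

// Hypothetical representative: the per-CPU part of one clustered object.
struct Rep {
    long local_count = 0;
};

// A clustered object reference is modeled as an index into the table.
using ObjectRef = std::size_t;

class TranslationTables {
public:
    // Returns the given CPU's rep, creating it on first access.
    Rep& lookup(std::size_t cpu, ObjectRef ref) {
        assert(cpu < kNumCpus && ref < kMaxObjects);
        std::unique_ptr<Rep>& slot = tables_[cpu][ref];
        if (!slot) {
            slot = handle_miss();  // empty entry: take the miss path
        }
        return *slot;
    }

    std::size_t misses() const { return misses_; }

private:
    // Stand-in for the global miss-handling object.
    std::unique_ptr<Rep> handle_miss() {
        ++misses_;
        return std::make_unique<Rep>();
    }

    // One table (row) per CPU, one entry per object reference.
    std::array<std::array<std::unique_ptr<Rep>, kMaxObjects>, kNumCpus> tables_{};
    std::size_t misses_ = 0;
};
```

The same `ObjectRef` yields a different rep on each CPU, and only the first access per CPU pays for the miss.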


Clustered Objects

• Degree of clustering
• Management of state
  – partitioning
  – distribution
  – replication (how to maintain consistency?)
• Coordination between reps?
  – Shared memory
  – Remote PPCs

Counter: Clustered Object

[Figure sequence: a single object reference to the counter resolves to a separate rep on each CPU. Each CPU increments its own rep, so updates are independent of each other. To read the counter, a CPU adds the values of all reps: 1 + 1.]


Synchronization

• Two distinct locking issues
  – Locking
    • mutually exclusive access to objects
  – Existence guarantees
    • making sure an object is not freed while still in use


Locking in Tornado

• Encapsulates locking within individual objects
• Uses clustered objects to limit contention
• Uses spin-then-block locks
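A spin-then-block lock spins for a short time in the hope that the holder releases quickly, then gives up the CPU and sleeps. The slides do not show Tornado's implementation; the sketch below is a generic user-level version built on `std::atomic`, `std::mutex`, and `std::condition_variable`, with a hypothetical `spin_limit` parameter.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical spin-then-block lock: spin briefly, then sleep.
class SpinThenBlockLock {
public:
    explicit SpinThenBlockLock(int spin_limit = 100) : spin_limit_(spin_limit) {
        assert(spin_limit >= 0);
    }

    void lock() {
        // Phase 1: spin for a bounded number of attempts.
        for (int i = 0; i < spin_limit_; ++i) {
            if (try_lock()) return;
        }
        // Phase 2: block until an unlock lets this thread take the lock.
        std::unique_lock<std::mutex> guard(mutex_);
        cv_.wait(guard, [this] { return try_lock(); });
    }

    bool try_lock() {
        bool expected = false;
        return locked_.compare_exchange_strong(expected, true,
                                               std::memory_order_acquire);
    }

    void unlock() {
        {
            // Clearing the flag under the mutex avoids a lost wakeup.
            std::lock_guard<std::mutex> guard(mutex_);
            locked_.store(false, std::memory_order_release);
        }
        cv_.notify_one();
    }

private:
    std::atomic<bool> locked_{false};
    std::mutex mutex_;
    std::condition_variable cv_;
    int spin_limit_;
};
```

Because it provides `lock()` and `unlock()`, the class can be used with `std::lock_guard<SpinThenBlockLock>`.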


Existence Guarantees: the problem

• Use a lock to protect all references to an object?
  – eliminates races where one thread is accessing the object and another is deallocating it
  – results in complex global hierarchy of locks
• Tornado: semi-automatic garbage collection
  – Clustered object reference can be used any time
  – Eliminates the need for reference locks


Existence Guarantees in Tornado

• Semi-automatic garbage collection:
  – programmer decides what to free, system decides when to free it
  – guarantees that object references can be used safely
  – eliminates the need for reference locks


How does it work?

• Programmer removes all persistent references
  – Normal cleanup done manually
• System tracks all temporary references
  – Event driven kernel
  – Maintain an activity counter for each processor
  – Delete object only when activity counter is zero
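The activity-counter rule can be illustrated with a simplified, single-threaded C++ model. This is a sketch under stated assumptions, not Tornado's mechanism: `DeferredReclaimer` and its methods are hypothetical names, and the model frees retired objects only at a moment when every per-CPU counter is zero.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

constexpr std::size_t kNumCpus = 2;  // assumed CPU count

// Hypothetical deferred-free scheme: retired objects are destroyed only
// when no CPU has an operation (a possible temporary reference) in flight.
class DeferredReclaimer {
public:
    void begin_operation(std::size_t cpu) {
        assert(cpu < kNumCpus);
        ++active_[cpu];
    }

    void end_operation(std::size_t cpu) {
        assert(cpu < kNumCpus && active_[cpu] > 0);
        --active_[cpu];
        collect();
    }

    // Called once the programmer has removed all persistent references.
    void retire(std::function<void()> destroy) {
        pending_.push_back(std::move(destroy));
        collect();
    }

    std::size_t pending() const { return pending_.size(); }

private:
    // Runs the queued destructors if every activity counter is zero.
    void collect() {
        for (long count : active_) {
            if (count != 0) return;
        }
        std::vector<std::function<void()>> ready;
        ready.swap(pending_);
        for (auto& destroy : ready) destroy();
    }

    std::array<long, kNumCpus> active_{};  // one activity counter per CPU
    std::vector<std::function<void()>> pending_;
};
```

The programmer decides what to free by calling `retire`; the reclaimer decides when, which is the split the slides describe.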


Performance Scalability


Conclusion

• Object-oriented approach and clustered objects exploit locality to improve concurrency

• OO design has some overhead, but it is low compared to the performance advantages

• Tornado scales extremely well and achieves high performance on shared-memory multiprocessors