
Thread-Level Speculation: Towards Ubiquitous Parallelism

Greg Steffan

School of Computer Science

Carnegie Mellon University


Moore’s Law: the Original Version

[Figure: log(transistors on a chip) vs. time]

exponentially increasing resources


Moore’s Law: the Popular Interpretation

[Figure: log(performance) vs. time]

increasing resources → increasing performance?


A Superposition of Innovations

[Figure: log(performance) vs. time, with successive gains from datapath size (8b, 16b, 32b, 64b) and from instruction-level parallelism (ILP)]

ILP is running out of steam


Why ILP is Running Out of Steam

Cross-chip wire latency (in cycles):

Development cost:

Power density:

Probability of a defect:

these problems must be addressed


How Do We Sustain the Performance Curve?

[Figure: log(performance) vs. time, with gains from datapath size (8b, 16b, 32b, 64b) and from instruction-level parallelism (ILP); a "?" marks the next source of gains beyond "we are here (now)"]

what is the next big win for micro-architecture?


A New Path: Thread-Level Parallelism

• Tolerate cross-chip wire latency
  – localized wires
• Lower development cost
  – stamp out processor cores
• Lower power
  – turn off idle processors
• Tolerate defects
  – disable any faulty processor

[Figure: Chip Multiprocessor (CMP): several processors (P), each with its own cache (C)]

many advantages


Multithreading in Every Scale of Machine

[Figure: threads at every scale: supercomputers; desktops; Chip Multiprocessors (CMP), e.g. IBM Power4, SUN MAJC, Sibyte SB-1250 (two processors sharing a cache); Simultaneous Multithreading, e.g. ALPHA 21464, Intel Xeon (one processor with its cache)]

multithreading on a chip!


Improving Performance with a Chip Multiprocessor

Multiprogramming Workload:

[Figure: independent applications run on separate processors (P) of a CMP, each with its own cache (C), reducing the total execution time of the workload]

improves throughput


Improving Performance with a Chip Multiprocessor

Single Application:

[Figure: a single application uses only one processor (P) of the CMP; its execution time shrinks only when it is split into parallel threads that run on the other processors]

need parallel threads to reduce execution time


How Do We Parallelize Everything?

1) Programmers write parallel code from now on
   – time-consuming and frustrating
   – very hard to get right
   – not a broad solution

2) System parallelizes automatically
   – no burden on the programmer
   – parallelize any application

automatic parallelization is preferred


Current Technique: Prove Independence

Independent:
  for (i = 0; i < N; i++) A[i] = 0;
  (iterations: A[0] = 0, A[1] = 0, A[2] = 0, ...)

Dependent:
  for (i = 1; i < N; i++) A[i] = A[i-1];
  (iterations: A[1] = A[0], A[2] = A[1], A[3] = A[2], ...)

need to fully understand data access pattern
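To make the difference concrete, the C sketch below runs each loop's iterations both in program order and in reverse order (one of the orders a parallel execution might produce) and compares the results. The helper names (`zero_iter`, `copy_prev_iter`, `order_insensitive`) and the fixed size `N` are illustrative assumptions. Matching results for one reordering do not prove independence; a parallelizing compiler has to establish it by analyzing the access pattern, which is exactly the requirement this slide points out.

```c
#include <stdbool.h>
#include <string.h>

#define N 8   /* illustrative array size */

/* Independent loop body: iteration i touches only A[i]. */
static void zero_iter(int *A, int i) { A[i] = 0; }

/* Dependent loop body: iteration i reads the value iteration i-1 wrote. */
static void copy_prev_iter(int *A, int i) { A[i] = A[i - 1]; }

/* Run iterations lo..N-1 in program order. */
static void run_forward(int *A, int lo, void (*body)(int *, int)) {
    for (int i = lo; i < N; i++) body(A, i);
}

/* Run the same iterations in reverse order, standing in for one of the
   many orders a parallel execution could produce. */
static void run_reverse(int *A, int lo, void (*body)(int *, int)) {
    for (int i = N - 1; i >= lo; i--) body(A, i);
}

/* True if both orders leave the same final array, i.e. this particular
   reordering did not expose a loop-carried dependence. */
static bool order_insensitive(const int *init, int lo, void (*body)(int *, int)) {
    int a[N], b[N];
    memcpy(a, init, sizeof a);
    memcpy(b, init, sizeof b);
    run_forward(a, lo, body);
    run_reverse(b, lo, body);
    return memcmp(a, b, sizeof a) == 0;
}
```

Starting from `{1, 2, ..., 8}`, the zeroing loop ends in all zeros either way, while the copying loop yields all ones in program order but `{1, 1, 2, 3, 4, 5, 6, 7}` in reverse order.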


Ubiquitous Parallelization: How Close Are We?

Compiler can parallelize portions of numeric programs
– scientific, floating-point, array-based codes
– usually written in Fortran

parallelize by proving independence

What about everything else?
– general-purpose, integer codes
– written in C, C++, Java, etc.
– little (if any) success so far

proving independence is infeasible


The Main Culprit: Indirection

Indirect array references:
  for (i = 0; i < N; i++) A[i] = A[B[i]];
  (iterations: A[0] = A[B[0]], A[1] = A[B[1]], A[2] = A[B[2]], ...; dependences unknown)
  need to know the values of B[]

Pointers:
  while (...) { ... = *q; *p = ...; }
  (iterations: … *q … *p …, … *q … *p …; dependences unknown)
  need to know the targets of p and q
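When `B[]` is only known at run time, one conceivable alternative to a compile-time proof is to inspect its values just before the loop runs. The sketch below is a hypothetical inspector (`has_cross_iteration_dependence` is not from the talk) for `A[i] = A[B[i]]`: iteration `i` writes `A[i]` and reads `A[B[i]]`, so two different iterations touch the same element, with at least one writing, exactly when some `B[i]` is a valid iteration index other than `i`. It assumes `A` is large enough that indices at or beyond `n` exist but are never written by the loop.

```c
#include <stdbool.h>
#include <stddef.h>

/* Inspector for:  for (i = 0; i < n; i++) A[i] = A[B[i]];
   Iteration j == B[i] writes the element that iteration i reads, so a
   cross-iteration dependence exists iff some B[i] is in [0, n) and != i. */
static bool has_cross_iteration_dependence(const size_t *B, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (B[i] < n && B[i] != i) {
            return true;   /* iteration B[i] writes what iteration i reads */
        }
    }
    return false;          /* each iteration reads only its own element or
                              an element the loop never writes */
}
```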


Summary

We need the next big performance win
– instruction-level parallelism will run out of gas

Multithreading will soon be everywhere
– we need automatically-parallelized programs

The scope of current techniques is extremely limited
– proving independence is infeasible

A solution: Thread-Level Speculation (TLS)


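Thread-Level Speculation: the Basic Idea

[Figure: iterations containing … *q … *p … run in parallel as speculative threads; when a later thread's load of *q reads a location that an earlier thread then writes through *p, a violation is detected and the later thread recovers by re-executing; the resulting TLS execution time is compared with sequential execution]

exploit available thread-level parallelism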
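The violation check at the heart of this idea can be modeled in a few lines. The C sketch below makes a deliberate simplifying assumption: every thread performs its load of `*q` before any earlier thread performs its store to `*p`. Under that assumption a thread must recover whenever an earlier thread stores to the address it loaded. The `epoch_t` type and `find_violations` function are hypothetical names for illustration, not an interface described in the talk.

```c
#include <stdbool.h>
#include <stddef.h>

/* One speculative thread (epoch) per iteration of
   while (...) { ... = *q; *p = ...; }
   modeled only by the address it loads (q) and the address it stores (p). */
typedef struct {
    const int *q;   /* address loaded */
    int *p;         /* address stored */
} epoch_t;

/* Assumption: all epochs run at once, so each load happens before the
   stores of all earlier epochs. Epoch j is violated if an earlier epoch
   i < j stores to the address j loaded: j read a stale value and must be
   squashed and re-executed. Fills violated[] and returns how many. */
static size_t find_violations(const epoch_t *e, size_t n, bool *violated) {
    size_t count = 0;
    for (size_t j = 0; j < n; j++) {
        violated[j] = false;
        for (size_t i = 0; i < j; i++) {
            if (e[i].p == e[j].q) {
                violated[j] = true;
                break;
            }
        }
        if (violated[j]) count++;
    }
    return count;
}
```

When `p` and `q` rarely alias across iterations, few epochs are violated and most of the parallel work is kept; that is the bet TLS makes.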


Outline

• The Software/Hardware Sweet Spot

• Compiler Support

• Industry-Friendly Hardware

• Improving Value Communication

• Conclusions


Support for TLS: What Do We Need?

Break programs into speculative threads
– to maximize thread-level parallelism

Track data dependences
– to determine whether speculation was safe

Recover from failed speculation
– to ensure correct execution

three key elements of every TLS system


Compiler Researchers do it in Software


LRPD Test (Illinois at UC)

+ implemented entirely in software
– applies only to array-based code
– no partial parallelism

[Figure: execution time of a speculative parallel run followed by software dependence tracking, which answers "was parallel execution safe?"]
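The kind of software dependence tracking shown here can be sketched with shadow state per array element. The C code below is a simplified illustration in the spirit of the LRPD test, not its actual algorithm: it omits the privatization and reduction handling of the full test and folds part of the post-loop analysis into the marking calls. Every read and write of the speculatively parallel loop is assumed to be instrumented with `mark_read`/`mark_write` (hypothetical names), and afterwards `parallel_run_was_safe` reports whether any element was touched by two different iterations with at least one of them writing.

```c
#include <stdbool.h>
#include <stddef.h>

#define NONE (-1)   /* no iteration has accessed the element yet */
#define MANY (-2)   /* more than one iteration has read the element */

/* Software shadow state for one element of the array under test. */
typedef struct {
    int writer;     /* iteration that wrote it, or NONE */
    int reader;     /* iteration that read it, NONE, or MANY */
    bool conflict;  /* touched by two iterations, at least one writing */
} shadow_t;

static void shadow_init(shadow_t *s, size_t n) {
    for (size_t k = 0; k < n; k++) s[k] = (shadow_t){NONE, NONE, false};
}

/* Instrumentation for a read of element k by iteration it (it >= 0). */
static void mark_read(shadow_t *s, size_t k, int it) {
    if (s[k].writer != NONE && s[k].writer != it) s[k].conflict = true;
    if (s[k].reader == NONE) s[k].reader = it;
    else if (s[k].reader != it) s[k].reader = MANY;
}

/* Instrumentation for a write of element k by iteration it (it >= 0). */
static void mark_write(shadow_t *s, size_t k, int it) {
    if (s[k].writer != NONE && s[k].writer != it) s[k].conflict = true;
    if (s[k].reader != NONE && s[k].reader != it) s[k].conflict = true;
    s[k].writer = it;
}

/* After the parallel run: was parallel execution safe? */
static bool parallel_run_was_safe(const shadow_t *s, size_t n) {
    for (size_t k = 0; k < n; k++)
        if (s[k].conflict) return false;
    return true;
}
```

If the check fails, the loop has to be re-executed sequentially, which is why this style of test cannot exploit partial parallelism.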


Architects do it in Hardware


Multiscalar (Wisconsin)

• compiler breaks program into threads
• Address Resolution Buffer (ARB)
+/– highly specialized for speculation

[Figure: a ring of processors (P) sharing a central ARB]


Our Approach: Find the Sweet Spot

Compiler:
+ global view of control flow
– hard/impossible to understand data dependences

Hardware:
– operates on a small window of instructions
+ observes dynamic memory accesses

leverage their respective strengths


The Sweet Spot

• Compiler:
  – break programs into speculative threads
    • why: compiler has a global view of control flow
• Hardware:
  – track data dependences
    • why: software comparison of all addresses infeasible
  – recover from failed speculation
    • why: software buffering of all writes infeasible

important: minimize additional hardware


Outline

• The Software/Hardware Sweet Spot

• Compiler Support

• Industry-Friendly Hardware

• Improving Value Communication

• Conclusions


Compiler Support for TLS

Sequential Source Code
→ Region Selection: which loops? (uses profile information)
→ Transformation and Optimization: inserts TLS instructions
→ MIPS Executable


Simple Performance Model

[Figure: four processors (P) connected through dependence tracking]

• 4 processors
• Each processor issues one instruction per cycle
• No communication latency between processors

shows potential performance benefit


Potential Improvement

significant impact on execution time


Outline

• The Software/Hardware Sweet Spot

• Compiler Support

• Industry-Friendly Hardware

• Improving Value Communication

• Conclusions


Goals

1) Handle arbitrary memory accesses
   – i.e., not just array references

2) Preserve single-thread performance
   – keep hardware support minimal and simple

3) Apply to any scale of multithreaded architecture
   – within a chip and beyond

effective, simple, scalable


Requirements

1) Recover from failed speculation
   • buffer speculative writes from memory

2) Track data dependences
   • detect data dependence violations

each has several implementation options
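For the first requirement, the core behavior of a speculative write buffer can be sketched in C: stores are held back from memory, the thread's own loads see its buffered values, a commit releases the stores, and a squash simply drops them. The fixed capacity `BUF_ENTRIES` and all type and function names are assumptions for illustration only; they do not describe any particular design from the talk.

```c
#include <stdbool.h>
#include <stddef.h>

#define BUF_ENTRIES 8   /* hypothetical capacity */

typedef struct { int *addr; int value; } spec_write_t;

/* Holds one speculative thread's stores; memory is untouched until commit. */
typedef struct {
    spec_write_t entry[BUF_ENTRIES];
    size_t count;
} spec_buffer_t;

/* Buffer a store (merging with an earlier store to the same address).
   Returns false when the buffer is full; a real design would then have
   to stall the speculative thread. */
static bool spec_store(spec_buffer_t *b, int *addr, int value) {
    for (size_t k = 0; k < b->count; k++)
        if (b->entry[k].addr == addr) { b->entry[k].value = value; return true; }
    if (b->count == BUF_ENTRIES) return false;
    b->entry[b->count++] = (spec_write_t){addr, value};
    return true;
}

/* Loads see the thread's own buffered stores first, then memory. */
static int spec_load(const spec_buffer_t *b, const int *addr) {
    for (size_t k = 0; k < b->count; k++)
        if (b->entry[k].addr == addr) return b->entry[k].value;
    return *addr;
}

/* Speculation succeeded: make the buffered stores visible in memory. */
static void spec_commit(spec_buffer_t *b) {
    for (size_t k = 0; k < b->count; k++) *b->entry[k].addr = b->entry[k].value;
    b->count = 0;
}

/* Speculation failed: discard the buffered stores; memory is unchanged. */
static void spec_squash(spec_buffer_t *b) { b->count = 0; }
```

The `false` return from a full buffer is where capacity matters: a structure that is too small forces speculation to stall.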


Recover From Failed Speculation: Option 1

Augment the store buffer:
+ common device in superscalar processors
  • facilitates non-blocking stores
– too small

[Figure: processor (Proc) with its store buffer]


Recover From Failed Speculation: Option 2

Add a new dedicated buffer
+ can design an efficient speculation mechanism
– want to avoid large speculation-specific structures

[Figure: processor (Proc) with a dedicated speculation buffer]


Recover From Failed Speculation: Option 3

Augment the cache
+ very common structure
+ relatively large

[Figure: processor (Proc) with its cache]

just maintain single-thread performance


Tracking Data Dependences: Option 1

Add a dedicated “3rd-party” entity
– want to avoid large speculation-specific structures
– does not scale

[Figure: two processors (P) with caches (C); a central Dependence Tracker sees Load X and Store X and reports "violation detected"]


Tracking Data Dependences: Option 2

Detection at the producer
• producer informed of all addresses consumed
– awkward: producer must notify consumer of any violation

[Figure: the consumer executes Load X and sends the load address to the producer; when the producer executes Store X, it detects the violation]


Tracking Data Dependences: Option 3

Detection at the consumer
• consumers informed of all addresses produced

[Figure: the producer executes Store X and sends the store address to the consumer, which has already executed Load X and detects the violation]

similar to invalidation-based cache coherence!


Augmenting the Cache

[Figure: a processor (P) and its cache; each cache line holds a Tag, State, and Data]


Augmenting the Cache

[Figure: each cache line (Tag, State, Data) gains two bits: SL (Speculatively Loaded) and SM (Speculatively Modified)]

modest amount of extra space


Augmenting the Cache

[Figure: cache lines (tags including X, Y, Z) all in the valid state; the lines written during speculation have their SM bit set]

when speculation fails…


Augmenting the Cache

[Figure: after the failure, the speculatively modified lines are invalid, the unmodified line Y stays valid, and all SL and SM bits are cleared]

…can quickly discard speculative state
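The SL/SM mechanism on these slides can be sketched as follows. The C code assumes one SL and one SM bit per cache line, models the cache as a plain array of lines (tags, misses, and coherence traffic are left out), and assumes a line is clean before it is speculatively modified, so memory still holds the pre-speculation value. The type and function names are hypothetical, and the loops stand in for what the hardware would do across all lines.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool valid;
    bool dirty;   /* ordinary (non-speculative) modified data */
    int data;
    bool sl;      /* SL: speculatively loaded */
    bool sm;      /* SM: speculatively modified */
} line_t;

/* Speculative load: remember that this thread consumed the line's value. */
static int spec_load(line_t *l) {
    l->sl = true;
    return l->data;
}

/* Speculative store: the line now holds a value memory must not see yet.
   Assumes the line was clean, so memory still has the old value. */
static void spec_store(line_t *l, int v) {
    l->data = v;
    l->sm = true;
}

/* Speculation failed: drop every speculatively modified line and clear
   all speculative bits; unmodified lines stay valid. */
static void squash(line_t *c, size_t n) {
    for (size_t k = 0; k < n; k++) {
        if (c[k].sm) c[k].valid = false;
        c[k].sl = c[k].sm = false;
    }
}

/* Speculation succeeded: speculative data becomes ordinary dirty data. */
static void commit(line_t *c, size_t n) {
    for (size_t k = 0; k < n; k++) {
        if (c[k].sm) c[k].dirty = true;
        c[k].sl = c[k].sm = false;
    }
}
```

In a scenario like the one pictured, speculative stores to three lines followed by a speculative load of a fourth line leave only that fourth line valid after `squash`, with all SL and SM bits cleared.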


Extending Cache Coherence

[Figure: the consumer thread (epoch 5) has speculatively loaded X; the producer thread (epoch 4) executes Store X and sends "invalidate X; from 4"; the consumer detects a violation because 4 < 5]

straightforward extension of cache coherence
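The extended invalidation rule can be written as a small message handler. The C sketch assumes each invalidation carries the epoch number of the storing thread and that each line has the SL bit introduced earlier; `cpu_t`, `line_state_t`, and `on_invalidate` are hypothetical names, and the handler always invalidates the line as ordinary coherence would (a simplification). A violation is flagged only when the line was speculatively loaded and the store came from a logically earlier epoch, matching the 4 < 5 example.

```c
#include <stdbool.h>

typedef struct {
    bool valid;
    bool sl;            /* speculatively loaded by the local thread */
} line_state_t;

typedef struct {
    unsigned epoch;     /* epoch number of the local speculative thread */
    bool violated;      /* set when the local thread must be squashed */
} cpu_t;

/* Handle an incoming "invalidate X; from <sender_epoch>" message. */
static void on_invalidate(cpu_t *cpu, line_state_t *x, unsigned sender_epoch) {
    /* An earlier epoch stored to a line we already consumed: our load
       returned a stale value, so the dependence was violated. */
    if (x->sl && sender_epoch < cpu->epoch) cpu->violated = true;
    x->valid = false;   /* ordinary invalidation-based coherence */
}
```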


Detailed Performance Model

Underlying architecture
– single-chip multiprocessor
– implements speculative coherence

Simulator
– superscalar, a modernized MIPS R10K
– models all bandwidth and contention

[Figure: processors (P) with private caches (C) connected by a crossbar]

detailed simulation!


Will it Work at All of These Scales?

[Figure: threads on supercomputers, desktops, Chip Multiprocessors (CMP: two processors sharing a cache), and Simultaneous Multithreading (one processor with its cache)]

yes: coherence scales up and down


Performance on Multi-Chip Systems

our scheme is scalable


Performance on General-Purpose Applications

significant performance improvements


Outline

• The Software/Hardware Sweet Spot

• Compiler Support

• Industry-Friendly Hardware

• Improving Value Communication

• Conclusions


Speculate

[Figure: Store *p and Load *q proceed independently to memory]

good when p != q


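Synchronize (and forward)

[Figure: the consumer's Load *q waits (stalls) until the producer's Store *p completes and signals, so the value is forwarded; shown next to the Speculate case, where Store *p and Load *q go to memory independently]

good when p == q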
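The signal/wait pattern can be illustrated in software with two POSIX threads and C11 atomics. This is only an analogy under stated assumptions: the slot type `fwd_slot_t` and the functions `signal_value` and `wait_value` are hypothetical, and the hardware forwarding mechanism used in the talk is not modeled. The consumer spins (the stall on the slide) until the producer publishes the value with a release store; the matching acquire load guarantees that the consumer then reads the forwarded value.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical forwarding slot between two consecutive threads. */
typedef struct {
    atomic_int ready;   /* 0 until the producer signals */
    int value;          /* value forwarded from Store *p to Load *q */
} fwd_slot_t;

/* Producer: perform the store, then signal. */
static void signal_value(fwd_slot_t *s, int v) {
    s->value = v;
    atomic_store_explicit(&s->ready, 1, memory_order_release);
}

/* Consumer: wait (stall) until signalled, then use the forwarded value. */
static int wait_value(fwd_slot_t *s) {
    while (atomic_load_explicit(&s->ready, memory_order_acquire) == 0)
        ;   /* spin: this is the stall */
    return s->value;
}

/* Producer thread body used in the example. */
static void *producer(void *arg) {
    signal_value(arg, 42);
    return NULL;
}
```

Build with `-pthread` on platforms that require it. If `p != q` in practice, this wait is pure overhead, which is why speculating is preferable in that case.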


Reduce the Critical Forwarding Path

[Figure: within each thread, the span from Wait through Load X and Store X to Signal is the critical forwarding path. Panels: Overview, Big Critical Path, Small Critical Path; with a small critical path, stalls and overall execution time shrink]

decreases execution time
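Why shortening the critical forwarding path helps can be seen from a back-of-the-envelope model. The model is an assumption, not taken from the talk: `n` threads of `len` cycles all start at time 0 on enough processors; each waits for its predecessor's value `wait_at` cycles in and signals its own value `signal_at` cycles after its start (not counting stalls). Thread k then signals at `signal_at + k*(signal_at - wait_at)` and finishes at `len + k*(signal_at - wait_at)`, so total time grows linearly with the critical path `signal_at - wait_at`. The function name is hypothetical.

```c
/* Hypothetical model: n threads of len cycles, all starting at time 0.
   Each waits at cycle wait_at and signals at cycle signal_at of its own
   timeline. Assumes n >= 1 and 0 <= wait_at, signal_at <= len. */
static long tls_exec_time(long n, long len, long wait_at, long signal_at) {
    long path = signal_at - wait_at;   /* critical forwarding path */
    if (path < 0) path = 0;            /* signal precedes wait: no stalls */
    return len + (n - 1) * path;       /* each later thread adds one path */
}
```

With 4 threads of 100 cycles waiting at cycle 10, moving the signal from cycle 90 to cycle 20 shrinks the modeled time from 340 to 130 cycles.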


Predict

[Figure: Load *q takes its value from a value predictor and proceeds without waiting for Store *p; shown next to the Synchronize case (Signal/Wait with a stall) and the Speculate case]

good when p == q and *q is predictable
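The slides do not specify a predictor design. A last-value predictor with a small saturating confidence counter is one simple, common choice, sketched below in C purely as an assumption. The table size, the indexing by the load instruction's address (`load_pc`), and the confidence threshold of 2 are illustrative. A wrong prediction would have to be handled like any other failed speculation, so the counter keeps a load from predicting until the same value has repeated.

```c
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SIZE 64   /* hypothetical number of predictor entries */

typedef struct {
    int last;           /* value this load produced last time */
    unsigned conf;      /* saturating confidence counter, 0..3 */
} vp_entry_t;

typedef struct { vp_entry_t e[TABLE_SIZE]; } value_predictor_t;

/* Predict only when confident; otherwise fall back (e.g. synchronize). */
static bool vp_predict(const value_predictor_t *vp, size_t load_pc, int *out) {
    const vp_entry_t *e = &vp->e[load_pc % TABLE_SIZE];
    if (e->conf < 2) return false;
    *out = e->last;
    return true;
}

/* Train with the value the load actually produced once it is known. */
static void vp_update(value_predictor_t *vp, size_t load_pc, int actual) {
    vp_entry_t *e = &vp->e[load_pc % TABLE_SIZE];
    if (e->last == actual) {
        if (e->conf < 3) e->conf++;
    } else {
        e->last = actual;
        e->conf = 0;
    }
}
```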


Improving on Compile-Time Decisions

[Figure: the compiler chooses among Predict, Speculate, and Synchronize; the hardware chooses between Speculate and Synchronize; on both sides, the critical forwarding path is reduced]

improve the efficiency of value communication


Techniques

Prediction
– memory value prediction
– forwarded value prediction
– silent stores

Synchronization
– dynamic synchronization
– compiler scheduling to reduce the critical path
– hardware prioritization to reduce the critical path ($$$)

inexpensive, except for hardware prioritization
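Silent stores are listed without further detail. The underlying idea is that a store writing the value a location already holds changes nothing, so a later thread that already loaded that location did not actually read a stale value. The C sketch below shows the check under that assumption; `store_unless_silent` is a hypothetical name, and how the talk's hardware detects and exploits silent stores is not described here. Note that the check costs a read of the old value before every store.

```c
#include <stdbool.h>

/* Returns true if the store changed memory (so later readers of *addr
   may have been violated), false if it was silent. Assumption: a silent
   store need not be made visible to other threads at all. */
static bool store_unless_silent(int *addr, int value) {
    if (*addr == value) return false;   /* silent: nothing changes */
    *addr = value;
    return true;
}
```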


Execution Time Breakdown


Performance on 4 Processors

S=Sequential, B=Baseline

lots of failed speculation and synchronization


Performance on 4 Processors

S=Sequential, B=Baseline, O=Optimizations

significant improvement


Conclusions

• TLS may be the next big win

• Industry-friendly hardware is possible

• Efficient value communication is key

Ongoing/future work:
– compiler: improving region selection and coverage
– hardware: improving cache locality