Microarchitecture and Design Challenges for Gigascale Integration · 2004-12-15 · Alternate, 3G...

Preview:

Citation preview

®® 1

Microarchitecture and Microarchitecture and Design Challenges Design Challenges

for for Gigascale IntegrationGigascale Integration

Shekhar BorkarShekhar BorkarIntel Corp.Intel Corp.

December 6, 2004December 6, 2004

2

OutlineOutline

Process technology scaling & near Process technology scaling & near term challengesterm challengesµµArchitectureArchitecture & Design solutions& Design solutionsUpcoming paradigm shiftsUpcoming paradigm shiftsLong term outlook & challengesLong term outlook & challengesSummarySummary

3

Goal: 1TIPS by 2010Goal: 1TIPS by 2010

0.01

0.1

1

10

100

1000

10000

100000

1000000

1970 1980 1990 2000 2010

MIP

S

Pentium® Pro Architecture

Pentium® 4 Architecture

Pentium® Architecture

486386

2868086

How do you get there?How do you get there?How do you get there?

4

Technology ScalingTechnology ScalingGATE

SOURCE

BODY

DRAIN

Xj

ToxD

GATE

SOURCE DRAIN

LeffBODY

Lower active powerLower active powerVdd & Vt scalingVdd & Vt scaling

Faster transistor, Faster transistor, higher performancehigher performance

Oxide thickness Oxide thickness scales downscales down

Doubles transistor Doubles transistor densitydensity

Dimensions scale Dimensions scale down by 30%down by 30%

Scaling will continue, but with challenges!Scaling will continue, but with challenges!Scaling will continue, but with challenges!

5

Technology OutlookTechnology Outlook

Medium High Very HighMedium High Very HighVariabilityVariability

Energy scaling will slow downEnergy scaling will slow down>0.5>0.5>0.5>0.5>0.35>0.35Energy/Logic Op Energy/Logic Op scalingscaling

0.5 to 1 layer per generation0.5 to 1 layer per generation88--9977--8866--77Metal LayersMetal Layers

1111111111111111RC DelayRC Delay

Reduce slowly towards 2Reduce slowly towards 2--2.52.5<3<3~3~3ILD (K)ILD (K)

Low Probability High ProbabilitLow Probability High ProbabilityyAlternate, 3G etcAlternate, 3G etc

128

1111

20162016

High Probability Low ProbabilitHigh Probability Low ProbabilityyBulk Planar CMOSBulk Planar CMOS

Delay scaling will slow downDelay scaling will slow down>0.7>0.7~0.7~0.70.70.7Delay = CV/I Delay = CV/I scalingscaling

256643216842Integration Integration Capacity (BT)Capacity (BT)

88161622223232454565659090Technology Node Technology Node (nm)(nm)

20182018201420142012201220102010200820082006200620042004High Volume High Volume ManufacturingManufacturing

6

7

The The Leakage(sLeakage(s))……

1

10

100

1000

10000

30 50 70 90 110 130

Temp (C)

Ioff

(na/

u)

0.25

90

45

0.1

1

10

100

1000

0.25u 0.18u 0.13u 90nm 65nm 45nm

Technology

SD L

eaka

ge (W

atts

)

2X Tr Growth1.5X Tr Growth

90nm MOS Transistor

50nm50nm SiSi

1.2 nm SiO1.2 nm SiO22

GateGate

1.E-031.E-021.E-011.E+001.E+011.E+021.E+031.E+041.E+051.E+06

0.25u 0.18u 0.13u 90nm 65nm 45nm

Technology

Gat

e Le

akag

e (W

atts

)

1.5X2X

During Burn-in1.4X Vdd

8

Projected Power Projected Power (unconstrained)(unconstrained)

0

200

400

600

800

1000

1200

1400

90nm 65nm 45nm 32nm 22nm 16nm

Pow

er (W

), Po

wer

Den

sity

(W/c

m2 )SiO2 LkgSD LkgActive

10 mm Die

Active and Leakage power will become prohibitiveActive and Leakage power will become prohibitiveActive and Leakage power will become prohibitive

9

Product Cost PressureProduct Cost Pressure

Shrinking ASP, and shrinking $ budget for powerShrinking ASP, and shrinking $ budget for powerShrinking ASP, and shrinking $ budget for power

0

200

400

600

800

1000

1200

1400

1999 2000 2001 2002 2003 2004 2005

Des

k To

p A

SP ($

)

Source: IDC

Other

Thermal

$10 Power

Delivery

$25

Platform Budget

10

Must Fit in Power EnvelopeMust Fit in Power Envelope

0

200

400

600

800

1000

1200

1400

90nm 65nm 45nm 32nm 22nm 16nm

Pow

er (W

), Po

wer

Den

sity

(W/c

m2 )SiO2 LkgSD LkgActive

10 mm Die

Technology, Circuits, and Technology, Circuits, and Architecture to constrain the power Architecture to constrain the power

11

Between Now and ThenBetween Now and Then——

Move away from Frequency alone to Move away from Frequency alone to deliver performancedeliver performanceMore onMore on--die memorydie memoryMultiMulti--everywhereeverywhere––MultiMulti--threadingthreading––Chip level multiChip level multi--processingprocessingThroughput oriented designsThroughput oriented designsValued performance by higher level of Valued performance by higher level of integrationintegration––Monolithic & Monolithic & PolylithicPolylithic

12

Leakage SolutionsLeakage Solutions

Tri-gate Transistor

Silicon substrateSilicon substrate

1.2 nm SiO1.2 nm SiO22

GateGate

Planar Transistor

SiGeSiGe

Silicon substrate

Gate electrode

3.0nm High-k

For a few generations, then what?

13

Active Power ReductionActive Power Reduction

Slow Fast Slow

Low

Sup

ply

Volta

ge

Hig

h Su

pply

Vo

ltage

Multiple Supply Voltages

Logic BlockFreq = 1Vdd = 1Throughput = 1Power = 1Area = 1 Pwr Den = 1

Vdd

Logic Block

Freq = 0.5Vdd = 0.5Throughput = 1Power = 0.25Area = 2Pwr Den = 0.125

Vdd/2

Logic Block

Throughput Oriented Designs

14

Leakage ControlLeakage ControlBody Bias

VddVbp

Vbn-Ve

+Ve

22--10X10XReductionReduction

Sleep Transistor

Logic Block

22--1000X1000XReductionReduction

Stack Effect

Equal Loading

55--10X10XReductionReduction

15

Optimum FrequencyOptimum Frequency

Maximum performance withMaximum performance with•• Optimum pipeline depth Optimum pipeline depth •• Optimum frequencyOptimum frequency

Process Technology

02468

10

1 2 3 4 5 6 7 8 9 10Relative Frequency

Sub-threshold Leakage increases

exponentially

Pipeline Depth

02468

10

1 2 3 4 5 6 7 8 9 10Relative Pipeline Depth

Power Efficiency

Optimum

Pipeline & Performance

02468

10

1 2 3 4 5 6 7 8 9 10Relative Frequency (Pipelining)

Performance

DiminishingReturn

16

µµArchitectureArchitecture TechniquesTechniques

0%

25%

50%

75%

100%

1u 0.5u 0.25u 0.13u 65nm

Cac

he %

of T

otal

Are

a

486 Pentium®

Pentium® III

Pentium® 4

Pentium® M

Increase on-die Memory

ST Wait for Mem

MT1 Wait for MemMT2 Wait

MT3

Single ThreadSingle Thread

MultiMulti--ThreadingThreading

Full HW Utilization

Multi-threading

Improved performance, no impact Improved performance, no impact on thermals & power deliveryon thermals & power delivery

C1 C2

C3 C4

Cache

Chip Multi-processing

LargeCore

1

1.5

2

2.5

3

3.5

1 2 3 4Die Area, Power

Rel

ativ

e Pe

rfor

man

ce

Multi Core

Single Core

17

Special Purpose HardwareSpecial Purpose Hardware

TCB Exec

Core

PLL

OOO

ROM

CA

M1

TCB Exec

Core

PLL

ROB

ROM

CLB

Inputseq

Sendbuffer

2.23 mm X 3.54 mm, 260K transistors

Opportunities: Network processing enginesMPEG Encode/Decode engines, Speech engines

1.E+02

1.E+03

1.E+04

1.E+05

1.E+06

1995 2000 2005 2010 2015

MIP

S GP MIPS@75W

TOE MIPS@~2W

TCP/IP Offload EngineTCP/IP Offload Engine

Si Monolithic Polylithic

CPU

SpecialHW

Memory

CMOS RF

Wireline

RF

Opto-Electronics

Dense Memory

HeterogeneousSi, SiGe, GaAs

Special purpose HW and extreme integration

18

19

Sources of VariationsSources of Variations

0

50

100

150

200

250H

eat F

lux

(W/c

m2)

Heat Flux (W/cm2)Results in Vcc variation

40

50

60

70

80

90

100

110

Tem

pera

ture

(C)

Temperature Variation (°C)Hot spots

10

100

1000

10000

1000 500 250 130 65 32

Technology Node (nm)

Mea

n N

umbe

r of D

opan

t Ato

ms

Random Dopant Fluctuations

0.01

0.1

1

1980 1990 2000 2010 2020

micron

10

100

1000

nm193nm193nm248nm248nm

365nm365nm LithographyLithographyWavelengthWavelength

65nm65nm90nm90nm

130nm130nm

GenerationGeneration

GapGap

45nm45nm32nm32nm 13nm 13nm

EUVEUV

180nm180nm

Source: Mark Bohr, IntelSub-wavelength Lithography

20

Impact of Static VariationsImpact of Static Variations

130nm

30%

5X

FrequencyFrequency~30%~30%

LeakageLeakagePowerPower~5~5--10X10X

0.90.9

1.01.0

1.11.1

1.21.2

1.31.3

1.41.4

11 22 33 44 55Normalized Leakage (Normalized Leakage (IsbIsb))

Nor

mal

ized

Fre

quen

cyN

orm

aliz

ed F

requ

ency

Today…

21

Resistor Network

4.5 mm

5.3

mm

Multiplesubsites PD & Counter Resistor

Network

CUT Bias Amplifier

Delay

Die frequency: Min(F1..F21)Die power: Sum(P1..P21)

Technology 150nm CMOSNumber of subsites per die 21

Body bias range 0.5V FBB to 0.5V RBB

Bias resolution 32 mV

1.6 X 0.24 mm, 21 sites per die150nm CMOS

Adaptive Body BiasAdaptive Body Bias----ExperimentExperiment

22

Adaptive Body BiasAdaptive Body Bias----ResultsResults

0%

20%

60%

100%

Acc

epte

d di

e

noBB

100% yield

ABB

Higher Frequency

Num

ber o

f die

s

Frequency

too slow

ftarget

too leaky

ftarget

ABB

FBB RBB

Num

ber o

f die

s

Frequency

too slow

ftarget

too leaky

ftarget

ABB

FBB RBB

97% highest bin

within die ABB

For given Freq and Power densityFor given Freq and Power density•• 100% yield with ABB 100% yield with ABB •• 97% highest freq bin with ABB for 97% highest freq bin with ABB for within die variability within die variability

23

Circuit Design TradeoffsCircuit Design Tradeoffs

00.5

11.5

2

Low-Vt usagelow high

Higher probability of target frequency with:Higher probability of target frequency with:1.1. Larger transistor sizes Larger transistor sizes 2.2. Higher LowHigher Low--VtVt usageusage

But with power penaltyBut with power penalty

00.5

11.5

2

Transistor sizesmall large

powertarget frequency probability

24

Impact of Critical PathsImpact of Critical Paths

1.1

1.2

1.3

1.4

1 9 17 25# of critical paths

Mea

n cl

ock

freq

uenc

y

Clock frequency

Num

ber o

f die

s

0%

20%

40%

60%

0.9 1.1 1.3 1.5

# critical paths

With increasing # of critical pathsWith increasing # of critical paths––Both Both σσ and and µµ become smallerbecome smaller––Lower mean frequencyLower mean frequency

25

Impact of Logic DepthImpact of Logic Depth

0%

20%

40%

-16% -8% 0% 8% 16%

Delay

20%

40% NMOSPMOS

Device IONN

umbe

r of s

ampl

es (%

)

Variation (%)

NMOS Ionσ/µ

PMOS Ionσ/µ

Delayσ/µ

5.6% 3.0% 4.2%

Logic depth: 16

0.0

0.5

1.0

Logic depth

Rat

io o

fde

lay-σ

to Io

n-σ

16 49

26

0

0.5

1

1.5

Logic depthsmalllarge

frequencytarget frequency probability # uArch critical paths

0

0.5

1

1.5

less more

µµArchitectureArchitecture TradeoffsTradeoffs

Higher target frequency with:Higher target frequency with:1.1. Shallow logic depthShallow logic depth2.2. Larger number of critical pathsLarger number of critical paths

But with lower probabilityBut with lower probability

27

VariationVariation--tolerant Designtolerant Design

0

0.5

1

1.5

# uArch critical pathsless more

Balance power &

frequency with

variation tolerance

0

0.5

1

1.5

Logic depthsmalllarge

frequencytarget frequency probability

00.5

11.5

2

Transistor sizesmall large

powertarget frequency probability

00.5

11.5

2

Low-Vt usagelow high

28

Leakage PowerLeakage Power

Freq

uenc

yFr

eque

ncy DeterministicDeterministic

ProbabilisticProbabilistic10X variation 10X variation

~50% total power~50% total power

Probabilistic DesignProbabilistic Design

DelayDelayPath DelayPath Delay Pr

obab

ility

Prob

abili

tyDeterministic design techniques inadequate in the futureDeterministic design techniques inadequate in the futureDeterministic design techniques inadequate in the future

Due to Due to variations in:variations in:VVdddd, V, Vtt, and , and

TempTemp

Delay TargetDelay Target

# of

Pat

hs#

of P

aths DeterministicDeterministic

Delay TargetDelay Target

# of

Pat

hs#

of P

aths ProbabilisticProbabilistic

29

Shift in Design ParadigmShift in Design ParadigmMultiMulti--variable design optimization for:variable design optimization for:–– Yield and bin splits Yield and bin splits –– Parameter variations Parameter variations –– Active and leakage powerActive and leakage power–– Performance Performance

Tomorrow:Tomorrow:Global OptimizationGlobal Optimization

MultiMulti--variatevariate

Today:Today:Local OptimizationLocal Optimization

Single VariableSingle Variable

30

31

TodayToday’’s Freelance Layouts Freelance Layout

Vss

Vdd

OpIp

Vss

Vdd

Op

No layout restrictionsNo layout restrictionsNo layout restrictions

32

Transistor Orientation RestrictionsTransistor Orientation Restrictions

Vss

Vdd

OpIp

Vss

Vdd

Op

Transistor orientation restricted to improve manufacturing control

Transistor orientation restricted to improve Transistor orientation restricted to improve manufacturing controlmanufacturing control

33

Op

Vss

Vdd

Ip

Vss

Vdd

Op

Transistor Width QuantizationTransistor Width Quantization

34

TodayToday’’s Unrestricted s Unrestricted RoutingRouting

35

Future Metal RestrictionsFuture Metal Restrictions

36

TodayToday’’s Metric: s Metric: Maximizing Transistor DensityMaximizing Transistor Density

Dense layout causes hot-spotsDense layout causes hotDense layout causes hot--spotsspots

37

TomorrowTomorrow’’s Metric: s Metric: Optimizing Transistor & Power DensityOptimizing Transistor & Power Density

Balanced DesignBalanced DesignBalanced Design

38

Implications to DesignImplications to Design

Design fabric will be Design fabric will be RegularRegularWill look like Will look like SeaSea--ofof--transistorstransistorsinterconnected with regular interconnected with regular interconnect fabricinterconnect fabricShift in the design efficiency metricShift in the design efficiency metric–– From From Transistor DensityTransistor Density to to Balanced Balanced

DesignDesign

39

40

Technology OutlookTechnology Outlook

Medium High Very HighMedium High Very HighVariabilityVariability

Energy scaling will slow downEnergy scaling will slow down>0.5>0.5>0.5>0.5>0.35>0.35Energy/Logic Op Energy/Logic Op scalingscaling

0.5 to 1 layer per generation0.5 to 1 layer per generation88--9977--8866--77Metal LayersMetal Layers

1111111111111111RC DelayRC Delay

Reduce slowly towards 2Reduce slowly towards 2--2.52.5<3<3~3~3ILD (K)ILD (K)

Low Probability High ProbabilitLow Probability High ProbabilityyAlternate, 3G etcAlternate, 3G etc

128

1111

20162016

High Probability Low ProbabilitHigh Probability Low ProbabilityyBulk Planar CMOSBulk Planar CMOS

Delay scaling will slow downDelay scaling will slow down>0.7>0.7~0.7~0.70.70.7Delay = CV/I Delay = CV/I scalingscaling

256643216842Integration Integration Capacity (BT)Capacity (BT)

88161622223232454565659090Technology Node Technology Node (nm)(nm)

20182018201420142012201220102010200820082006200620042004High Volume High Volume ManufacturingManufacturing

41

ReliabilityReliability

Soft Error FIT/Chip (Logic & Mem)

0

50

100

150

180

130 90 65 45 32 22 16

Rel

ativ

e

~8% degradation/bit/generation

Time dependent device degradation

0

1

1 2 3 4 5 6 7 8 9 10Time

Ion

Rel

ativ

e

Burn-in may phase out…?

1

10

100

1000

10000

180 90 45 22

Jox

(Rel

ativ

e)

Hi-K?

Extreme device variations

0

50

100

100 120 140 160 180 200Vt(mV)

Rel

ativ

e

Wider

42

Implications to ReliabilityImplications to Reliability

Extreme variations (Static & Dynamic) Extreme variations (Static & Dynamic) will result in unreliable componentswill result in unreliable componentsImpossible to design reliable system Impossible to design reliable system as we know todayas we know today––Transient errors (Soft Errors) Transient errors (Soft Errors) ––Gradual errors (Variations)Gradual errors (Variations)––Time dependent (Degradation)Time dependent (Degradation)

Reliable systems with unreliable components —Resilient µArchitectures

Reliable systems with unreliable components Reliable systems with unreliable components ——Resilient Resilient µµArchitecturesArchitectures

43

Implications to TestImplications to Test

OneOne--timetime--factory testing will be outfactory testing will be outBurnBurn--in to catch chip infantin to catch chip infant--mortality mortality will not be practicalwill not be practicalTest HW will be part of the designTest HW will be part of the designDynamically selfDynamically self--test, detect errors, test, detect errors, reconfigure, & adaptreconfigure, & adapt

44

In a NutIn a Nut--shellshell……

100 Billion

Transistors

100 BT integration capacity

20 BT unusable (variations)

10 BT will fail over time

Yet, deliver high performance in the power & cost envelope

Yet, deliver high performance in the power & Yet, deliver high performance in the power & cost envelopecost envelope

Intermittent failures

45

Summary (of Challenges)Summary (of Challenges)Near term:Near term:–– Optimum frequency & Optimum frequency & µµArchitectureArchitecture–– Lots of memory & MultiLots of memory & Multi——everywhereeverywhere–– Valued performance with higher integrationValued performance with higher integration

Paradigm shift:Paradigm shift:–– From deterministic to probabilistic design, with From deterministic to probabilistic design, with

multimulti--variatevariate optimizationoptimization–– Evolution of regular design fabricEvolution of regular design fabric

Long term:Long term:–– Reliable systems with unreliable componentsReliable systems with unreliable components–– Dynamic selfDynamic self--test, detect, reconfigure, & adapttest, detect, reconfigure, & adapt

Recommended