Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
®® 1
Microarchitecture and Microarchitecture and Design Challenges Design Challenges
for for Gigascale IntegrationGigascale Integration
Shekhar BorkarShekhar BorkarIntel Corp.Intel Corp.
December 6, 2004December 6, 2004
2
OutlineOutline
Process technology scaling & near Process technology scaling & near term challengesterm challengesµµArchitectureArchitecture & Design solutions& Design solutionsUpcoming paradigm shiftsUpcoming paradigm shiftsLong term outlook & challengesLong term outlook & challengesSummarySummary
3
Goal: 1TIPS by 2010Goal: 1TIPS by 2010
0.01
0.1
1
10
100
1000
10000
100000
1000000
1970 1980 1990 2000 2010
MIP
S
Pentium® Pro Architecture
Pentium® 4 Architecture
Pentium® Architecture
486386
2868086
How do you get there?How do you get there?How do you get there?
4
Technology ScalingTechnology ScalingGATE
SOURCE
BODY
DRAIN
Xj
ToxD
GATE
SOURCE DRAIN
LeffBODY
Lower active powerLower active powerVdd & Vt scalingVdd & Vt scaling
Faster transistor, Faster transistor, higher performancehigher performance
Oxide thickness Oxide thickness scales downscales down
Doubles transistor Doubles transistor densitydensity
Dimensions scale Dimensions scale down by 30%down by 30%
Scaling will continue, but with challenges!Scaling will continue, but with challenges!Scaling will continue, but with challenges!
5
Technology OutlookTechnology Outlook
Medium High Very HighMedium High Very HighVariabilityVariability
Energy scaling will slow downEnergy scaling will slow down>0.5>0.5>0.5>0.5>0.35>0.35Energy/Logic Op Energy/Logic Op scalingscaling
0.5 to 1 layer per generation0.5 to 1 layer per generation88--9977--8866--77Metal LayersMetal Layers
1111111111111111RC DelayRC Delay
Reduce slowly towards 2Reduce slowly towards 2--2.52.5<3<3~3~3ILD (K)ILD (K)
Low Probability High ProbabilitLow Probability High ProbabilityyAlternate, 3G etcAlternate, 3G etc
128
1111
20162016
High Probability Low ProbabilitHigh Probability Low ProbabilityyBulk Planar CMOSBulk Planar CMOS
Delay scaling will slow downDelay scaling will slow down>0.7>0.7~0.7~0.70.70.7Delay = CV/I Delay = CV/I scalingscaling
256643216842Integration Integration Capacity (BT)Capacity (BT)
88161622223232454565659090Technology Node Technology Node (nm)(nm)
20182018201420142012201220102010200820082006200620042004High Volume High Volume ManufacturingManufacturing
6
7
The The Leakage(sLeakage(s))……
1
10
100
1000
10000
30 50 70 90 110 130
Temp (C)
Ioff
(na/
u)
0.25
90
45
0.1
1
10
100
1000
0.25u 0.18u 0.13u 90nm 65nm 45nm
Technology
SD L
eaka
ge (W
atts
)
2X Tr Growth1.5X Tr Growth
90nm MOS Transistor
50nm50nm SiSi
1.2 nm SiO1.2 nm SiO22
GateGate
1.E-031.E-021.E-011.E+001.E+011.E+021.E+031.E+041.E+051.E+06
0.25u 0.18u 0.13u 90nm 65nm 45nm
Technology
Gat
e Le
akag
e (W
atts
)
1.5X2X
During Burn-in1.4X Vdd
8
Projected Power Projected Power (unconstrained)(unconstrained)
0
200
400
600
800
1000
1200
1400
90nm 65nm 45nm 32nm 22nm 16nm
Pow
er (W
), Po
wer
Den
sity
(W/c
m2 )SiO2 LkgSD LkgActive
10 mm Die
Active and Leakage power will become prohibitiveActive and Leakage power will become prohibitiveActive and Leakage power will become prohibitive
9
Product Cost PressureProduct Cost Pressure
Shrinking ASP, and shrinking $ budget for powerShrinking ASP, and shrinking $ budget for powerShrinking ASP, and shrinking $ budget for power
0
200
400
600
800
1000
1200
1400
1999 2000 2001 2002 2003 2004 2005
Des
k To
p A
SP ($
)
Source: IDC
Other
Thermal
$10 Power
Delivery
$25
Platform Budget
10
Must Fit in Power EnvelopeMust Fit in Power Envelope
0
200
400
600
800
1000
1200
1400
90nm 65nm 45nm 32nm 22nm 16nm
Pow
er (W
), Po
wer
Den
sity
(W/c
m2 )SiO2 LkgSD LkgActive
10 mm Die
Technology, Circuits, and Technology, Circuits, and Architecture to constrain the power Architecture to constrain the power
11
Between Now and ThenBetween Now and Then——
Move away from Frequency alone to Move away from Frequency alone to deliver performancedeliver performanceMore onMore on--die memorydie memoryMultiMulti--everywhereeverywhere––MultiMulti--threadingthreading––Chip level multiChip level multi--processingprocessingThroughput oriented designsThroughput oriented designsValued performance by higher level of Valued performance by higher level of integrationintegration––Monolithic & Monolithic & PolylithicPolylithic
12
Leakage SolutionsLeakage Solutions
Tri-gate Transistor
Silicon substrateSilicon substrate
1.2 nm SiO1.2 nm SiO22
GateGate
Planar Transistor
SiGeSiGe
Silicon substrate
Gate electrode
3.0nm High-k
For a few generations, then what?
13
Active Power ReductionActive Power Reduction
Slow Fast Slow
Low
Sup
ply
Volta
ge
Hig
h Su
pply
Vo
ltage
Multiple Supply Voltages
Logic BlockFreq = 1Vdd = 1Throughput = 1Power = 1Area = 1 Pwr Den = 1
Vdd
Logic Block
Freq = 0.5Vdd = 0.5Throughput = 1Power = 0.25Area = 2Pwr Den = 0.125
Vdd/2
Logic Block
Throughput Oriented Designs
14
Leakage ControlLeakage ControlBody Bias
VddVbp
Vbn-Ve
+Ve
22--10X10XReductionReduction
Sleep Transistor
Logic Block
22--1000X1000XReductionReduction
Stack Effect
Equal Loading
55--10X10XReductionReduction
15
Optimum FrequencyOptimum Frequency
Maximum performance withMaximum performance with•• Optimum pipeline depth Optimum pipeline depth •• Optimum frequencyOptimum frequency
Process Technology
02468
10
1 2 3 4 5 6 7 8 9 10Relative Frequency
Sub-threshold Leakage increases
exponentially
Pipeline Depth
02468
10
1 2 3 4 5 6 7 8 9 10Relative Pipeline Depth
Power Efficiency
Optimum
Pipeline & Performance
02468
10
1 2 3 4 5 6 7 8 9 10Relative Frequency (Pipelining)
Performance
DiminishingReturn
16
µµArchitectureArchitecture TechniquesTechniques
0%
25%
50%
75%
100%
1u 0.5u 0.25u 0.13u 65nm
Cac
he %
of T
otal
Are
a
486 Pentium®
Pentium® III
Pentium® 4
Pentium® M
Increase on-die Memory
ST Wait for Mem
MT1 Wait for MemMT2 Wait
MT3
Single ThreadSingle Thread
MultiMulti--ThreadingThreading
Full HW Utilization
Multi-threading
Improved performance, no impact Improved performance, no impact on thermals & power deliveryon thermals & power delivery
C1 C2
C3 C4
Cache
Chip Multi-processing
LargeCore
1
1.5
2
2.5
3
3.5
1 2 3 4Die Area, Power
Rel
ativ
e Pe
rfor
man
ce
Multi Core
Single Core
17
Special Purpose HardwareSpecial Purpose Hardware
TCB Exec
Core
PLL
OOO
ROM
CA
M1
TCB Exec
Core
PLL
ROB
ROM
CLB
Inputseq
Sendbuffer
2.23 mm X 3.54 mm, 260K transistors
Opportunities: Network processing enginesMPEG Encode/Decode engines, Speech engines
1.E+02
1.E+03
1.E+04
1.E+05
1.E+06
1995 2000 2005 2010 2015
MIP
S GP MIPS@75W
TOE MIPS@~2W
TCP/IP Offload EngineTCP/IP Offload Engine
Si Monolithic Polylithic
CPU
SpecialHW
Memory
CMOS RF
Wireline
RF
Opto-Electronics
Dense Memory
HeterogeneousSi, SiGe, GaAs
Special purpose HW and extreme integration
18
19
Sources of VariationsSources of Variations
0
50
100
150
200
250H
eat F
lux
(W/c
m2)
Heat Flux (W/cm2)Results in Vcc variation
40
50
60
70
80
90
100
110
Tem
pera
ture
(C)
Temperature Variation (°C)Hot spots
10
100
1000
10000
1000 500 250 130 65 32
Technology Node (nm)
Mea
n N
umbe
r of D
opan
t Ato
ms
Random Dopant Fluctuations
0.01
0.1
1
1980 1990 2000 2010 2020
micron
10
100
1000
nm193nm193nm248nm248nm
365nm365nm LithographyLithographyWavelengthWavelength
65nm65nm90nm90nm
130nm130nm
GenerationGeneration
GapGap
45nm45nm32nm32nm 13nm 13nm
EUVEUV
180nm180nm
Source: Mark Bohr, IntelSub-wavelength Lithography
20
Impact of Static VariationsImpact of Static Variations
130nm
30%
5X
FrequencyFrequency~30%~30%
LeakageLeakagePowerPower~5~5--10X10X
0.90.9
1.01.0
1.11.1
1.21.2
1.31.3
1.41.4
11 22 33 44 55Normalized Leakage (Normalized Leakage (IsbIsb))
Nor
mal
ized
Fre
quen
cyN
orm
aliz
ed F
requ
ency
Today…
21
Resistor Network
4.5 mm
5.3
mm
Multiplesubsites PD & Counter Resistor
Network
CUT Bias Amplifier
Delay
Die frequency: Min(F1..F21)Die power: Sum(P1..P21)
Technology 150nm CMOSNumber of subsites per die 21
Body bias range 0.5V FBB to 0.5V RBB
Bias resolution 32 mV
1.6 X 0.24 mm, 21 sites per die150nm CMOS
Adaptive Body BiasAdaptive Body Bias----ExperimentExperiment
22
Adaptive Body BiasAdaptive Body Bias----ResultsResults
0%
20%
60%
100%
Acc
epte
d di
e
noBB
100% yield
ABB
Higher Frequency
Num
ber o
f die
s
Frequency
too slow
ftarget
too leaky
ftarget
ABB
FBB RBB
Num
ber o
f die
s
Frequency
too slow
ftarget
too leaky
ftarget
ABB
FBB RBB
97% highest bin
within die ABB
For given Freq and Power densityFor given Freq and Power density•• 100% yield with ABB 100% yield with ABB •• 97% highest freq bin with ABB for 97% highest freq bin with ABB for within die variability within die variability
23
Circuit Design TradeoffsCircuit Design Tradeoffs
00.5
11.5
2
Low-Vt usagelow high
Higher probability of target frequency with:Higher probability of target frequency with:1.1. Larger transistor sizes Larger transistor sizes 2.2. Higher LowHigher Low--VtVt usageusage
But with power penaltyBut with power penalty
00.5
11.5
2
Transistor sizesmall large
powertarget frequency probability
24
Impact of Critical PathsImpact of Critical Paths
1.1
1.2
1.3
1.4
1 9 17 25# of critical paths
Mea
n cl
ock
freq
uenc
y
Clock frequency
Num
ber o
f die
s
0%
20%
40%
60%
0.9 1.1 1.3 1.5
# critical paths
With increasing # of critical pathsWith increasing # of critical paths––Both Both σσ and and µµ become smallerbecome smaller––Lower mean frequencyLower mean frequency
25
Impact of Logic DepthImpact of Logic Depth
0%
20%
40%
-16% -8% 0% 8% 16%
Delay
20%
40% NMOSPMOS
Device IONN
umbe
r of s
ampl
es (%
)
Variation (%)
NMOS Ionσ/µ
PMOS Ionσ/µ
Delayσ/µ
5.6% 3.0% 4.2%
Logic depth: 16
0.0
0.5
1.0
Logic depth
Rat
io o
fde
lay-σ
to Io
n-σ
16 49
26
0
0.5
1
1.5
Logic depthsmalllarge
frequencytarget frequency probability # uArch critical paths
0
0.5
1
1.5
less more
µµArchitectureArchitecture TradeoffsTradeoffs
Higher target frequency with:Higher target frequency with:1.1. Shallow logic depthShallow logic depth2.2. Larger number of critical pathsLarger number of critical paths
But with lower probabilityBut with lower probability
27
VariationVariation--tolerant Designtolerant Design
0
0.5
1
1.5
# uArch critical pathsless more
Balance power &
frequency with
variation tolerance
0
0.5
1
1.5
Logic depthsmalllarge
frequencytarget frequency probability
00.5
11.5
2
Transistor sizesmall large
powertarget frequency probability
00.5
11.5
2
Low-Vt usagelow high
28
Leakage PowerLeakage Power
Freq
uenc
yFr
eque
ncy DeterministicDeterministic
ProbabilisticProbabilistic10X variation 10X variation
~50% total power~50% total power
Probabilistic DesignProbabilistic Design
DelayDelayPath DelayPath Delay Pr
obab
ility
Prob
abili
tyDeterministic design techniques inadequate in the futureDeterministic design techniques inadequate in the futureDeterministic design techniques inadequate in the future
Due to Due to variations in:variations in:VVdddd, V, Vtt, and , and
TempTemp
Delay TargetDelay Target
# of
Pat
hs#
of P
aths DeterministicDeterministic
Delay TargetDelay Target
# of
Pat
hs#
of P
aths ProbabilisticProbabilistic
29
Shift in Design ParadigmShift in Design ParadigmMultiMulti--variable design optimization for:variable design optimization for:–– Yield and bin splits Yield and bin splits –– Parameter variations Parameter variations –– Active and leakage powerActive and leakage power–– Performance Performance
Tomorrow:Tomorrow:Global OptimizationGlobal Optimization
MultiMulti--variatevariate
Today:Today:Local OptimizationLocal Optimization
Single VariableSingle Variable
30
31
TodayToday’’s Freelance Layouts Freelance Layout
Vss
Vdd
OpIp
Vss
Vdd
Op
No layout restrictionsNo layout restrictionsNo layout restrictions
32
Transistor Orientation RestrictionsTransistor Orientation Restrictions
Vss
Vdd
OpIp
Vss
Vdd
Op
Transistor orientation restricted to improve manufacturing control
Transistor orientation restricted to improve Transistor orientation restricted to improve manufacturing controlmanufacturing control
33
Op
Vss
Vdd
Ip
Vss
Vdd
Op
Transistor Width QuantizationTransistor Width Quantization
34
TodayToday’’s Unrestricted s Unrestricted RoutingRouting
35
Future Metal RestrictionsFuture Metal Restrictions
36
TodayToday’’s Metric: s Metric: Maximizing Transistor DensityMaximizing Transistor Density
Dense layout causes hot-spotsDense layout causes hotDense layout causes hot--spotsspots
37
TomorrowTomorrow’’s Metric: s Metric: Optimizing Transistor & Power DensityOptimizing Transistor & Power Density
Balanced DesignBalanced DesignBalanced Design
38
Implications to DesignImplications to Design
Design fabric will be Design fabric will be RegularRegularWill look like Will look like SeaSea--ofof--transistorstransistorsinterconnected with regular interconnected with regular interconnect fabricinterconnect fabricShift in the design efficiency metricShift in the design efficiency metric–– From From Transistor DensityTransistor Density to to Balanced Balanced
DesignDesign
39
40
Technology OutlookTechnology Outlook
Medium High Very HighMedium High Very HighVariabilityVariability
Energy scaling will slow downEnergy scaling will slow down>0.5>0.5>0.5>0.5>0.35>0.35Energy/Logic Op Energy/Logic Op scalingscaling
0.5 to 1 layer per generation0.5 to 1 layer per generation88--9977--8866--77Metal LayersMetal Layers
1111111111111111RC DelayRC Delay
Reduce slowly towards 2Reduce slowly towards 2--2.52.5<3<3~3~3ILD (K)ILD (K)
Low Probability High ProbabilitLow Probability High ProbabilityyAlternate, 3G etcAlternate, 3G etc
128
1111
20162016
High Probability Low ProbabilitHigh Probability Low ProbabilityyBulk Planar CMOSBulk Planar CMOS
Delay scaling will slow downDelay scaling will slow down>0.7>0.7~0.7~0.70.70.7Delay = CV/I Delay = CV/I scalingscaling
256643216842Integration Integration Capacity (BT)Capacity (BT)
88161622223232454565659090Technology Node Technology Node (nm)(nm)
20182018201420142012201220102010200820082006200620042004High Volume High Volume ManufacturingManufacturing
41
ReliabilityReliability
Soft Error FIT/Chip (Logic & Mem)
0
50
100
150
180
130 90 65 45 32 22 16
Rel
ativ
e
~8% degradation/bit/generation
Time dependent device degradation
0
1
1 2 3 4 5 6 7 8 9 10Time
Ion
Rel
ativ
e
Burn-in may phase out…?
1
10
100
1000
10000
180 90 45 22
Jox
(Rel
ativ
e)
Hi-K?
Extreme device variations
0
50
100
100 120 140 160 180 200Vt(mV)
Rel
ativ
e
Wider
42
Implications to ReliabilityImplications to Reliability
Extreme variations (Static & Dynamic) Extreme variations (Static & Dynamic) will result in unreliable componentswill result in unreliable componentsImpossible to design reliable system Impossible to design reliable system as we know todayas we know today––Transient errors (Soft Errors) Transient errors (Soft Errors) ––Gradual errors (Variations)Gradual errors (Variations)––Time dependent (Degradation)Time dependent (Degradation)
Reliable systems with unreliable components —Resilient µArchitectures
Reliable systems with unreliable components Reliable systems with unreliable components ——Resilient Resilient µµArchitecturesArchitectures
43
Implications to TestImplications to Test
OneOne--timetime--factory testing will be outfactory testing will be outBurnBurn--in to catch chip infantin to catch chip infant--mortality mortality will not be practicalwill not be practicalTest HW will be part of the designTest HW will be part of the designDynamically selfDynamically self--test, detect errors, test, detect errors, reconfigure, & adaptreconfigure, & adapt
44
In a NutIn a Nut--shellshell……
100 Billion
Transistors
100 BT integration capacity
20 BT unusable (variations)
10 BT will fail over time
Yet, deliver high performance in the power & cost envelope
Yet, deliver high performance in the power & Yet, deliver high performance in the power & cost envelopecost envelope
Intermittent failures
45
Summary (of Challenges)Summary (of Challenges)Near term:Near term:–– Optimum frequency & Optimum frequency & µµArchitectureArchitecture–– Lots of memory & MultiLots of memory & Multi——everywhereeverywhere–– Valued performance with higher integrationValued performance with higher integration
Paradigm shift:Paradigm shift:–– From deterministic to probabilistic design, with From deterministic to probabilistic design, with
multimulti--variatevariate optimizationoptimization–– Evolution of regular design fabricEvolution of regular design fabric
Long term:Long term:–– Reliable systems with unreliable componentsReliable systems with unreliable components–– Dynamic selfDynamic self--test, detect, reconfigure, & adapttest, detect, reconfigure, & adapt