ASPDAC, Shanghai, 2005
Design at the End of the Silicon Roadmap
Jan M. Rabaey
Director, Gigascale Silicon Research Center
Co-Director Berkeley Wireless Research Center
University of California at Berkeley
ASPDAC, Jan. 2005
In Memory of Prof. Donald O. Pederson
Donald O. Pederson, professor emeritus of electrical engineering and computer sciences at the University of California, Berkeley, whose vision laid the groundwork for advances in the design of the complex integrated circuits that drive modern electronic devices, passed away on Saturday December 25 at the age of 79.
Don is perhaps best known in the field of EDA for spearheading the development of a groundbreaking integrated circuit computer simulation program called SPICE more than three decades ago. Even more so, he has been a source of inspiration and a mentor for all of us.
From a history book in the distant future…
[Timeline: Pre-silicon | Silicon (1950s to 2020s) | Post-silicon]
The silicon age: A period of unprecedented growth and prosperity, triggered by a ruling of a senior wizard named Gordon Moore stating that silicon devices would multiply exponentially …
The Silicon Age ― A Closer Look
[Figure: complexity over time, 1950 to 2000. Labels: Discrete, Custom, ASIC, IP/SoC, 1 Billion Transistors; Unstructured, Some structure, Structured, ???]
The Challenges of the Next Decade(s)
• The Bifurcation of the Market
• The Design Introduction Challenge
• The Physics and Manufacturing Challenges
The Bifurcation of the Market
The Core:
• Performance is premium
• Power and cost constrained
• Relatively long lifetime
The Expanding Periphery:
• Cost and size are premium
• Integration and power are key
• “Just enough performance”
• Short to very short lifetime
Diverging Roadmaps?
[Figure: diverging technology roadmaps. Labels: High-Performance, Mixed-signal, RF, Low-Power, Low-standby, MEMS, Packaging]
The Design Introduction Challenge
[Figure: Design Cost: Complexity. Engineering man-years (15 to 105) versus gate count in millions of gates (2.5 to 20), by technology node: 130 nm, 90 nm and DSM]
Source: Xilinx and Synopsys, Inc
[Figure: Mask Cost. Cost in $1000 (0 to 2500) versus year (1996 to 2008), for the 0.25 µm, 0.18 µm, 0.13 µm, 90nm, 65nm and 45nm nodes]
The Physics and Manufacturing Challenges
• Power and energy as a limiting factor to integration
• Nano-scaling leads to uncertainty and reduces reliability
• Mixed-signal design under severe stress
• True embedded systems integration requires mixed-everything
Power and Energy Limiting Integration
[Figure: projected density scaling, 2002 to 2020 (log scale, 1 to 1000). Source: 2003 ITRS, low operating power scenario]
• Compute density: k^3
• Active power density: k^1.7
• Leakage power density: k^3.4
• WRONG: stop voltage scaling
• PLAUSIBLE: slow down compute density increase
Power and Energy Limiting Integration
Slow down compute density increase
• Use slack to control leakage and power
• Minimize connections to Vdd and GND!
[Figure: projected density scaling, 2002 to 2020 (log scale, 1 to 100)]
• Compute density: k^2
• Active power density: < k^0.7
• Leakage power density: < k^1.4
But … signal integrity and reliability issues
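To get a feel for what the exponents on the two scaling charts imply, the sketch below evaluates each density as a power of a scaling factor k. It assumes the chart labels are powers of k (k^3, k^1.7, and so on) and treats k as a generic technology scaling factor greater than 1, which the slides do not define further. The dictionary and function names are illustrative only, and the second set of exponents are upper bounds on the slide.

```python
# Exponents as read from the two charts (an assumption about the garbled labels).
ITRS_2003 = {"compute": 3.0, "active_power": 1.7, "leakage_power": 3.4}
# Slowed compute-density scenario; the power exponents are upper bounds.
SLOWED = {"compute": 2.0, "active_power": 0.7, "leakage_power": 1.4}


def densities(k, exponents):
    """Return each density relative to its starting value for scaling factor k."""
    return {name: k ** exponent for name, exponent in exponents.items()}


if __name__ == "__main__":
    for k in (1.0, 2.0, 4.0):
        print("k =", k)
        print("  ITRS 2003:", densities(k, ITRS_2003))
        print("  slowed:   ", densities(k, SLOWED))
```

With k = 2, leakage power density in the first scenario grows faster than compute density (2^3.4 versus 2^3), while in the slowed scenario it stays below it (at most 2^1.4 versus 2^2).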
12
ASPDAC, Jan. 2005ASPDAC, Jan. 2005
Variations Becoming Pronounced
[Figure: lithography wavelength (365nm, 248nm, 193nm, 13nm EUV) versus technology generation (180nm, 130nm, 90nm, 65nm, 45nm, 32nm), 1980 to 2020, with a gap between the two]
[Figure: normalized frequency (0.9 to 1.4) versus normalized leakage (Isb, 1 to 5) at 130nm; annotations: 30%, 5X]
[Figure: die temperature map (X, Y), 40 to 110 °C]
Design becoming “statistical”
• makes verification substantially harder
• challenging synchronization strategies
• “error-free” design untenable
Courtesy: Shekhar Borkar, Intel
The Reliability Challenge
Errors can and will happen
• Voltage scaling, reduced noise margins and single-event upsets make errors unavoidable
• Complexity and DSM make “full design-time verification” impossible
• Using only “physical layer” solutions yields unacceptable overhead
[Figure: Vdd and GND. Courtesy: IBM]
Mixed Signal Getting Squeezed
• Reduced headroom challenges traditional mixed-signal design
• Process variation makes design centering tough
• Does further scaling help?
Courtesy: R. Rutenbar, CMU
A Roadmap for Late-Silicon Age Design
[Figure: complexity over time (2000, 2005, 2010, Beyond, The far beyond). Successive design themes: Fully structured and regular fabrics; Concurrency and Flexibility; Self-Adaptivity; Error-resiliency; Embracing Randomness]
A Roadmap for Late-Silicon Age Design
• Regularity and Structure
• Concurrency, Heterogeneity and Flexibility
• Self-Adaptivity
• Error-Resiliency
• Embracing Randomness
Increasing necessity
Regular implementation fabrics
• Absolutely required for manufacturability
• Driven by photo-lithography and eventually self-assembly constraints
• Also for variability, reliability, and time-to-market
Regular Fabrics – A Plethora of Choices
• FPGA
• VPGA (CMU)
• River PLA (Berkeley)
• Structured ASIC (e.g. LSI RapidChip)
Trade-off between area, performance, power and time-to-market (factors 5 to 10)
Dealing with Mixed-Everything through Packaging
Advanced 2.5D to 3D packaging is the only real answer to the diverging roadmap challenge
Courtesy: Berkeley Picoradio
A Roadmap for Late-Silicon Age Design
• Regularity and Structure
• Concurrency, Heterogeneity and Flexibility
• Self-Adaptivity
• Error-Resiliency
• Embracing Randomness
Immediate future
Concurrency and heterogeneity:
• Driven by power density concerns
• Alternative to higher clock frequencies
Flexibility:
• Higher re-use, shorter time-to-market, in-field adaptation and upgrade
The Age of Concurrency and Flexibility
Heterogeneous concurrency can already be found in application areas ranging from wireless, automotive, consumer, media processing, graphics and gaming.
• Xilinx Virtex-4
• AMD Dual Core Microprocessor
• Berkeley Pleiades (ARM core with heterogeneous reconfigurable fabric)
• NTT Video codec with 4 Tensilica cores
Convergence of Computing Platforms
• General processor cores: generic computational functions, lightweight access to other engines
• Acceleration logic: application-specific logic
• Memory system: data delivery to processors
• Application processors: lightweight compute engines
• Communication network
Courtesy: W.M. Hwu, Illinois and K. Keutzer, UCB
Managing Flexibility and Concurrency
• Easier to create heterogeneous concurrency than to use it!
• Dramatically more SOC programmers than designers
[Figure: block diagram with an XScale core, hash engine, scratch-pad SRAM, 16 microengines, QDR SRAM and RDRAM channels, PCI, CSRs, RFIFO, TFIFO, and SPI4 / CSIX]
Handel-C, Java, VHDL, SystemC, C, … ?
• Capturing the application
• Partitioning
• Run-time management
• Verification
Some Answers
• A new generation of parallelizing compilers, based on proactive memory management
[Figure: GPP and accelerators (ACC) with local memories, main memory, and MTM]
• Platform-based design == Restrict the Choices (e.g. Texas Instruments OMAP)
• Unified models of computation and higher abstraction levels
[Figure: Metropolis methodologies and tools, built on a unified model-of-computation, formal semantics, separation of concerns, and mapping function to architecture]
Courtesy: Wen-Mei Hwu, Illinois, and Alberto Sangiovanni-Vincentelli, UCB
A Roadmap for Late-Silicon Age Design
• Regularity and Structure
• Concurrency, Heterogeneity and Flexibility
• Self-Adaptivity
• Error-Resiliency
• Embracing Randomness
Later this decade
Self-tuning Architectures
• On-chip test used to correct for process variations through Vdd / Vt control
• Static and dynamic
Test Moving On-Line
• On-chip resources used to minimize test cost
• Also available for dynamic re-evaluation and adaptation
• On-chip noise samplers
[Figure: a low-cost tester connects through a bus interface master wrapper (VCI) to the on-chip bus, the CPU and an on-chip memory holding the diagnostic test program and response map; the result is a logic failure map]
Already Adopted in Mixed-Signal
• Injection-locked RF Transmitter for Wireless Sensor Networks (Y.H. Chee, Berkeley)
[Figure: reference oscillator and LC oscillator with data input and frequency calibration]
• “Mostly-digital” high-performance A/D converter (Yun, Berkeley, ISSCC 04)
[Figure: analog side (S/H, pipeline ADC, Σ/∆ ADC clocked at fclk/n) and digital side (digital filter with coefficient update, clock distribution), from Vin to Dout]
Adaptive Biasing Using On-Line Test
Dynamically adjust supply and threshold design parameters to center the design in the presence of process variations!
[Figure: a test module applies test inputs to the module and observes its responses, adjusting Vdd, Vbb and Tclock]
[Figure: energy-performance trade-off. Switching energy (fJ, 5 to 50) versus path delay (ps, 1.0E+03 to 1.0E+07), comparing worst case without Vth tuning and nominal with Vth tuning; adaptive tuning annotated 10x]
Courtesy: K. Cao, Berkeley
Adaptive (Body) Biasing Impact
Courtesy: P. Gelsinger and S. Borkar, Intel (DAC04)
[Figure: 4.5 mm × 5.3 mm die with multiple subsites]
A New Perspective on Power Distribution
Power Domains: Similar to “clock domains”, but extended to active supply and threshold voltage management and full power-down.
[Figure: a power source feeds an active power network that drives multiple loads]
Power distribution backplane including regulation and conversion using 2.5D integration (distribution network, regulators, decoupling caps)
[Figure: power distribution die with high-voltage (5V) distribution and local voltage regulation]
A Roadmap for Late-Silicon Age Design
• Regularity and Structure
• Concurrency, Heterogeneity and Flexibility
• Self-Adaptivity
• Error-Resiliency
• Embracing Randomness
Beyond 2010
Redundancy Galore
The only way to provide true error-resiliency!
With billions of transistors, overhead factors of 2 to 3 are reasonable if leading to 100% yield, supreme performance, or new applications.
Already Common in Memories
• Redundancy to increase yield
[Figure: memory array with row decoder, column decoder, redundant rows and redundant columns]
• Forward error correction; may also lead to reduced operating voltage
[Figure: DRV spatial distribution (256*128 cells)]
[Figure: DRV with ECC (mV, 150 to 240) and area overhead (%, 0 to 180) versus number of errors corrected (1 to 4)]
Courtesy: H. Qin, Berkeley
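The slide names forward error correction without saying which code is used. As a minimal illustration, the sketch below assumes a Hamming(7,4) code, chosen here for brevity and not taken from the slides. It shows how a stored word survives one flipped cell, for example a cell that loses its value at a reduced supply voltage. The function names are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as a 7-bit codeword (parity at positions 1, 2 and 4)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit; return (data_bits, error_position)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s4  # 0 means no error was detected
    if error_position:
        c[error_position - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], error_position


if __name__ == "__main__":
    stored = hamming74_encode([1, 0, 1, 1])
    stored[5] ^= 1  # one memory cell flips
    print(hamming74_decode(stored))
```

Correcting more than one error per word, as on the slide's x-axis, needs a stronger code and more check bits, which is consistent with the rising area overhead shown there.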
32
ASPDAC, Jan. 2005ASPDAC, Jan. 2005
And in Mixed-Signal Circuits
• Pipelined A/D Converter (UCB EECS 247 Course Notes)
• Flash ADC With Redundancy (Courtesy: M. Flynn, Michigan)
Self-Correcting Systems
• Designs that detect and correct errors
• Careful use of redundancy and error correction
• Provide reliable computation layered on un-reliable fabrics (as in the communications world)
Courtesy: N. De Micheli, Stanford and R. Wang, Berkeley
[Figure: a faulty FSM F maps inputs to outputs and control signals; an error detector/compensator checks it against the Spec and produces Soln X]
A Gradual Introduction Process
A “pseudo-synchronous” approach to address process variations and power minimization with minimal overhead by combining circuit and architectural techniques
Example: Shaving voltage margins with “Razor”
[Figure: “razored pipeline”. IF, ID, EX, MEM (read-only) and WB (reg/mem) stages separated by Razor flip-flops, with a stabilizer FF, PC, error, bubble, recover and flushID signals, and flush control]
[Figure: Razor flip-flop. A main flip-flop (D, Q) clocked by clk and a shadow latch clocked by clk_del feed an error comparator that drives Error_L]
[Figure: energy versus supply voltage. Processor energy and recovery energy add up to a total energy with an optimal voltage]
Courtesy: T. Austin, D. Blaauw, Michigan
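The Razor figure labels (a flip-flop on clk, a shadow latch on clk_del, and an error comparator) suggest the following behavior, sketched here as a single-cycle model. This is a simplified reading under stated assumptions, not the authors' circuit: the shadow latch is assumed to always see settled data, times are in arbitrary units, and the pipeline flush and recovery penalty are not modeled. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RazorResult:
    q: int        # value held after any recovery
    error: bool   # True when main flip-flop and shadow latch disagreed


def razor_sample(old_value, new_value, arrival_time, clock_period, shadow_delay):
    """Behavioral model of one Razor flip-flop for a single cycle.

    The main flip-flop samples at clock_period; the shadow latch samples at
    clock_period + shadow_delay. Data that settles between the two edges is
    missed by the main flip-flop but caught by the shadow latch.
    """
    if arrival_time > clock_period + shadow_delay:
        raise ValueError("data later than the shadow window cannot be recovered")
    main = new_value if arrival_time <= clock_period else old_value
    shadow = new_value  # assumption: the delayed clock always sees settled data
    error = main != shadow
    # On an error the shadow value is restored into the main flip-flop.
    return RazorResult(q=shadow if error else main, error=error)


if __name__ == "__main__":
    print(razor_sample(0, 1, arrival_time=0.8, clock_period=1.0, shadow_delay=0.3))
    print(razor_sample(0, 1, arrival_time=1.2, clock_period=1.0, shadow_delay=0.3))
```

Lowering the supply voltage pushes arrival times later, so errors become more frequent; the energy curve on the slide reflects that trade-off between processor energy and recovery energy.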
Architectural Error Correction
On-line Verification of Complex Processors
• Core function validated by checker
• Checker relaxes burden of correctness on core processor
• Core does the heavy lifting, removes hazards that could slow the simple checker
[Figure: self-checking processor. The core (performance: IF, ID, REN, REG, SCHEDULER, EX/MEM) passes speculative instructions in order, with PC, inst, inputs and addr, to the checker (correctness: CHK, CT)]
[Figure: Alpha 21264 (205 mm2) next to the REMORA checker (12 mm2)]
Courtesy: Todd Austin, Univ. of Michigan
Raising the Bar: Algorithmic “Voltage Shaving”
[Figure: input x[n] drives the Main Block (output y_a[n]) and the Estimator (output y_e[n]); a comparison | · | > Th selects the final output ŷ[n]]
[Figure: power versus voltage, both normalized to 1.0. Curves: P_main, P_EC and P_TOT; the gap marks the energy savings]
• Voltage overscale Main Block.
• Correct errors using Estimator.
• Power savings ≥ 3X!
Courtesy: N. Shanbhag, Illinois
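The decision rule on this slide can be sketched in a few lines. The version below assumes the comparison on the figure is between the main block output and the estimator output, and that the estimator value replaces the main value whenever the two differ by more than Th. The function name, the sample values and the threshold are illustrative assumptions.

```python
def correct_with_estimator(y_main, y_est, threshold):
    """Use the estimator output wherever the main block deviates too far from it."""
    return [
        est if abs(main - est) > threshold else main
        for main, est in zip(y_main, y_est)
    ]


if __name__ == "__main__":
    # Main block run at an overscaled voltage: mostly exact, one large error.
    y_main = [10, 76, 11, 13]
    # Low-precision estimator: never far off, never exact.
    y_est = [9, 12, 12, 12]
    print(correct_with_estimator(y_main, y_est, threshold=8))
```

The scheme works when timing errors in the overscaled block are rare but large, so that a coarse estimate is enough to spot and replace them; small estimator inaccuracies below the threshold never reach the output.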
Towards malleable, resilient architectures
The Quest: Scaleable (hard and soft) architectures that provide flexible redundancy to accommodate systematic and random, static and dynamic errors while avoiding brittleness!
A Roadmap for Late-Silicon Age Design
• Regularity and Structure
• Concurrency, Heterogeneity and Flexibility
• Self-Adaptivity
• Error-Resiliency
• Embracing Randomness
The Far Beyond
Maintaining a purely deterministic Boolean abstraction ultimately becomes untenable!
Maintaining our abstractions == Slowly abandon them!!
The Search for (New) Scaleable and Stackable Abstractions
An Interesting Case Study: The “Neural Network” MOC
Properties:
• Works well on noisy signals
• Uses “soft” decisions
• Operates in the presence of failures of components and interconnections
Challenge: Limited scope. Works mostly for classification problems.
[Figure: artificial neuron]
Allow devices to make errors and use models-of-computation that tolerate them (signal processing, communication, coding, information theory)
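A small sketch can make the fault-tolerance property concrete. It assumes a single threshold neuron fed by five redundant, noisy copies of a ±1 signal with equal weights; the encoding, the weights and the function names are illustrative choices, not taken from the slides. A failed interconnection is modeled by zeroing one weight.

```python
def neuron(inputs, weights, bias=0.0):
    """Threshold neuron: returns 1 when the weighted sum exceeds zero, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def with_failed_connection(weights, index):
    """Model a broken interconnection by zeroing one weight."""
    damaged = list(weights)
    damaged[index] = 0.0
    return damaged


if __name__ == "__main__":
    weights = [1.0] * 5
    noisy_one = [1, 1, 1, -1, 1]  # one of five copies is corrupted by noise
    print(neuron(noisy_one, weights))
    for broken in range(5):
        print(broken, neuron(noisy_one, with_failed_connection(weights, broken)))
```

In this example the decision stays the same with one corrupted input and any single failed connection, because the remaining inputs still outvote the fault.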
Transitioning to the Post-Silicon Age
Implementation platforms that work under very low SNR, are non-deterministic, unpredictable and unreliable…
Molecular
Organic
NanoOptics
Nanotube
Some Concluding Remarks
Formidable challenges over the next decades to dramatically alter design paradigms
• Power density to constrain compute density
• Variability and reliability to lead to novel micro-architectures and computational models
• Regularity and redundancy central tenets
• Diverging technology roadmaps force packaging solutions
The opportunity: new emerging application paradigms
• Ubiquitous computing paradigm is inherently error-tolerant and redundant
• Periphery more amenable to “perceptual” computing
Thank you!
“Chaos at least has an open architecture. Chaos has always been the native home of the infinitely possible.” ― John Perry Barlow
The contributions of all the GSRC faculty to this presentation are greatly appreciated, as is the funding by the MARCO member companies and the US Government.