1
EE290 A: Advanced Topics in CAD
Component Based Design of Electronic Systems
Lecture 10
Professor Kurt Keutzer
Department of Electrical Engineering and Computer Sciences
University of California at Berkeley
Spring 1999
2
Why components?
Why are components/IP blocks an attractive way to design
electronic systems today?
What are the alternative design methodologies?
3
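NTRS: Raw Silicon Capability
[Figure: transistor counts (up to 512M, doubling per tick) by year and process generation. Curves: total microprocessor tr., microprocessor logic tr./cm2, ASIC logic tr./cm2. Marked points: 24M tr/6M gates and 64M tr/16M gates.]
Process generations: 1997 250nm, 1999 180nm, 2001 150nm, 2003 130nm, 2006 100nm, 2009 70nm, 2012 50nm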
4
Design Productivity
[Figure: logic transistors per chip (K) and productivity (trans./staff-month), both on log scales, plotted by year from 1981 to 2009 across the 2.5µ, .35µ and .10µ process generations.]
Complexity growth rate (logic tr./chip): 58%/yr. compound
Productivity growth rate (tr./staff-month): 21%/yr. compound
Source: SEMATECH
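The two compound growth rates on this slide can be turned into a rough measure of the widening gap between what can be fabricated and what can be designed. The sketch below is only an illustration of that arithmetic: the function names are hypothetical, and the only inputs taken from the slide are the 58% and 21% annual rates.

```python
import math

COMPLEXITY_GROWTH = 0.58    # logic transistors per chip, per year (from the slide)
PRODUCTIVITY_GROWTH = 0.21  # transistors per staff-month, per year (from the slide)


def doubling_time(rate):
    """Years needed to double at a compound annual growth rate."""
    return math.log(2) / math.log(1 + rate)


def gap_factor(years):
    """Factor by which complexity outgrows productivity after `years` years."""
    return ((1 + COMPLEXITY_GROWTH) / (1 + PRODUCTIVITY_GROWTH)) ** years


if __name__ == "__main__":
    print(f"complexity doubles every {doubling_time(COMPLEXITY_GROWTH):.2f} years")
    print(f"productivity doubles every {doubling_time(PRODUCTIVITY_GROWTH):.2f} years")
    for years in (1, 5, 10):
        print(f"after {years:2d} years the gap has grown by {gap_factor(years):.1f}x")
```

Under these rates the staff needed for a leading-edge chip grows by roughly 30% a year if nothing else changes, which is the pressure behind the move to reuse.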
5
NTRS: Prediction - Increasing design reuse
[Figure: percentage of design reuse (10% to 90%) by year and process generation: 1997 250nm, 1999 180nm, 2001 150nm, 2003 130nm, 2006 100nm, 2009 70nm, 2012 50nm]
6
Alternative: El Greco design flow
Alternative is synthesis from a very high-level description - e.g. El Greco/Synopsys
[Figure: design flow, entered in C/C++. Stages shown: Algorithm/Protocol Design/Capture/Verify; Refinement (e.g. Float to Fixed); Manual Partitioning; Hardware Implementation; Software Implementation; System-level HW-SW Coverification Models; System Integration and Verification (System Test - Hardware & Software)]
Annotations:
● Optimized code generation for SCENIC (+VHDL & Verilog)
● ASIC QoR underpinned by Synopsys synthesis
● Core vendor's own tools
● System designer controls HW-SW codesign - El Greco shortens the loop
7
Design Productivity by Approach
Approach           GATES/WEEK (Dataquest)
DOMAIN SPECIFIC    8K - 12K
BEHAVIORAL         2K - 10K
RTL                1K - 2K
GATE               100 - 200
TRANSISTOR         10 - 20
8
To Design, Implement, Verify ...
Staff months by level of abstraction and design size:
Level              10M tr/2.5M gates   24M tr/6M gates   64M tr/16M gates
Domain specific    62.5                150               400
Beh                125                 300               800
RTL                625                 1500              4000
gate               6250                15,000            40,000
tr                 62,500              150,000           400,000
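The staff-month figures above follow from dividing design size by weekly productivity. A minimal sketch of that arithmetic is given here, assuming 4 working weeks per staff-month and one representative rate per level picked from within the Dataquest gates/week ranges; both assumptions were chosen because they reproduce the numbers on the slide, and are not stated in the lecture.

```python
WEEKS_PER_STAFF_MONTH = 4  # assumption: not stated on the slide

# Representative gates/week per level; assumed values inside the Dataquest ranges.
GATES_PER_WEEK = {
    "domain specific": 10_000,
    "behavioral": 5_000,
    "RTL": 1_000,
    "gate": 100,
    "transistor": 10,
}


def staff_months(design_gates, gates_per_week):
    """Staff months needed to design, implement and verify `design_gates` gates."""
    return design_gates / (gates_per_week * WEEKS_PER_STAFF_MONTH)


if __name__ == "__main__":
    for design_gates in (2_500_000, 6_000_000, 16_000_000):
        print(f"{design_gates:,} gates")
        for level, rate in GATES_PER_WEEK.items():
            print(f"  {level:16s} {staff_months(design_gates, rate):>12,.1f} staff months")
```

Changing `WEEKS_PER_STAFF_MONTH` or the representative rates shifts every entry proportionally, so the orders-of-magnitude spread between levels is the robust part of the estimate.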
9
Even for 10M transistors ...
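[Figure: staff months for a 10M transistor design (62.5, 125, 625, 6250 and 62,500, from the highest to the lowest level of abstraction), set against implementation quality in power, delay and area. The Beh and RTL levels are marked.]
Implementations here are often not good enough
Because implementations here are inferior/unpredictable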
10
Productivity vs. QOR
[Figure: design time (increasing) vs. QOR, e.g. clock speed (getting better)]
11
Productivity vs. QOR
[Figure: design time (increasing) vs. QOR, e.g. clock speed (getting better). Marked design points: PCI interface 66 MHz, embedded µP 200 MHz, high-end µP 800 MHz]
12
After only one process generation
[Figure: design time (increasing) vs. QOR, e.g. clock speed (getting better), redrawn one process generation later. Marked design points: PCI interface 66 MHz, embedded µP 200 MHz, high-end µP 800 MHz]
13
Moral of the Story
Each process generation will
● Make it twice as easy to realize a fixed (e.g. 66 MHz) timing spec
● Make it twice as easy to realize a fixed (e.g. 100 mm²) area requirement
● Make time-to-market requirements 20% more stringent
Time is always on the side of more productive EDA tools/methodologies, but current high-productivity synthesis methodologies are still not meeting QOR requirements
14
Why components? - summary
Increasing re-use of intellectual property blocks/components is the consensus because:
● Moore’s law continues to provide significantly greater silicon capability
● Significant productivity improvements are required
● Alternative methodologies are not coming forward - and the implementation quality does not meet requirements
15
What type of components?
● What size of component?
● What type/capability of component?
16
Delay degradation in DSM
• With scaling processes, interconnect delay becomes a decreasing portion of stage delay, even with noise
[Figure: delay (ps, 0 to 130) vs. process generation (µm: 0.25, 0.18, 0.13, 0.1) for a 2-input NAND, FO = 2, W/L = 20, in a 50K gate block. Curves: gate delay, stage delay (w/o noise), stage delay (w/ noise)]
17
Dynamic Power Analysis, 50K blocks
Parameters:
• Packing density from NTRS
• Switching activity = 0.15
• Device sizing set at W/L = 20
• Routing density set at 0.4 (M1/2) and 0.2 (M3/4)
Observations:
• 50K blocks
• Analysis focuses on dense std. cell logic parts of an ASIC
• Frequency set according to NTRS
• Power density is roughly constant through scaling
[Figure: power density (W/mm2, 0.00 to 0.30) vs. generation (µm: 0.25, 0.18, 0.13, 0.1). Curves: total power density, interconnect portion]
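The slide does not give the power model behind the plot. As a hedged illustration, the sketch below uses the standard first-order CMOS dynamic power relation (activity × switched capacitance × Vdd² × frequency) with the slide's switching activity of 0.15. The capacitance, supply voltage, frequency and area in the example are placeholder values chosen for readability, not NTRS data, and the function names are hypothetical.

```python
SWITCHING_ACTIVITY = 0.15  # from the slide's parameter list


def dynamic_power(capacitance_f, vdd_v, freq_hz, activity=SWITCHING_ACTIVITY):
    """First-order CMOS dynamic power in watts: activity * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


def power_density(power_w, area_mm2):
    """Power per unit area in W/mm^2."""
    return power_w / area_mm2


if __name__ == "__main__":
    # Placeholder block: 2 nF switched capacitance, 1.8 V, 500 MHz, 4 mm^2.
    p = dynamic_power(capacitance_f=2e-9, vdd_v=1.8, freq_hz=500e6)
    print(f"dynamic power: {p:.3f} W")
    print(f"power density: {power_density(p, area_mm2=4.0):.3f} W/mm^2")
```

In this model, power density stays flat across generations only if the drop in Vdd² and in capacitance per unit area offsets the rise in frequency; the slide reports that this roughly holds for 50K blocks.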
18
Dynamic Power Analysis, 100K & 200K Blocks
Parameters:
• Packing density from NTRS
• Switching activity = 0.15
• Device sizing set at W/L = 20
• Routing density set at 0.4 (M1/2), varied from 0.2 to 0.3 (M3/4)
Observations:
• Various block sizes
• Normalized to 50K gate block performance
• Numbers are relatively independent of technology generation
[Figure: power * delay (normalized, 1.0 to 1.6) for four cases: 200K blocks, equivalent performance; 200K blocks, optimal sizing; 100K blocks, equivalent performance; 100K blocks, optimal sizing]
19
Within a 50K - 100K Module
75 - 100% delay in gates: 3.5µ - 1.0µ (1980 - 1990) and .18µ - 0.1µ (1998 - 2005)
This flow should work OK for blocks of 50-100K gates and should continue to work OK in the future
Flow: HDL -> RTL synthesis (with library) -> netlist -> logic optimization -> netlist -> physical design -> layout
Proper sizing within flow is absolutely required
• typically: size down then up
• try: size up, then down
20
Is 100K too small? Look at DEC Alpha
21
Details (Alpha 21264)
Unit                   #    Aspect Ratio   # Transistors
Instruction cache      1    0.73           2.9M
ITB                    1    0.56           284k
PC                     1    0.91           488k
Branch Predictor       1    0.53           337k
Data cache             1    0.82           2.8M
DTB                    2    0.74           419k
MBox                   1    0.61           586k
LD/ST Reorder Unit     1    0.78           612k
L2 Cache/System IO     1    0.79           596k
Integer Exec           2    0.75           290k
                       2    0.54           404k
Integer Queue          1    0.5            617k
Integer Reg File       2    0.91           217k
Integer Mapper         1    0.71           432k
FP div/sqrt            1    0.57           252k
FP add                 1    0.97           429k
FP Queue               1    0.81           515k
FP Reg File            1    0.67           296k
FP Mapper              1    0.81           515k
FP mul                 1    0.61           725k
uP                     24   0.81           15.2M
A. Tabbara - UC Berkeley
22
Details of Block Size
A. Tabbara - UC Berkeley
200K tr/50K gates
400K tr/100K gates
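To relate the Alpha 21264 unit sizes to the 50K - 100K gate range, transistor counts can be converted to gate equivalents. The sketch below assumes 4 transistors per gate, the ratio implied by "24M tr/6M gates" earlier in the lecture. It covers only a few logic-dominated rows of the table; the helper names are hypothetical, and the conversion is not meaningful for memory arrays such as the caches.

```python
TRANSISTORS_PER_GATE = 4  # implied by "24M tr/6M gates" earlier in the lecture

# A few rows from the Alpha 21264 table: (unit, transistors per instance).
UNITS = [
    ("Integer Reg File", 217_000),
    ("ITB", 284_000),
    ("Branch Predictor", 337_000),
    ("Integer Queue", 617_000),
    ("FP mul", 725_000),
]


def gate_equivalents(transistors):
    """Rough gate count for a logic block of the given transistor count."""
    return transistors / TRANSISTORS_PER_GATE


def classify(transistors, low=50_000, high=100_000):
    """Place a block below, within, or above the low..high gate range."""
    gates = gate_equivalents(transistors)
    if gates < low:
        return "below"
    if gates <= high:
        return "within"
    return "above"


if __name__ == "__main__":
    for name, transistors in UNITS:
        print(f"{name:18s} {gate_equivalents(transistors):>9,.0f} gates  "
              f"{classify(transistors)} 50K-100K")
```

On these rows the smaller units fall inside the 50K - 100K range while the queues and multiplier come out somewhat above it.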
23
Components: Next Generation SSI?
• Gary Smith, Principal Analyst at Dataquest writes:
• "The amount of confusion within the CAD community has been the big
surprise in the effort to develop System Level Integration (SLI) design
methodology. … How do you design a million gate IC? … Fortunately
Kurt Keutzer gets it. The first thing we needed to know was the
optimum size of a basic SLI library element. The work, at Berkeley, now
tells us it’s 50,000 gates. So we have defined today’s SSI. Now the
challenge is to develop the 200 to 400 basic library elements needed to
really do SLI design."
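As a back-of-the-envelope check on what a 50,000 gate library element means for chip assembly, the sketch below counts how many such blocks fill designs of the sizes mentioned in this lecture. It assumes, purely for illustration, that a design is composed entirely of uniform 50K-gate blocks; `blocks_needed` is a hypothetical helper.

```python
import math

BLOCK_SIZE_GATES = 50_000  # optimum SLI library element size quoted above


def blocks_needed(design_gates, block_gates=BLOCK_SIZE_GATES):
    """Number of uniform blocks needed to cover a design of `design_gates` gates."""
    return math.ceil(design_gates / block_gates)


if __name__ == "__main__":
    for design_gates in (1_000_000, 2_500_000, 6_000_000, 16_000_000):
        print(f"{design_gates:>12,} gates -> {blocks_needed(design_gates):>4d} blocks")
```

A million gate IC is then an assembly of about 20 blocks, and the 16M gate designs on the roadmap need a few hundred.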
24
Design Paradigm: Re-useable IP
Semiconductor Industry
3rd Party Designers
Customer Designs
Reused blocks: µP, DSP, Encryption
Silicon capability enables integration of entire systems on a single die
Achieve required design productivity by assembling re-useable blocks of intellectual property (IP)
25
What type of component? Reuse at what level?
Entire sub-system
● e.g. MPEG
Large module
● DCT, motion estimation
Sub-module
● Filter, multiplier
Primitive component
● gate, full-adder
"We want to reuse at the highest level that we can." - Lance Mills, HP
26
Type/Functionality of Components
Video: MPEG, DVD, HDTV
Audio: MP3, voice recognition
Processors: CPUs, DSPs, Java
Networking: ATM, Ethernet, ISDN, Fibre Channel, SONET
Bus: PCI, USB, IEEE 1394
Memory: SRAM, ROM, CAM
Wireless: CDMA, TDMA
Communication: modems, transceivers
Coding: speech, Viterbi, Reed-Solomon
Display drivers/controllers: TFT
Other: sensors, encryption/decryption, GPS
Power PC core: 3.1mm2 in 0.35µ
ARM Core: 3.8 mm2 in 0.35µ
MPEG2 Decoder: ~65k gates
PCI Bus: ~8k gates
Ethernet MAC: ~7k gates (soft)
RSA Encryption: ~7k gates
Niraj Shah
27
How will components be implemented?
What are the most promising implementation media?
● SW
  ● SW running on a standard processor - MIPS R4000
  ● SW running on a tailored processor - TI TMS320C54
  ● SW for a configurable processor - XTENSA
  ● SW for an application-specific processor - C-Cube MPEG decoder
● HW
  ● hard - actual layout
  ● firm - netlist
  ● soft - synthesizable RTL model in VHDL?/Verilog?
The aim of this course is to make us the authority on these questions!
28
How much do we lose in hard vs soft IP?
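[Figure: implementation quality across two axes. Library: gate array, std cell, std cell (2 sizes), std cell (6 sizes), std cell (limitless), dynamic library. Design style: RTL syn, auto P&R, tiling, manual layout. Two points are marked (*), with the difference between them labeled 5X speed, 1/4X area, 1/10X power]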
29
What do we gain with soft vs. hard IP?
FAB portability - soft, firm IP is migratable to multiple
processes
Modifiability - soft IP can be user modified (a plus?)
Process migratability - a soft, firm IP design can more
easily be migrated to the next process generation
Wider range of QOR of implementation
30
Interconnect Complexities
Interconnect effects play a major role in the increasing costs for large
hard-block design styles
Without new Circuit Fabrics and the flexibility they offer, interconnect
problems will significantly impact performance and cost for emerging IC
technologies
[Figure: occurrence rate (normalized) vs. wire length / die size, marked at ~0.5, showing separate populations of local wires and global wires. Inset: blocks connected by global wires]
Pileggi - CMU
31
Technology Scaling
Block sizes cannot grow as rapidly as chip sizes since block design
becomes increasingly more difficult --- each block is a chip design over
multiple configurations
If the blocks are inflexible, the global wiring problems begin to dominate
all aspects of performance quality and system cost
[Figure: occurrence rate (normalized) vs. wire length / die size, marked at ~0.5, for a larger chip with finer feature sizes]
Pileggi - CMU
32
Softer Fabrics
With soft, flexible Fabrics, the system assembly can more thoroughly
exploit the available technology
The interconnect problem is controlled via: soft boundaries for area re-
shaping; re-synthesis and re-mapping for timing; smart wires; and top-
down specified Fabric synthesis
[Figure: occurrence rate (normalized) vs. wire length / die size, marked at ~0.5. Result: superior timing, power and cost]
Pileggi - CMU
33
What methodology will prevail?
Object-oriented
• Top down
• Correct by construction
• Highly sophisticated user
• Intelligence of the system is more in the tools than the IP
• Interfaces are carefully tuned for application
• High performance oriented
Component-oriented
• Bottom up
• Robust and error tolerant
• Relatively unsophisticated user
• Intelligence is in the IP, tools are relatively simple
• Standard "plug and play" interfaces for a broad range of applications
• High productivity oriented
34
Outline of issues
Why components?
● Raw silicon capability
● Design productivity
What type of components?
● What size of component?
● What type/capability of component?
How will they be implemented?
● Review of implementation alternatives
What methodologies are likely to succeed?
● Object-oriented vs. component-oriented
Who are the players?
● foundries,
● fabless semiconductor, 3rd party IP providers, vertical semiconductor,
● system companies
Which design styles are likely to predominate?
● Time-to-market (productivity)
● Features
● Process portability
● In-field upgradability, programmability
● Quality of results
35
Interrelationship of issues
Semiconductor Processing Capability
System Functionality Desired
Design Constraints
Design Productivity
Business Issues
Implementation Approach