ASICS: THE HEART OF MODERN ROUTERS
Chang-Hong Wu
Distinguished Engineer, Juniper Networks
2 Copyright © 2010 Juniper Networks, Inc.
THE INTERNET EXPLOSION
Exponential growth, no matter how you measure it!
The clearest indication of value delivered to end-users
1988 1993 20081998 2003
# Connected Devices
# Web Sites
Internet Capacity
# Google Searches/Month
Total Digitized Information
33K
1B
160M
162M
100M
40M
9.5M
12EB/yr
4PB/yr
130EB/yr
60PB/yr
2.7B/mo
31B/mo
110EB
420EB
1.7M25M1
3 Copyright © 2010 Juniper Networks, Inc.
Computing
Stored
ProgramMicroprocessor Multi-core
Digital
ComputingPipelining
Storage
Digital
StorageCore
Memory
Flash
Disk DRAM
Networking
HPNPacket
SwitchingTCP/IPDigital
TransmissionCircuit
Switching
DRIVING FORCE BEHIND EXPONENTIAL GROWTH
SC
N
S
N
C
Information
System
SC
N
4 Copyright © 2010 Juniper Networks, Inc.
COMPUTER PERFORMANCE: 1988-2008
„88 „89 „90 „91 „92 „93 „94 „95 „96 „97 „98 „99 „00 „01 „02 „03 „04 „05 „06 „07 „08
22
20
24
26
28
210
212
214
216
218
220
Megahert
z /
MF
lops
222
224
System CAGR: 1.9x /year
Microprocessor CAGR: 1.3x /year
Super Computers
226
228 500,000 X over 20 years
5 Copyright © 2010 Juniper Networks, Inc.
ROUTER PERFORMANCE 1988 – 2008
22
20
24
26
28
210
212
214
216
218
220
„88 „89 „90 „91 „92 „93 „94 „95 „96 „97 „98 „99 „00 „01 „02 „03 „04 „05 „06 „07 „08
Megabits p
er
second
222
224
M40
M160
T640
TX
Pre-ASIC era: 1.6x /year
Post-ASIC era: 2.2x /year
Interface CAGR: 1.7x /year
1000,000 X over 20 years (2x /year)
T1600
6 Copyright © 2010 Juniper Networks, Inc.
SILICONTHE FOUNDATION OF PERFORMANCE
Instructions/packet
FabricEngine(F)
CoreEngine(C)
EdgeEngine(E)
ServicesEngine(S)
General-PurposeMicroprocessor(G)
1–10 10–100 100–1000 1000–100,000 100,000–∞
7 Copyright © 2010 Juniper Networks, Inc.
COMPARISON OF SILICON TECHNOLOGIES
Technology Advantages Disadvantages Use Cases
General Purpose CPUs
Very flexiblePoor performance, density, and power
Flexibility is more important
than performance
Field Programmable Gate Arrays
Smaller up-front development cost;
Field upgrades
Lower performance, density, and power; High per part price
Volume is low; Changes are
expected
Off-the-shelf Network Processors
Flexible. Jump straight into
software design
Can fall short of performance,
power, and functionality targets
Differentiation is not important
ASICsTailor to your specification
High upfront cost; Long development
cycle
High performance;
Low production cost
8 Copyright © 2010 Juniper Networks, Inc.
SYSTEM ARCHITECTURE
Market requirements
Performance, density, feature, cost targets
Software/hardware interactions
Functional partitioning
Silicon process technology evaluation
Cost/performance tradeoffs
Memory choices
Stores configuration, FIB tables, etc.
Temporary working buffers
Chip partitioning
IO and logic ratio, die size, interface simplicity
9 Copyright © 2010 Juniper Networks, Inc.
ASIC PROCESS TECHNOLOGY
Greater density allows more features/functionality for the same price
Moore‟s Law: Transistor density doubles every 18 months
– Holding up remarkably well. But how
much longer?
While density is increasing, performance is starting to level off
The decrease in operating voltage, hence dynamic power, also slowed
Static power is becoming an issue
NRE costs associated with newer processes increasing dramatically
Architectural innovations are needed to continue to provide value to customers
10 Copyright © 2010 Juniper Networks, Inc.
NETWORKING ASICS AND MEMORIES
ASICsInput /
Output
Input /
Output/
Fabric
Packet BuffersQueues /
Link Memory
Control Memories
(FIB, ACL, configs, etc.)
11 Copyright © 2010 Juniper Networks, Inc.
MEMORY TECHNOLOGY CHARACTERISTICS
Technology Capacity Frequency Latency Power Cost
Embedded SRAM
L H L M H
Embedded DRAM
M M L+ L M
Embedded TCAM
L M L+ H H
External SRAM
M L M H H
External RLDRAM
H M M L H
External SDRAM
H+ M H L L-
External TCAM
L L H H H
12 Copyright © 2010 Juniper Networks, Inc.
MEMORY CHOICES WITH NETWORKING ASICS
Packet buffering
• Need high throughput, high density
• Long bursts ok
• SDRAM or RLDRAM (Reduced Latency DRAM)
Queuing/Link memory
• Need high throughput, low latency
• Shorter bursts
• SRAM, RLDRAM, or SDRAM
Control memory
• Need high throughput, low latency
• Even smaller access quantum
• SRAM, TCAM, or RLDRAM
13 Copyright © 2010 Juniper Networks, Inc.
ARCHITECTURE – CHIP PARTITIONING
Fewer chips does not necessarily mean less overall cost
– Chips get very expensive once
they cross a certain die size
– Economics of silicon is all about
fabrication yield
Goals– Balance size of each chip within
packet forwarding engine
– Minimize pin-count on each chip
– Minimize overall component cost
– Flexibility of support different configs with the same chipset
2 chips
1 chip
Total # of transistors
$
Pins, packaging,
power, PCB area
dominate
Chip yield &
PCB layers
dominate
X depends on
technology
14 Copyright © 2010 Juniper Networks, Inc.
EXAMPLES OF SILICON PROCESS IMPROVEMENT, CHIP PARTITIONING, AND MEMORY USAGE
Trio/NISP
65nm
4 Chips
1.2Bn Transistors
604Gbps IO
RLDRAM/
DDR3 SDRAM
2009
I-Chip
90nm
1 Chip
160m Transistors
219Gbps IO
RLDRAM/
DDR2 SDRAM
2006
IP3
180nm (90nm)
10 Chips
446m Trans
412Gbps IO
SRAM/
RDRAM
(RLDRAM)
2002
IP1, 2
250nm
4 Chips
18m Trans
47Gbps IO
SRAM/
SDRAM
1998
15 Copyright © 2010 Juniper Networks, Inc.
M40 M160 T640 T1600
Slot Capacity,
Gbps3.0 10 40 100
System
Capacity 40Gbps 160Gbps 640Gbps 1600Gbps
Max System
Draw1.5 KW 3.15 KW 4.52 KW 8.35 KW
EER
(Gbps/KW) 13 25 71 96
FRS 1998 2000 2002 2007
EXAMPLES: BENEFITS OF ASIC EVOLUTIONS
16 Copyright © 2010 Juniper Networks, Inc.
MICRO ARCHITECTURE
Take each subsystem, divide into
blocks, divide each block into sub-
blocks, design down to the basic
logic elements
Document both functionality and
architecture Rigorous peer reviews of all
documents
Xchip
X_in X_out
X_in_a X_in_b
X_in_b_dpX_in_b_cntl
Control logic Datapath
17 Copyright © 2010 Juniper Networks, Inc.
REGISTER TRANSFER LEVEL CODING
Translate micro architecture for all blocks to “Register Transfer Level” code.
always @ (sel or a or b)
begin
if (sel == 1)
out = a;
else
out = b;
end
OR
assign out = sel ? a : b;
out
A large chip will have hundreds of thousands of lines of RTL code
Must always keep in mind physical placement and timing during the micro architecture phase
– You pay now or you pay more later
sel
b
a
18 Copyright © 2010 Juniper Networks, Inc.
SYNTHESIS & TIMING
Synthesis is the exercise of mapping RTL to GATES in the
technology of choice
INPUT
– RTL code
– Specification of clocks and cycle-time (frequency)
– Input and output constraints for module being synthesized
– Wire-load models as basis to model interconnect effects on gates
– Recent trends: physical synthesis
19 Copyright © 2010 Juniper Networks, Inc.
VERIFICATION
Goal: First-time-right silicon
Avoid expensive ASIC respins
Simulations are far easier to debug than real chips
Recipe: At least as many verification engineers as design
engineers per chip
Performed at multiple levels
Block level
Chip level
Sub-system level
System level
Software/hardware co-simulation
TOOLS
Test-bench tool
SystemVerilog
C/C++, Verilog
Coverage tools
Equivalency checkers
Simulators
Waveform viewers
20 Copyright © 2010 Juniper Networks, Inc.
PHYSICAL DESIGN
Power and clock planning
Perform high-level floor-planning
Place I/O, SRAMs, & Register Arrays
Random logic placements
Perform congestion analysis
Wire up all the logic and IOs
Run timing with physical placement
Many iterations of all of the above
21 Copyright © 2010 Juniper Networks, Inc.
PHYSICAL DESIGN EXAMPLE
1) Memory placement
2) Logic placement & clocks
3) M1 routing
4) M2 routing
5) M3 routing
6) M4 routing
7) M5 routing
8) M6 routing
9) M2/M4/M6 routing
10) M1/M3/M5 routing
22 Copyright © 2010 Juniper Networks, Inc.
ASIC TAPEOUT
Criteria for ASIC Tapeout
All functionality complete
All verification complete
Performance simulations meet goals
Chip is error free from a testability perspective
Chip meets timing under all process, temperature and
voltage conditions
Design and verification database is archived
23 Copyright © 2010 Juniper Networks, Inc.
MANUFACTURING
After the ASIC is taped out
Masks are generated for photolithography
ASICs are then built layer-by-layer on a silicon substrate wafer
Once the ASIC wafer is complete
Each die is tested in wafer test
Only good die are laser cut for packaging
Once cut die are available
They are put in a package
The packaged devices are then tested again
Tested packaged parts are put on system boards
Test with other hardware and software
24 Copyright © 2010 Juniper Networks, Inc.
MANUFACTURING – CONTINUED
Copper layers
Packaging
300mm wafer
300mm wafer fab
25 Copyright © 2010 Juniper Networks, Inc.
SUMMARY
ASIC technology has transformed the network industry
Silicon process technology is evolving at an impressive pace but
architectural innovations are required to keep up with the
demand for increasing performance at lower power
A rigorous architecture, design, and verification process is
required to implement complex networking ASICs
There are a vast amount of architectural and design tradeoffs to
be made so user community should provide feedbacks early and
often