Thermal-Aware 3D IC Physical Design and Architecture Exploration
Jason Cong & Guojie Luo
UCLA Computer Science Department
http://cadlab.cs.ucla.edu/~cong
Supported by DARPA
Outline
Thermal-Aware 3D IC Physical Design Flow (Joint work
with IBM and PennState)
Thermal Models and Assumptions
3D Routing with Thermal Via Planning
3D Placement
3D Floorplanning
3D applications (Joint work with Glenn Reinman)
3D Architecture Exploration
Summary
3D Physical Design Flow (IBM, UCLA, and PSU)
[Figure: 3D physical design flow]
Inputs: Tech. Lib, Ref. Lib, Design
Layer & Design Rules (LEF)
Cell & Via* definitions (LEF)
Netlist (HDL or DEF)
3D OA: Thermal-Driven 3D Floorplanner, Thermal-Driven 3D Placer, 3D Global Router, Thermal-Via Planner
Tier Export, Tier Import
2D OA: Detailed Routing by Cadence Router
3D RC extraction
Timing Interface, EinsTimer
3D DRC & 3D LVS
Layout (GDSII)
Contributors marked in the figure: PSU, UCLA
Thermal Resistive Network [Wilkerson04]
Circuit stack partitioned into tiles
Tiles connected through thermal resistances
Lateral resistances: fixed
Vertical resistances: proportional to 1/#via
Heat sources modeled as current sources
Current value = power
Heat sinks modeled as ground nodes
[Figure: (a) tile stack array; (b) single tile stack with thermal resistances R1 to R5 and heat sources P1 to P5]
Accurate and slow
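The resistive network above can be sketched as a small linear solve. This is a minimal illustration, not the model from [Wilkerson04]: it assumes a single row of tile stacks, a heat sink below tier 0, a vertical resistance of `r_unit / #via`, and one fixed lateral resistance between neighboring stacks on the same tier. The names `tile_temperatures`, `solve_linear`, `r_unit`, and `r_lateral` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def tile_temperatures(power, vias, r_lateral, r_unit):
    """Temperature rise above the heat sink for every tile.

    power[s][t]: heat injected at stack s, tier t (the current source).
    vias[s][t]:  via count on the vertical link below tier t of stack s
                 (the link below tier 0 goes to the heat sink, i.e. ground).
    Assumption: vertical resistance = r_unit / via count.
    """
    n_stacks, n_tiers = len(power), len(power[0])
    n = n_stacks * n_tiers

    def idx(s, t):
        return s * n_tiers + t

    g = [[0.0] * n for _ in range(n)]  # conductance matrix
    p = [0.0] * n                      # injected power per node
    for s in range(n_stacks):
        for t in range(n_tiers):
            i = idx(s, t)
            p[i] = power[s][t]
            g_vert = vias[s][t] / r_unit
            g[i][i] += g_vert
            if t > 0:  # link to the tier below; tier 0 links to ground
                j = idx(s, t - 1)
                g[j][j] += g_vert
                g[i][j] -= g_vert
                g[j][i] -= g_vert
            if s + 1 < n_stacks:  # fixed lateral link to the next stack
                j = idx(s + 1, t)
                g_lat = 1.0 / r_lateral
                g[i][i] += g_lat
                g[j][j] += g_lat
                g[i][j] -= g_lat
                g[j][i] -= g_lat
    temps = solve_linear(g, p)
    return [[temps[idx(s, t)] for t in range(n_tiers)] for s in range(n_stacks)]


# One stack, two tiers: doubling the via count halves every temperature rise.
base = tile_temperatures([[1.0, 2.0]], [[1, 1]], r_lateral=1.0, r_unit=1.0)
more_vias = tile_temperatures([[1.0, 2.0]], [[2, 2]], r_lateral=1.0, r_unit=1.0)
```

A dense solve like this grows quickly with the number of tiles, which is consistent with the slide's "accurate and slow" remark.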
Thermal Resistive Chain Model
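One-dimensional heat flow analysis
Elmore delay-like formula [Chiang01]:
$T_4 = \sum_{i=1}^{4} \left( R_i \sum_{j=i}^{4} P_j \right)$
$T_4 = \sum_{i=1}^{4} \left( P_i \sum_{j=1}^{i} R_j \right)$
[Figure: single tile stack with thermal resistances R1 to R4 and heat sources P1 to P4]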
Fast and rough
Reduce R: thermal via insertion (routing)
Permute P: floorplanning
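As a rough sketch of the chain model, the following computes the top-node temperature both ways: summing each resistance times the power flowing through it, and summing each power times the resistance between its source and the sink. The function names and the sample values are illustrative assumptions, with index 0 taken as the tier nearest the heat sink.

```python
def chain_temperature(resistances, powers):
    """T_n = sum_i R_i * (sum of P_j for j >= i).

    Index 0 is the tier nearest the heat sink, so R_0 carries all the heat.
    """
    total = 0.0
    downstream = sum(powers)  # heat crossing the current resistance
    for r, p in zip(resistances, powers):
        total += r * downstream
        downstream -= p
    return total


def chain_temperature_by_source(resistances, powers):
    """Equivalent form: T_n = sum_i P_i * (sum of R_j for j <= i)."""
    total = 0.0
    path_r = 0.0  # resistance between source i and the heat sink
    for r, p in zip(resistances, powers):
        path_r += r
        total += p * path_r
    return total


# Illustrative values only (not from the slides).
r = [1.0, 1.0, 1.0, 1.0]
p = [1.0, 2.0, 3.0, 4.0]

baseline = chain_temperature(r, p)
# "Reduce R": halving every resistance, as thermal vias would.
reduced_r = chain_temperature([x / 2 for x in r], p)
# "Permute P": moving the hottest blocks next to the heat sink.
permuted_p = chain_temperature(r, sorted(p, reverse=True))
```

With these sample values the baseline is 30, halving the resistances gives 15, and placing the hottest blocks closest to the sink gives 20, which mirrors the two levers named on the slide.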
Multilevel TS-Via Planning and 3D Routing (TMARS)
[Figure: multilevel V-cycle G0, Gi, Gk, Gi, G0; downward pass from level 0 through level i to level k, then upward pass back through level i to level 0]
Downward pass, level 0:
(1) Power Density Calculation
(2) Heat Flow Estimation
(3) Routing Resource Estimation
Downward pass, level i:
(1) Power Density Coarsening
(2) Heat Flow Estimation
(3) Routing Resource Coarsening
Level k:
(1) Initial Routing Tree Generation
(2) TTS Via Planning
(3) TTS Via Number Adjustment
Upward pass, level i:
(1) Routing Refinement
(2) TTS Via Planning
(3) TTS Via Number Adjustment
Thermal Resistive
Network Model
ASPDAC’05
Experimental Results: Temperature Reduction
With thermal via insertion, the temperature can be reduced to the
required temperature (77 °C)
Thermal via insertion can reduce the maximum on-chip
temperature by over 40%
[Figure: bar chart of temperature T (°C), axis range 0 to 350, for benchmarks ami33, ami49, n100, n200, n300; series: input, after routing, with thermal via insertion]
Temperature Maps of ami33 Top Layer
[Figure: two temperature maps. Before Thermal Via Insertion: legend bins from 152 to 158 °C. After Thermal Via Insertion: legend bins from 63 to 77 °C]
3D Placement Problem
Problem Formulation
Minimize
• WL(x,y,z) + viaCost(x,y,z)
Subject to
• Overlap-free condition
WireLength (WL)
Bounding box model
Via Cost (viaCost)
Area consumption
Density congestion
[Figure: possible layout and placement model; cells c1, c2, c3 of one net placed across Tier 0 to Tier 3 inside the placement region, with the WL and viaCost contributions and a via marked]
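The objective above can be sketched for a handful of cells. This is an illustration under stated assumptions: wirelength uses the bounding-box (half-perimeter) model named on the slide, while the via term is simplified to a weighted count of tier crossings per net. The slide lists area consumption and density congestion as the via costs but gives no formula, so `via_weight` and the function names are hypothetical.

```python
def net_wirelength(pins):
    """Bounding-box (half-perimeter) wirelength of one net in the x-y plane."""
    xs = [x for x, _, _ in pins]
    ys = [y for _, y, _ in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def net_via_count(pins):
    """Tier crossings needed to connect the net: its span in z."""
    zs = [z for _, _, z in pins]
    return max(zs) - min(zs)


def placement_cost(nets, positions, via_weight):
    """WL(x, y, z) + viaCost(x, y, z), with viaCost assumed to be
    via_weight times the number of tier crossings."""
    cost = 0.0
    for cells in nets:
        pins = [positions[c] for c in cells]
        cost += net_wirelength(pins) + via_weight * net_via_count(pins)
    return cost


# Three cells of one net on tiers 0, 1, and 3 (illustrative coordinates).
positions = {"c1": (0, 0, 0), "c2": (3, 4, 1), "c3": (1, 2, 3)}
nets = [["c1", "c2", "c3"]]
total = placement_cost(nets, positions, via_weight=2.0)  # 7 + 2 * 3 = 13
```

Raising `via_weight` pushes a placer to keep each net on fewer tiers at the expense of longer in-plane wires, which is the trade-off the objective expresses.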
3D Placement Algorithms
2D to 3D transformation by folding/stacking
3D placement by nonlinear optimization
3D Placement [ASPDAC’07]
1. 2D placement on area K*A
For a 3D chip with K device
layers, each with area A
2. Shrink: $(x_i, y_i) \to (x_i/\sqrt{K},\ y_i/\sqrt{K})$
3. Tetris-style 3D legalization
Cost R = αd + βv + γt
Minimize displacement (d), #via (v),
and thermal cost (t)
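The shrink step and the legalization cost can be sketched as follows. Scaling both coordinates by 1/sqrt(K) maps a placement of area K*A onto a footprint of area A. The function names and the sample weights are assumptions for illustration; the slide does not give values for α, β, or γ.

```python
import math


def shrink_placement(cells, num_layers):
    """Scale a 2D placement of area K*A down to a footprint of area A.

    cells maps a cell name to its (x, y) location in the 2D placement.
    """
    scale = 1.0 / math.sqrt(num_layers)
    return {name: (x * scale, y * scale) for name, (x, y) in cells.items()}


def legalization_cost(displacement, vias, thermal, alpha, beta, gamma):
    """Cost R = alpha * d + beta * v + gamma * t for one candidate slot."""
    return alpha * displacement + beta * vias + gamma * thermal


# Illustrative values: K = 4 layers halves each coordinate.
shrunk = shrink_placement({"a": (8.0, 4.0), "b": (2.0, 6.0)}, num_layers=4)
cost = legalization_cost(1.0, 1.0, 1.0, alpha=1.0, beta=2.0, gamma=3.0)
```

After shrinking, cells overlap heavily; the Tetris-style legalizer would then pick, for each cell, the layer and slot with the lowest cost R.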
2D to 3D Transformation by Local Stacking
– leveraging the best 2D placers (e.g. mPL6)
2D to 3D Transformation by Folding
Layer assignment and location mapping
according to the folded order
Folding-2
Folding-4
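One way to picture folding is as an accordion fold along the x axis. The sketch below is an assumption about the mapping, not the exact scheme of the slides: the 2D placement is cut into K vertical strips, strip k becomes layer k, and odd strips are mirrored so that cells adjacent across a fold line end up stacked above each other. The name `fold_placement` is hypothetical.

```python
def fold_placement(cells, width, num_layers):
    """Fold a 2D placement of the given width into num_layers layers.

    cells maps a cell name to (x, y); the result maps it to (x, y, layer).
    """
    strip = width / num_layers
    folded = {}
    for name, (x, y) in cells.items():
        # Cells on the right boundary belong to the last strip.
        layer = min(int(x // strip), num_layers - 1)
        local_x = x - layer * strip
        if layer % 2 == 1:
            local_x = strip - local_x  # mirror odd strips at the fold line
        folded[name] = (local_x, y, layer)
    return folded


# Cells a and b sit on either side of the fold line at x = 5.
folded = fold_placement({"a": (4, 1), "b": (6, 1), "c": (10, 0)},
                        width=10, num_layers=2)
```

Here `a` and `b`, two units apart in the 2D placement, both land at x = 4 on adjacent layers, so the net between them needs one via instead of a lateral wire.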
Window-based Stacking / Folding
1. Divide the 2D placement into NxN windows
2. Apply stacking or folding within each window
The effect of stacking or folding is
spread out, and trade-offs are
achieved by varying N
3D Placement Results (1/2)
Wirelength (stacking)
compared to 2D mPL5
Wirelength vs. # TS via
trade-offs
circuit   2D mPL5    T3Place
ibm01     5.19E+06   2.51E+06
ibm02     1.44E+07   6.95E+06
ibm03     1.37E+07   6.67E+06
ibm04     1.67E+07   8.21E+06
ibm05     4.23E+07   1.94E+07
ibm06     2.20E+07   1.09E+07
ibm07     3.73E+07   1.90E+07
ibm08     3.94E+07   1.98E+07
ibm09     3.46E+07   1.78E+07
ibm10     6.82E+07   3.61E+07
ibm11     5.02E+07   2.51E+07
ibm12     7.58E+07   3.78E+07
ibm13     6.58E+07   3.30E+07
ibm14     1.42E+08   7.40E+07
ibm15     1.65E+08   8.42E+07
ibm16     2.04E+08   1.06E+08
ibm17     3.05E+08   1.60E+08
ibm18     2.43E+08   1.28E+08
avg.      1          0.5
[Figure: wirelength vs. number of TS vias; x-axis: wirelength (2.00E+07 to 4.50E+07); y-axis: number of TS vias (0 to 8.00E+04); series: folding + sequential, stacking + sequential, folding + symmetric, stacking + symmetric]
UCLA VLSICAD LAB
3D Placement Results (2/2)
          LST, r = 10%   LST, r = 10%, w/ temp optimization
circuit   Temp. (ºC)     WL         via #    Temp. (ºC)
ibm01     276.5          2.81E+06   19020    159.8
ibm03     196.7          7.13E+06   31780    121.6
ibm04     159.6          9.11E+06   40219    96.0
ibm06     160.4          1.23E+07   50576    103.5
ibm07     107.5          2.01E+07   69111    66.4
ibm08     97.7           2.05E+07   75397    63.2
ibm09     96.1           1.94E+07   78102    60.6
ibm13     249.3          3.47E+07   127520   156.2
ibm15     136.5          8.58E+07   260681   90.1
ibm18     89.4           1.31E+08   332012   58.7
Avg.      1.0            1.08       1.06     0.63
Effect of temperature optimization
3D Placement by Nonlinear Optimization
Problem Formulation
Minimize
• WL(x,y,z) + viaCost(x,y,z)
Subject to
• Overlap-free condition
[Figure: relaxed placement model and placement model, each showing cells c1, c2, c3 with their WL and viaCost contributions inside the placement region]
Preliminary Results
Trade-off curve compared with [ASPDAC’07] on ibm01
Achieves up to 50% #TSV reduction
or 12% wirelength reduction
3D Floorplanning with Folded Blocks [ICCD’07]
Exploring vertical integration in microprocessor design
requires considering both
physical design and architecture.
True 3D packing
Architectural Alternative Selection
• The number of layers in folded blocks
• The partitioning method: block folding or port partitioning
3D Architectural Blocks – Issue Queue
Block folding
Fold the entries and place them
on different layers
Effectively shortens the tag lines
Port partitioning
Place tag lines and ports on
multiple layers, thus reducing
both the height and width of the
issue queue (ISQ).
The reduction in tag and
matchline wires can help reduce
both power and delay.
Benefits from block folding
Maximum delay reduction of
50%, maximum area
reduction of 90% and a
maximum reduction in
power consumption of 40%
(a) 2D issue queue with 4 taglines;
(b) block folding; (c) port partitioning
3D Architectural Blocks – Caches
[Figure: single-layer design, wordline folding, port partitioning]
3D-CACTI: a tool to model 3D caches for area, delay, and power
We add a port partitioning method
The area impact of vias
Improvements
Port folding performs better than wordline folding for area (72% vs. 51%).
Wordline folding is more effective in reducing the block delay (13% vs. 5%).
Port folding also performs better in reducing power (13% vs. 5%).
3D Architecture Evaluation with Physical Planning -- MEVA-3D [DAC’03 & ASPDAC’06]
Optimize
BIPS (not IPC or Freq)
• Consider interconnect
pipelining based on early
floorplanning for critical paths
• Use IPC sensitivity model [Jagannathan05]
Area/wirelength
Temperature
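BIPS (billions of instructions per second) is the product of IPC and clock frequency in GHz, which is why neither factor alone is a good target: a higher frequency can cost IPC through extra interconnect pipelining. The sketch below uses made-up configurations purely for illustration; the numbers are not from the slides, and the function and variable names are hypothetical.

```python
def bips(ipc, freq_ghz):
    """Billions of instructions per second = IPC * clock frequency (GHz)."""
    return ipc * freq_ghz


# Hypothetical configurations (ipc, freq_ghz); values are illustrative only.
configs = {
    "wide_slow": (2.0, 3.0),    # highest IPC
    "balanced": (1.5, 5.0),
    "narrow_fast": (1.0, 6.0),  # highest frequency
}

scores = {name: bips(ipc, f) for name, (ipc, f) in configs.items()}
best = max(scores, key=scores.get)
```

In this made-up example the configuration with the best BIPS is neither the one with the highest IPC nor the one with the highest frequency.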
[Figure: MEVA-3D flow with an ESTIMATION stage and a VALIDATION stage.
Inputs: microarchitecture configuration; target frequency; critical architectural paths and sensitivity; power density estimates.
Steps: 2D/3D floorplanning for performance and thermal with interconnect pipelining; performance simulation with interconnect latencies; 2D/3D thermal simulation.
Data passed between steps: estimated performance, temperature, and interconnect data; power density with interconnect consideration.
Outputs: performance, power, and temperature]
Design Example
An out-of-order superscalar processor micro-architecture
with 4 banks of L2 cache in 70nm technology
Critical paths
Performance Impact of 3D Integration
[Figure: performance at target frequencies 3G, 4G, 5G, 6G for 1 layer, 2 layers, 3 layers, and 4 layers; y-axis range 0 to 4]
Over 35% performance improvement
5GHz 3 Device Layer Layout
Temperature Impact of 3D Integration
[Figure: temperature maps of the 2D design and the 3D design (1st layer, 2nd layer), w/o thermal via and w/ thermal via]
Summary
Complete Set of Thermal-Aware 3D IC Physical Design
Tools
3D Routing with Thermal Via Planning
3D Placement
3D Floorplanning
Ongoing collaboration with IBM and PennState to include 3D
parasitic extraction, timing analysis, etc.
3D Architecture Exploration & Design Drivers
Coupled with 3D physical planning
• Consider both 3D component stacking and folding
3D Multicore Processor with RF-Interconnects
Three Silicon Layers
Tier 3: Cache Data
Components
Tier 2: Interconnect and
Cache Tags
Tier 1: Cores
Non-Uniform Cache Access
Cores see different latencies to different cache banks
Data can migrate among distributed caches
• Can hide latency
• Adds interconnect traffic
Heat sink