University of California • Berkeley • San Diego • Los Angeles
Puneet Gupta, UCLA
Andrew B. Kahng, UCSD
Costas Spanos, UCB
Kameshwar Poolla, UCB
Elad Alon, UCB
Design Manufacturing Interface (DMI)
11/02/2009
IMPACT • DMI• 2
Year 1&2 Milestones
• Improved compact models for device variations
• PLL circuits and model development, proof-of-concept implementation
  – The Effects of Variability on Phase-Locked Loops
  – Variability in High-Speed Link Receivers
• New interactions of layout and process
  – Design Driven Process Monitoring
  – Electrical Process Window
• Evaluation of statistical optimization
  – Yield-constrained digital circuit optimization
• Incorporating random variability to DFM
  – Statistical Variability and Yield Estimation via Domain Decomposition
  – Modeling the Statistics of Systematic Variability
  – Integrated Circuit Variability Modeling and Statistical Parameter Extraction
• Design-mask interactions
  – Design-Level Assessment of Overlay Impact Across DPL Technology Options
  – Pattern-based Clip Clustering for Efficient Design for Manufacturing Solutions
  – Timing Yield-Aware Color Reassignment and Detailed Placement Perturbation for Double Patterning Lithography
  – Single Mask Double-Patterning Lithography
Studying the Power Cost of Variability
• Many opportunities to handle variability at the circuit or system level
• So, the power cost study must be performed in the context of real designs
  – Only way to identify critical sources/types of variations
• Focus on key, representative blocks
  – Working on phase-locked loops (PLL) and high-speed serial transceivers
Phase-Locked Loop Design
• PLL is a 2nd order (or higher) feedback loop
  – Variations can destabilize the loop
• Classic technique: self-biasing
• Relies on local matching between devices
  – Increasingly unreliable in modern processes
[Figure: PLL block diagram; labels: PFD, ref_clk, up, down, Icp, R, C, Vreg, ÷N, clk]
Digital PLL
• Developed a more digital PLL architecture
  – Use low-overhead calibration to handle variability
  – Does not require any high-resolution (variation sensitive) TDCs
• Taped out 65nm test-chip Oct. 2009
[Figure: digital PLL block diagram; labels: ref_clk, PFD, BB PD, up/dn, LPF, ΣDCON, fint, Vref, Vreg, Cdecap, ÷N, Clk]
High-Speed Transceiver Design
• Links operate with small signals (10's of mV) at >5 Gb/s
  – Mismatch can significantly degrade performance/power
• Developed model of link power vs. comparator offset
  – ~2x power for every 30 mV of offset
• Developing efficient offset cancellation schemes
  – Tapeout Jan. 2010
[Figure: high-speed link block diagram; labels: TX, Channel, Amplifier / Equalizer, Comparator, clk, data_out, di, di_b]
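The "~2x power for every 30 mV of offset" rule of thumb can be read as an exponential scaling. The sketch below assumes exactly that reading; the function name and the 30 mV default are illustrative and are not the project's actual link power model.

```python
def link_power_multiplier(offset_mv: float, mv_per_doubling: float = 30.0) -> float:
    """Relative link power for a given comparator offset.

    Assumption: power doubles for every `mv_per_doubling` mV of offset,
    i.e. multiplier = 2 ** (offset / 30 mV).
    """
    return 2.0 ** (offset_mv / mv_per_doubling)


if __name__ == "__main__":
    for offset in (0, 15, 30, 60):
        print(f"{offset:3d} mV offset -> {link_power_multiplier(offset):.2f}x power")
```

Under this reading, cancelling 60 mV of offset would recover roughly a 4x power penalty, which is why efficient offset cancellation matters.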
Design Driven Process Monitoring
• Goal: design-driven process monitoring/optimization strategies using short-loop (scribeline) test structures
• A fast estimate of the number of failing die within a wafer can be used to guide early wafer pruning → reduced BEOL processing costs
• Can guide speed binning
• Can be used as a guide to optimize/tweak the process for a design
Using Effective Drive Currents
• A few current measurements on the die can predict the timing of the circuit under process variations
Design Dependent Ring Oscillator
• Construct a minimal area ring oscillator that matches the sensitivity of the critical path(s) to process variations
Faculty: Puneet Gupta
Aashish Pant, Tuck Boon Chan
Preliminary Results
Effective Current Based Methodology
[Equation: rise/fall path delay expressed through the coefficients Kn, Kp and the effective currents Ieff_n, Ieff_p, plus the interconnect delay and error terms listed below]
  Kn, Kp: fitted coefficients
  Ieff_n, Ieff_p: measured per die
  E_WD(n, p): error due to within-die variations
  ID_Var, E_M: interconnect delay variation, Ieff modeling error
  ID_Nom: nominal interconnect delay
Synthesized Ring Oscillator
  Minimize: $\sum_{k=1}^{n} A_k x_k$
  Constraint: $\dfrac{x_1 d_1^{var_i} + \dots + x_n d_n^{var_i}}{x_1 dnom_1 + \dots + x_n dnom_n} = \dfrac{d_{cri}^{var_i}}{dnom_{cri}}$
  $A_k$: area of gate type k
  $x_k$: number of instances of gate type k
  $d_k^{param}$: delay sensitivity of gate type k to variation in param
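The ring oscillator synthesis can be illustrated with a small brute-force search. This is a sketch under stated assumptions, not the authors' solver: it assumes the oscillator's normalized sensitivity to parameter i is the count-weighted sum of gate delay sensitivities divided by the count-weighted nominal delay, and that this must match the critical path's normalized sensitivity within a tolerance. The function name, `tol`, and `max_count` are hypothetical, and practical details such as requiring an odd number of inverting stages are ignored.

```python
from itertools import product


def synthesize_ring_oscillator(area, d_nom, d_var, target, tol=0.01, max_count=6):
    """Brute-force search for gate counts x_k that minimize total area.

    area[k]     : area of gate type k
    d_nom[k]    : nominal delay of gate type k
    d_var[k][i] : delay sensitivity of gate type k to parameter i
    target[i]   : normalized sensitivity of the critical path to parameter i
    Returns (total_area, counts) or None if no combination matches.
    """
    n_types, n_params = len(area), len(target)
    best = None
    for x in product(range(max_count + 1), repeat=n_types):
        if sum(x) == 0:
            continue  # an empty oscillator is not a candidate
        nom = sum(xk * dk for xk, dk in zip(x, d_nom))
        sens = [sum(x[k] * d_var[k][i] for k in range(n_types)) / nom
                for i in range(n_params)]
        if all(abs(s - t) <= tol for s, t in zip(sens, target)):
            cost = sum(xk * ak for xk, ak in zip(x, area))
            if best is None or cost < best[0]:
                best = (cost, x)
    return best
```

An exhaustive search is only workable for a handful of gate types; a larger library would call for an integer programming solver.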
Electrical Process Window
Chan, Tuck-Boon; Kagalwalla, Abde Ali
Faculty: Puneet Gupta
Objective:
1. Extract delay and leakage power centric electrical process window (EPW).
2. Compare electrical process window with geometrical process window.
3. Explore methods to enlarge electrical process window.
Motivation: Existing geometrical process window is tight.
[Figure: exposure vs. defocus plot comparing the electrical process window with the geometrical process window]
Main Ideas and Preliminary Results
• Typical method: Process parameters are within process window if printed shapes are within geometrical tolerances.
• Alternative: Process parameters are within process window if post-litho electrical parameters are within specifications.
  – Extract electrical parameters based on post-litho shapes.
  – Enlarge process window during OPC or retargeting.
Preliminary Results
[Figure: GPW vs EPW(s), four exposure vs. defocus panels: GPW (10% tolerance), EPW delay (10%), EPW power (200%), EPW delay (10%) & power (200%)]
• Reduce the gate lengths of transistors on critical paths.
• Overall EPW increased 46%
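The difference between the two window definitions can be sketched as two predicates evaluated over the same exposure/defocus samples. This is a minimal illustration, assuming the caller supplies models that map (exposure, defocus) to relative CD error, delay change, and leakage power change; those models, the function names, and the reading of the 10% and 200% figures as upper bounds on relative increase are all assumptions.

```python
def window_fraction(points, in_spec):
    """Fraction of sampled (exposure, defocus) points that satisfy in_spec."""
    points = list(points)
    return sum(1 for e, f in points if in_spec(e, f)) / len(points)


def geometric_spec(cd_error, tol=0.10):
    """GPW test: printed CD within +/- tol of target (cd_error is relative)."""
    return lambda e, f: abs(cd_error(e, f)) <= tol


def electrical_spec(delay_change, power_change, delay_tol=0.10, power_tol=2.00):
    """EPW test: delay and leakage power increases within their tolerances."""
    return lambda e, f: (delay_change(e, f) <= delay_tol
                         and power_change(e, f) <= power_tol)
```

Comparing `window_fraction` for the two predicates over one sampling grid gives a rough measure of how much larger or smaller the EPW is than the GPW.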
Yield-constrained digital ckt optimization
Yu Ben
Faculty: El Ghaoui, Poolla, Spanos
Objective: Efficient yield-constrained optimization algorithm
Motivation: Worst-case approach is pessimistic. Existing methods for the yield-constrained problem cannot incorporate a sophisticated variability model.
Payoff: Efficient and robust ckt design without overdesign
Problem formulation:
  x: decision variable (gate sizing)
  A(x): design objective (e.g. area)
  $P_f$: failure probability
  $D_{critical}$: critical delay
  $D_{max}$: maximum allowable delay
  maximum allowable failure probability: upper bound on $P_f$
  minimize $A(x)$
  subject to $P_f(x) = \mathrm{Prob}\{D_{critical}(x) > D_{max}\}$ not exceeding the maximum allowable failure probability
Main Ideas and Results
• Main idea
  – Approximate the variation by a box in parameter space
  – Sequentially change the box size and solve the corresponding robust optimization problem (convex optimization problem)
  – Use fast yield simulation (importance sampling) to control the outlet of the algorithm
Results
[Figure: histogram of delay (ps) vs. counts with the fail region marked; worst-case design Pf = 0.00065, Box SGP design Pf = 0.0107]
[Figure: optimized A(x), run-time (sec), and SGP iterations for Box SGP vs. worst-case, plotted over an x-axis from 10^-3 to 10^-1]
Yield Estimation via Domain Decomposition
Anand Subramanian
Faculty: Poolla, Spanos
Objective: Fast computation of yield for digital ckts
Motivation: Conventional methods for yield estimation are too conservative (ex: 4 corners) or too slow (ex: Monte Carlo)
Payoff: A fast, accurate method would be useful for iteratively optimizing ckt with yield constraints
Problem formulation:
  y = quantity of interest (ex: max delay)
  θ = parameters (ex: process, ckt, or design)
  given y = f(θ) (ex: analytical expression or SPICE code)
  given density function p(θ), threshold ymax
  calculate Prob{ y > ymax }
Main Idea and Preliminary Results
• Problem difficulty stems from: non-linearity in f(θ), randomness and dimension of the parameter space (θ)
• Typical approach: primitive Monte Carlo, importance sampling Monte Carlo
• Faster approach:
  – Decompose domain into hyper-rectangles/ellipsoids
  – Sampling only in highly nonlinear directions
  – Use piecewise linear approximations of function
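For reference, the two "typical" estimators of Prob{ y > ymax } can be sketched as follows. This illustrates only the baseline approaches, not the domain decomposition method. It assumes θ is standard normal with independent components and that the importance sampling proposal is the same Gaussian shifted by a user-chosen vector `shift`; the function names and that choice of proposal are assumptions.

```python
import math
import random


def failure_prob_mc(f, dim, y_max, n, rng):
    """Primitive Monte Carlo: sample theta ~ N(0, I) and count failures."""
    hits = 0
    for _ in range(n):
        theta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        if f(theta) > y_max:
            hits += 1
    return hits / n


def failure_prob_is(f, shift, y_max, n, rng):
    """Importance sampling: sample theta ~ N(shift, I) and reweight.

    The weight is the density ratio p(theta) / q(theta)
    = exp(-theta . shift + |shift|^2 / 2).
    """
    half_sq = 0.5 * sum(s * s for s in shift)
    total = 0.0
    for _ in range(n):
        theta = [rng.gauss(s, 1.0) for s in shift]
        if f(theta) > y_max:
            dot = sum(t * s for t, s in zip(theta, shift))
            total += math.exp(-dot + half_sq)
    return total / n
```

For a rare failure event, primitive Monte Carlo needs on the order of 1/Pf samples to see even a few failures, while a proposal centered near the failure boundary puts most samples where they are informative.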
Statistics of across-wafer Variability
Kedar Patel (with C. Spanos)
• We need to model the wide spectrum of systematic variability observed across a multitude of wafers.
  – A statistical description of variability of the systematic patterns is required.
• Goals
  – Propose a model to capture statistics of systematic variability (done)
  – Extend the modeling strategy to intra-die level (in progress)
  – Demonstrate a use-case of the complete model using ISCAS benchmark circuits
Modeling the Statistics of Across Wafer Shapes
PCA-based, parsimonious model fit for across-wafer functions over a population of ~300 wafers
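A PCA fit over a wafer population can be sketched with a singular value decomposition. This is a minimal illustration, assuming each wafer's across-wafer function is sampled at the same sites and stacked as one row of a NumPy array; `fit_pca_basis` and `reconstruct` are illustrative names, not the project's code.

```python
import numpy as np


def fit_pca_basis(wafer_maps, n_components):
    """Fit a low-order PCA model to a population of wafer maps.

    wafer_maps: array of shape (n_wafers, n_sites).
    Returns the mean map, the basis functions (n_components, n_sites),
    and the per-wafer scores (n_wafers, n_components).
    """
    X = np.asarray(wafer_maps, dtype=float)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = vt[:n_components]          # orthonormal rows
    scores = (X - mean) @ basis.T      # projection of each wafer onto the basis
    return mean, basis, scores


def reconstruct(mean, basis, scores):
    """Rebuild wafer maps from the parsimonious model."""
    return mean + scores @ basis
```

The statistics of the systematic patterns can then be summarized by the distribution of the scores, for example their covariance via `np.cov(scores, rowvar=False)`.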
Cluster Analysis of across-wafer Shapes (using PCA basis functions)
Integrated Circuit Variability Modeling and Statistical/Spatial Parameter Extraction
Kun Qian (with C. Spanos)
• Conventional modeling uses selected die to build corner models
  – Unable to capture spatial variability accurately
  – Extremes of key electrical performance are captured well, e.g. ION, VTH
  – Attribution of physical underlying variability is arbitrary
  – Forces overly pessimistic design assumptions
[Figure: process corner diagram with corners SS, SF, FS, FF and TT on slow/fast axes]
Spatial Variation-Aware Model Extraction
• Extracted parameters will have both deterministic and random components, as well as spatial hierarchy
  – Eliminate assumption of Normal dist.
• Simultaneous extraction of statistical moments and spatial characteristics over samples within die, chip and across wafer
  – Captures the real variation at all levels
• Model accuracy will be verified on 45nm and/or 22nm test chips (transistors, logic and memory devices)
[Flow diagram]
1. Measurement: Sample #1…N
2. Extraction: Parameter Set #1…N
3. Inspect Spatial Structure: Systematic & Random
4. Spatially bounded Parameters: range(L_EFF) within-die < σ; range(L_EFF) die-to-die < k*Ldrawn
5. Multi-Transistor Extraction: Spatial Parameter Model
   $P_i = f_i(x_i, y_i) + N(0, \sigma_i), \quad i = 1 \dots m$
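The spatial parameter model splits each extracted parameter into a deterministic surface f(x, y) and a random residual. The sketch below fits one parameter by least squares, assuming a quadratic surface for f and estimating σ from the residuals. The quadratic form and the function name are assumptions, and the Gaussian residual is used only as in the formula above.

```python
import numpy as np


def fit_spatial_model(x, y, p):
    """Fit P = f(x, y) + N(0, sigma) with an assumed quadratic f.

    x, y: site coordinates; p: extracted parameter values at those sites.
    Returns the surface coefficients [1, x, y, xy, x^2, y^2] and sigma.
    """
    x, y, p = (np.asarray(v, dtype=float) for v in (x, y, p))
    A = np.column_stack([np.ones_like(x), x, y, x * y, x ** 2, y ** 2])
    coef, *_ = np.linalg.lstsq(A, p, rcond=None)
    residual = p - A @ coef
    sigma = residual.std(ddof=A.shape[1])  # unbiased for the fitted terms
    return coef, sigma
```

Inspecting the residuals for remaining spatial structure is one way to decide whether the assumed surface is rich enough.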
Optimal Sampling for Spatial & Layout Variability
45 and 22nm test patterns to be designed in collaboration with BWRC for fabrication by ST Microelectronics
Pattern-based clip clustering
Justin Ghan
Faculty: Poolla, Spanos
Objective: Fast algorithms for clip clustering
Motivation: Leverage the speed of PatternMatch. Clustering of clips could be used across DfM apps; ex: clip-based DRC could be much faster than rule-based DRC at 22nm
Payoff: Fast clustering in hotspot classification, auto-repair, clip library building, DRC, OPC re-use, etc.
Problem formulations:
• robust distance metrics to compare clips (needed for clustering)
• fast modified incremental-hierarchical clustering
Main Ideas and Results
• Developed an optimal weighted metric
• Infeasible to calculate distance between all pairs of clips
• Incremental clustering algorithm much more efficient
  – Based on BUBBLE clustering algorithm (Ganti et al.).
• Modified incremental clustering
  – Replace cluster by canonical clip
  – Need only calculate distances to canonical clips
  – 20X faster if there are few clusters
# of clips   BUBBLE               Modified
500          88 s                 28 s
2000         1209 s (~20 min)     96 s
13200        35787 s (~10 hr)     1625 s (~27 min)
[Equation: weighted distance metric D_w between two clips, with the optimal weight ŵ chosen by minimization]
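The canonical-clip idea can be sketched as a single-pass clustering loop. This is a simplification for illustration and is not the BUBBLE algorithm itself: it assumes the first clip assigned to a cluster serves as its canonical clip, and that a clip joins the nearest cluster when its distance is within a threshold. The function name, the threshold, and the `distance` callable are hypothetical.

```python
def incremental_cluster(clips, distance, threshold):
    """Assign each clip to the nearest canonical clip, or open a new cluster.

    Only distances to canonical clips are computed, so the cost scales with
    (number of clips) x (number of clusters) rather than with all pairs.
    Returns the cluster label of every clip and the list of canonical clips.
    """
    canonical = []   # one representative clip per cluster
    labels = []
    for clip in clips:
        best_k, best_d = None, None
        for k, rep in enumerate(canonical):
            d = distance(clip, rep)
            if best_d is None or d < best_d:
                best_k, best_d = k, d
        if best_k is not None and best_d <= threshold:
            labels.append(best_k)
        else:
            canonical.append(clip)
            labels.append(len(canonical) - 1)
    return labels, canonical
```

With few clusters the inner loop stays short, which is consistent with the larger speedup reported when there are few clusters.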
Design-Level Assessment of DPL Technology Options
Kwangok Jeong
Faculty: Andrew B. Kahng
Objective: Design-level assessment of various DPL technology options
Motivation: Different DPL technology options (process methods, resist types, critical features, etc.) have different impacts on linewidth and space → different R / C / delay variation
Goals:
(1) Analysis of mechanisms of interconnect (BEOL) variation in DPL across various technology options
(2) Development of design-level analysis framework considering additional variability in DPL, based on commercial toolsets
(3) Assessment of design-level impacts of BEOL variation (width/space) across DPL technology options using our framework: capacitance / crosstalk delay / total negative slack variation
Width/Space Variation Analysis Framework
• Double exposure (DE) / double patterning (DP):
  – Space OR width variation
• Spacer double patterning (SDP):
  – Space AND width variation
Analysis Framework
[Figure: line/space cross-sections for four cases: positive photoresist DE/DP, negative photoresist DE/DP, positive SDP (space under spacer), negative SDP (line under spacer); labels: W, W', W'', P, P', P'', S]
[Flow diagram]
1. Initial GDS (TOP.GDS)
2. Coloring and Splitting: SUB1.GDS, SUB2.GDS
3. Shifting and Resizing: TOP.GDS with SUB1 (x1, y1) and SUB2 (x2, y2); inputs: DPL-Type (DE, DP, or SDP), S (amount of variation)
4. RC Extraction / Timing Analysis
Analysis Results
• Overlay error can cause more than +/- 10% capacitance variation within a die, for all DPL options → large on-chip variation → increased timing difficulty
• Maximum crosstalk delay
  – With the same 3σ variation (12nm)
    • P-DE/DP may be the most favorable option for BEOL DPL
    • Overlay spec for P-DE/DP can be relaxed by 2X compared to others
  – With different variation spec, e.g., 3σ for DE/DP, and 1σ for SDP,
    • Similar impacts → need to consider design and lithographic costs
[Figure: capacitance variation (%) across DPL options]
Timing-Aware Recoloring and Detailed Placement Perturbation for DPL
Kwangok Jeong
Faculty: Andrew B. Kahng
Objective: Mitigation of the impact of bimodal CD distribution
Motivation: Three key facts in bimodal CD distribution
① Design requires bimodal-aware timing models
  • Unimodal representation increases design guardband
② (Our Focus) Data paths benefit from alternate (mixed) coloring
  • Minimize correlated variations on gates in a given path
  • Exploit existence of two uncorrelated CD populations
③ Clock paths benefit from having same uniform coloring
  • Correlated variation minimizes launch-capture skew
  • Different uniform coloring → different delay → large skew
[Figure: delay (s) vs. mean CD difference (1 nm to 6 nm); best-case and worst-case curves for the large CD group, the small CD group, and pooled (unimodal) CD]
Bimodality-Aware Timing Model
• Bimodal-aware timing model
  – Two timing libraries:
    • G1L-G2S: Group 1 has larger CD than Group 2
    • G1S-G2L: Group 1 has smaller CD than Group 2
  – Two coloring versions of a cell in a library
    • C12: leftmost poly is in Group 1
    • C21: leftmost poly is in Group 2
  – CD mean difference
    • Chosen from process information
    • E.g., 2nm, 4nm or 6nm
• Bimodal-aware timing analysis
  – For each CD mean difference, check timing for each timing library G1L-G2S and G1S-G2L
  – Worse timing between the two libraries is regarded as the actual worst-case timing
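The analysis step reduces to taking the worse of two library results for each CD mean difference. A minimal sketch, assuming each timing run is summarized by a single slack value in ns; the function names and the example numbers are hypothetical and are not taken from the reported results.

```python
def bimodal_worst_case(slack_by_library):
    """Return (library, slack) for the worse (smaller) slack of the two libraries."""
    library = min(slack_by_library, key=slack_by_library.get)
    return library, slack_by_library[library]


def sweep_cd_mean_difference(slacks):
    """slacks: {cd_mean_difference_nm: {library_name: slack_ns}}.

    Returns the worst-case (library, slack) for each CD mean difference.
    """
    return {diff: bimodal_worst_case(by_lib) for diff, by_lib in slacks.items()}


if __name__ == "__main__":
    # Hypothetical slack values in ns, for illustration only.
    example = {
        2: {"G1L-G2S": -0.05, "G1S-G2L": -0.12},
        4: {"G1L-G2S": -0.30, "G1S-G2L": -0.21},
    }
    for diff, (lib, slack) in sweep_cd_mean_difference(example).items():
        print(f"{diff} nm: worst case {slack} ns from library {lib}")
```

Which library is worse can change with the CD mean difference and the coloring, which is why both libraries are checked every time.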
DPL Layout-to-Mask Flow
[Flow diagram]
1. RTL-to-GDS
2. DPL Mask Coloring
3. Bimodal-Aware Timing Analysis
4. Optimization 1: Maximization of Alternate Coloring (Datapaths)
   – Alternate coloring using integer-linear programming
5. Optimization 2: Placement Perturbation for Color Conflict Removal (Clock and Data paths)
   – Coloring conflict > Minimum resolution
   – Placement perturbation using dynamic programming
Results: Improvement on Timing Slack
• Bimodal timing model → reduce pessimism
• Alternate coloring → improve timing
• Placement perturbation → remove conflicts

Stage                         #Conflict  Timing Metric  Mean CD Difference
                                                        2nm       4nm       6nm
Initial Coloring (Unimodal)   0          WNS (ns)       -1.113    -2.016    -2.902
                                         TNS (ns)       -671.1    -1776.3   -3348.5
Initial Coloring (Bimodal)    0          WNS (ns)       -0.191    -0.354    -0.527
                                         TNS (ns)       -8.17     -26.56    -64.64
Alternative Coloring          219        WNS (ns)       -0.090    -0.145    -0.267
                                         TNS (ns)       -1.48     -3.85     -22.40
DPL-Corr (+ECO Routing)       0          WNS (ns)       -0.104    -0.183    -0.295
                                         TNS (ns)       -3.43     -10.45    -28.42

The impact of bimodality can be effectively mitigated!
Single Mask Double-Patterning Litho
Rani S. Ghaida and Prof. Puneet Gupta
Single mask DPL:
– print first pattern as in standard DPL
– shift photomask by min gate pitch and print 2nd pattern
– non-critical trim exposure to remove unwanted features
[Figure: process sequence: 1st litho/etch, mask shift, 2nd litho/etch, trim exposure, final etch; layers: resist, 1st hardmask, 2nd hardmask, poly]
An Example – 4-Input OAI
[Figure: 4-input OAI cell; panels: original cell, first exposure, shift-expose, trim, complete poly, final layout]
• No area overhead
• Example shown using a 2D "I" template
• 1D template (i.e., 1D poly) is trivial
Benefits and Challenges
• Results: negligible area overhead, simple trim-mask, and trivial layout decomposition
• Cost: mask-cost cut to nearly half that of DPL
• Overlay: virtually eliminated in negative LLE (wafer stays in tool)
  – reticle/mask related overlay components are eliminated
  – alignment error is reduced due to identical layouts
• Throughput: no alignment time in negative LLE, no mask loading/unloading and reticle alignment → time saving
• CD bimodality: mask CDU no longer affects bimodality
  – E.g., σdiff reduced from 1.49nm to 1.34nm (10.3% reduction)
• Challenges:
  – Layout redesign effort (but more flexible than gridded rules)
  – Different OPC for the two exposures forbidden
  – Overhead of trim exposure and its associated processing steps