1
Deep Submicron Logic / Layout Synthesis
1999. 11 Jun Dong Cho
Sungkyunkwan Univ. Dept. ECE
Mail : [email protected]
Homepage : vada.skku.ac.kr
2
Agenda
Design Methodology
Recent Approaches in Logic / Layout Synthesis
EDA Vendor & Their Tools
Conclusion
3
Design Methodology
Introduction : DSM Design Dilemma
Current Design Methodology
Recent Approaches in Design Methodology
  Floorplan Approach
  Super Glue Approach
  Fixed Timing Approach
  Simultaneous Optimization Approach
4
Introduction : DSM Design Dilemma
As physical feature sizes decrease, the time delay of electrical signals traveling in the interconnect between active devices and gates is approaching the delay through the devices and gates. Therefore, the parasitic information (resistance and capacitance) of the interconnect is absolutely critical to predicting circuit performance.
The key to solving this problem is knowing more about the physical design, i.e. placement and estimated interconnect, early in the design cycle.
Iterations between synthesis and layout increase dramatically due to timing and routability problems.
Most current VLSI tools cannot properly handle the new problems raised by deep submicron technology, such as accurate RC extraction, transmission line effects and coupling effects. Even the models used for VLSI ASIC design, such as timing delay, routability, size and power dissipation, will need to be modified or improved.
5
Introduction : DSM Design Dilemma
Although there is no official definition of deep submicron, the term generally refers to CMOS devices whose minimum logic gate length is 0.5 um or smaller. Deep submicron technology gives chip manufacturers the ability to put more gates on a chip and increase chip density. This makes chips more powerful and smaller.
In non-submicron integrated circuits that do not require high clock speeds, minimum-width lines can be used for clock distribution. Since differences in logic gate delay along the signal paths dominate the clock skew, wire length does not affect the clock skew much, and interconnect wire delay is not a big issue. Under these conditions, the rule of thumb is to use the same number of identical buffers on each signal path, so that every component experiences the same logic gate delay.
6
Current Design Methodology
[Figure: design flow. Boxes, in extraction order: Behavior Level Design; Logic Synthesis; Logic Design and Simulation; Logic Partitioning; Die Planning; Simulation; Floorplanning; Design Verification; Timing Verification; Test Generation; I/O Pad Placement; Power/Ground Stripes, Rings Routing; Global Placement; Detailed Placement; Clock Tree Synthesis and Routing; Global Routing; Detailed Routing; Extraction and Delay Calculator; Timing Verification; LVS, DRC, ERC. Label: Front End]
7
Floorplan Approach
Existing EDA vendors
  Particularly emphasize the floorplan
  Iterations between different tools
Traditional floorplan
  No flexibility to fix timing problems caused by long wires
  Overly constrained timing budgets
  May fail at timing closure
  Adds many buffers and oversizes gates on critical nets
[Figure: flow diagram. Boxes: System; Synthesis; Floorplan; P&R for Block #1, Block #2, ..., Block #N; Timing Verification]
8
Super Glue Approach
  Attempts to glue Front End and Back End
  Performs floorplan, placement, routing, timing verification in advance
  Optimize few variables
  Merely moving the closure issue
  Little correlation with final Back End
  Difficult feedback to Front End
[Figure: flow diagram. Boxes: System; Synthesis; Floorplan / Early Place / Early Route; Timing; Placement; Routing; Timing Verification]
9
Fixed Timing Approach
  Attempts to break the problem
  Early aggressive timing optimization
  Based on simple conservative models
  Timing is set, back-end left for later
  Sub-optimal area and power results
  Trades one problem for another
  Difficult to extend to other optimizations
[Figure: flow diagram. Boxes: System; Synthesis / Logic Optimize; Plan Wires; Timing; Placement; Routing; Timing Verification]
10
Simultaneous Layout Optimization Approach
  Simultaneous Placement and Global Optimization
  Placement, Routing, Timing, Logic Optimization, Clocks, Power, Crosstalk, Additional Effects
[Figure: flow diagram. Boxes: System; RTL; FloorPlan; Synthesis; Timing; Logic Opt.; Route; Place. Label: Straight to Silicon]
11
Recent Approaches in Logic / Layout Synthesis
  Layout Driven Logic Synthesis
    Wireplanning in Logic Synthesis [ICCAD'98]
  Post Routing Optimization
    Post-Routing Optimization with Routing Characterization [ISPD'99]
  Congestion Minimization
    On the Behavior of Congestion Minimization During Placement [ISPD'99]
  Control Logic Layout Synthesis
    C5M - A Control-Logic Layout Synthesis System for High-Performance Microprocessors [TCAD'98]
  PTL Logic / Layout Synthesis
    Wave Steering in YADDs : A Novel Non-Iterative Synthesis and Layout Technique [ISPD'99]
12
Layout Driven Logic Synthesis (1) : Main Feature
Takes the opposite approach to conventional logic synthesis : logic synthesis optimizes only for interconnect delay, ignoring the effect of gate delays.
Based on the simple observation that if an output “o” depends on an input “I”, then the best way to connect “I” to “o” is through a path which is monotonic from “I” to “o” : no diversions in the path from “I” to “o”.
Conventional logic synthesis can produce a circuit for which it is impossible to find a placement with no diversions in the input-output paths.
“Illegal node” : a node is illegal if it can not be placed somewhere on the die without causing a diversion in the circuit.
The proposed approach has the advantage that it still maintains a distinction between the logic synthesis and place & route stages. It does not need to tightly couple synthesis and placement by frequently alternating between the two which can be inefficient and may not converge at all.
13
Layout Driven Logic Synthesis (2) : Problem Description
Assumption
  The die is represented by a rectangle $R$ with width $w_r$ and height $h_r$.
  The given logic circuit is pin-assigned.
  The delay of a path is a linear function of its length. (In general, the interconnect delay depends quadratically on the length of the interconnect, but it can be made linear by buffer insertion and wire sizing.)
I/O pin-to-pin delay model (IP-based synthesis vs. slack-based synthesis)
  Particularly suited for intellectual property (IP) blocks.
  Arrival times of the pins are not known in advance. Thus, we aggressively minimize the delays for all I/O paths.
  For an input $i$ at $(x_i, y_i)$ and an output $o$ at $(x_o, y_o)$ : $d(i, o) = |x_i - x_o| + |y_i - y_o|$
Objective function : minimize the longest path of the circuit
  $\min \max_{(i, o),\ i \in I,\ o \in O} d(i, o)$
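As a rough illustration of the pin-to-pin model, the sketch below computes the Manhattan distance between every dependent input/output pin pair and reports the largest one. Under the linear delay assumption this value is a lower bound on the longest path, reached only when every path is monotonic. The pin coordinates, the `depends_on` map and the function names are hypothetical and not taken from the slides.

```python
from itertools import product


def manhattan(p, q):
    """Manhattan distance between two (x, y) pin positions."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def longest_io_distance(inputs, outputs, depends_on):
    """Largest d(i, o) over pairs where output o depends on input i."""
    return max(
        manhattan(inputs[i], outputs[o])
        for i, o in product(inputs, outputs)
        if i in depends_on[o]
    )


# Hypothetical pin assignment on the die boundary.
inputs = {"a": (0, 0), "b": (0, 4), "c": (0, 8)}
outputs = {"y1": (10, 2), "y2": (10, 8)}
depends_on = {"y1": {"a", "b", "c"}, "y2": {"c"}}

bound = longest_io_distance(inputs, outputs, depends_on)
print(bound)
```

Here the pair (c, y1) sets the bound, so a synthesis result is only as good as its c-to-y1 path.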
14
Layout Driven Logic Synthesis (3)
Synthesis Network
  Not good for placement (longest path exists).
  Placement tool places node z to minimize the longest path from b to y1 & y2.
Decomposed Network
  y2 is independent of b, therefore b can be removed from the support set of y2.
  Path from c to y1 is greater than its Manhattan distance.
Optimal Placement Synthesis
  Each node has a short Manhattan distance.
  Aim is to guide logic synthesis such that it produces a circuit which is good for placement.
15
Layout Driven Logic Synthesis (4) : Constraint Generation
Region Placement Constraint
  Partition the die into rectangles along the pin positions.
  Label each region with the functions that can be placed in it.
  Each region is labeled with a set of placement constraints (support set & transitive primary outputs).
  r3 : {c,d} is the support of y2, and {a,b} is the support of y1.
Node Placement Constraint
  Label each node with a placement constraint.
  The node placement constraint of node n denotes the support set of n & its transitive primary outputs.
  Can be easily computed by traversing the Boolean network in BFS manner.
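The node placement constraint can be sketched with two breadth-first searches per node: one backward over fanins to collect the support set, one forward over fanouts to collect the transitive primary outputs. This is a minimal illustration on a hypothetical network; the dictionary layout and the function names are assumptions, not the data structures of the original tool.

```python
from collections import deque


def reachable(start, edges):
    """Breadth-first search: every node reachable from start (start included)."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def placement_constraints(fanins, primary_inputs, primary_outputs):
    """Map each node to (support set, transitive primary outputs)."""
    fanouts = {}
    for node, sources in fanins.items():
        for src in sources:
            fanouts.setdefault(src, []).append(node)
    constraints = {}
    for node in set(fanins) | set(primary_inputs):
        support = reachable(node, fanins) & set(primary_inputs)
        outputs = reachable(node, fanouts) & set(primary_outputs)
        constraints[node] = (support, outputs)
    return constraints


# Hypothetical network: y1 = f(z), z = g(a, b), y2 = h(c, d).
fanins = {"z": ["a", "b"], "y1": ["z"], "y2": ["c", "d"]}
cons = placement_constraints(fanins, ["a", "b", "c", "d"], ["y1", "y2"])
print(cons["z"])
```

A production version would propagate the sets in one topological pass instead of restarting a search at every node.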
16
Layout Driven Logic Synthesis (5) : Legalize & Synthesis
Make Legal
  Legal : the node n is legal if there is a region r where n can be placed.
  Minimize the number of new Boolean nodes created.
  Traverse the Boolean network in reverse topological order (a node is visited after its fanouts have been visited).
  When an illegal node is seen, collapse the node into its fanouts until the node becomes legal.
Constraint-Driven Synthesis
  Optimize the network such that we get a minimum-literal legal Boolean network.
  Fast Extract : finds a two-cube divisor or a two-literal cube that reduces the largest number of literals in every iteration.
  Resubstitution : a node n is resubstituted into another node x if n divides x (provided the legality of n is preserved).
Produce a legal Boolean network, which can be placed s.t. every path is monotonic.
17
Post Routing Optimization (1) : Main Feature [Avante’99]
  The delay due to wire-routing parasitics becomes non-negligible below 0.25 um.
  Traditional back-annotation approach can’t solve the timing problem because of inaccurate delay estimation.
    Many iterations occur b/w synthesis and layout.
    Timing convergence is not guaranteed.
  Timing fidelity before and after routing
    A minimal rectilinear spanning tree or Steiner tree is usually used to estimate the wire load and delay for the interconnects of a placement.
  VDSM design
    Routing congestion is more severe, so wires have to detour.
    Coupling effect is usually large.
    Timing discrepancies b/w pre- and post-routing.
  Main causes of timing discrepancy
    Coupling effect
    Routing pattern
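The pre-route wire-length estimate mentioned above can be sketched as a minimum spanning tree under the Manhattan metric. The version below uses Prim's algorithm and assumes distinct pin coordinates; the function name and the sample pins are hypothetical. A congested router that detours will produce a longer wire than this estimate, which is one source of the pre/post-routing discrepancy.

```python
def rectilinear_mst_length(pins):
    """Total length of a minimum spanning tree under Manhattan distance (Prim).

    Assumes the pins are distinct (x, y) tuples.
    """
    if len(pins) < 2:
        return 0
    root = pins[0]
    # Cheapest known connection from the growing tree to each unconnected pin.
    dist = {p: abs(p[0] - root[0]) + abs(p[1] - root[1]) for p in pins[1:]}
    total = 0
    while dist:
        nearest = min(dist, key=dist.get)
        total += dist.pop(nearest)
        for p in dist:
            d = abs(p[0] - nearest[0]) + abs(p[1] - nearest[1])
            if d < dist[p]:
                dist[p] = d
    return total


# Hypothetical four-pin net on the corners of a 4 x 3 rectangle.
pins = [(0, 0), (4, 0), (4, 3), (0, 3)]
print(rectilinear_mst_length(pins))
```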
18
Post Routing Optimization (2) : Coupling Effect & Routing Pattern Effect
Coupling Effect
  Increases signal delay
  Introduces noise over neighboring wires
  Dominates the wire load
  Assumes coupling cap. exists b/w neighboring parallel wires only : $C \propto \frac{\text{length}}{\text{distance}}$
Routing Pattern Effect
  MST, MRST : lower bound of total wire length
  The more routing congestion, the larger the detouring nets, the larger the timing discrepancy
19
Post Routing Optimization (3) : Routing Characterization
Coupling characterization
  Divide layout floorplan into 3D routing plan of small regions
  Routing congestion : $\rho(r) = \frac{N(r)}{T(r)}$
    $N(r)$ : no. of nets in region $r$
    $T(r)$ : routing capacity (no. of tracks) of region $r$
  $d(r) = \frac{\delta}{\rho(r)}$
    $\delta$ : space b/w two tracks
  $C(r) \propto \frac{L}{d(r)} = \frac{L\,\rho(r)}{\delta}$
  $C_l(r) \propto \rho(r)$
    $C_l(r)$ : unit-length expected coupling capacitance for region $r$
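A small numeric sketch of the characterization, assuming congestion is the number of nets over the number of tracks, the expected spacing between neighboring wires is the track spacing divided by congestion, and unit-length coupling capacitance is inversely proportional to that spacing. The proportionality constant `k`, the function name and the sample values are hypothetical.

```python
def characterize_region(num_nets, num_tracks, track_spacing, k=1.0):
    """Return (congestion, expected wire spacing, unit-length coupling cap).

    k is a hypothetical technology constant relating coupling capacitance
    per unit length to the inverse of the wire spacing.
    """
    congestion = num_nets / num_tracks
    if congestion == 0:
        # No wires in the region: no neighbors, no coupling.
        return 0.0, float("inf"), 0.0
    spacing = track_spacing / congestion
    coupling_per_length = k / spacing
    return congestion, spacing, coupling_per_length


half_full = characterize_region(num_nets=6, num_tracks=12, track_spacing=0.5)
full = characterize_region(num_nets=12, num_tracks=12, track_spacing=0.5)
print(half_full, full)
```

Doubling the congestion halves the expected spacing and doubles the unit-length coupling capacitance, which is the proportionality the characterization relies on.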
20
Post Routing Optimization (4) : Routing Pattern Prediction
  A routing pattern prediction is required to predict which regions the final routing will go through.
  This prediction can be passed down to a detailed router to guide the final routing.
Routing Pattern Problem (RPP)
  Given a routing graph $G(V, E)$
  Find a set $S$ of connected regions which cover all terminals of this net, with the objective to
    minimize $\sum_{r \in S} l(r)\, C_l(r)$
  subject to capacity constraints
    $\rho(r) \le \rho_m(r)$ for each region $r$
  $l(r)$ : expected length within region $r$
  $C_l(r)$ : unit-length expected coupling capacitance for region $r$
  $\rho(r)$ : routing congestion of region $r$
  $\rho_m(r)$ : maximum congestion allowed within region $r$
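For a two-terminal net, the routing pattern problem reduces to a shortest path over regions, which the sketch below solves with Dijkstra's algorithm on a grid. This is a simplification offered for illustration, not the method of the cited paper: multi-terminal nets need a Steiner-style formulation. Each grid entry is assumed to already hold the per-region cost (expected length times unit-length coupling capacitance), and `blocked` holds the regions whose congestion exceeds the allowed maximum. All names and values are hypothetical.

```python
import heapq


def predict_route(cost, blocked, src, dst):
    """Cheapest chain of adjacent regions from src to dst.

    Returns (total cost, list of regions), or None if dst is unreachable.
    """
    rows, cols = len(cost), len(cost[0])
    best = {src: cost[src[0]][src[1]]}
    prev = {}
    heap = [(best[src], src)]
    while heap:
        c, (r, q) = heapq.heappop(heap)
        if (r, q) == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return c, path[::-1]
        if c > best[(r, q)]:
            continue  # stale heap entry
        for nr, nq in ((r + 1, q), (r - 1, q), (r, q + 1), (r, q - 1)):
            if 0 <= nr < rows and 0 <= nq < cols and (nr, nq) not in blocked:
                nc = c + cost[nr][nq]
                if nc < best.get((nr, nq), float("inf")):
                    best[(nr, nq)] = nc
                    prev[(nr, nq)] = (r, q)
                    heapq.heappush(heap, (nc, (nr, nq)))
    return None


# Hypothetical 3 x 3 region grid; the center region is over capacity.
cost = [[1, 8, 1],
        [1, 9, 1],
        [1, 1, 1]]
total, path = predict_route(cost, blocked={(1, 1)}, src=(0, 0), dst=(0, 2))
print(total, path)
```

The predicted pattern detours around the expensive top-middle region even though the direct route is shorter in length.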
21
Post Routing Optimization (5) : Main Feature
Cluster Selection
  Seed selection : choose gates in the range of critical slack.
  Selection criteria : 1) criticality of the gate, 2) difference among the arrival times of the inputs to the gate, 3) number of fan-ins and fan-outs, 4) congestion of the neighboring area.
  Grouping : cluster the instances adjacent to the selected seed to form a partition.
User-specified window size is given to control the logic change within a localized area.
Incremental placement & logic optimization is performed.
22
Post Routing Optimization (6) : Routing Characterization / Prediction
  When a cluster is transformed and placed, the routing of changed nets will be predicted based on the characterization.
  Timing analyzer may use the coupling capacitance to estimate the timing after routing.
  If the change improves the timing, it will be committed and the routing tree will be updated.
23
Congestion Minimization (1) : Main Feature
Automated cell placement for VLSI circuits has always been a key factor
for achieving designs with optimized area usage, wiring congestion and
timing behavior. As technology advances, the congestion problem
becomes more important.
Congestion in a layout means too many nets are routed in local regions.
With the advent of over-the-cell routing the goal of every place and route
methodology has been to utilize area to prevent spilling of routes into
channels.
Multiple routing layers have enough routing resources to route most wires
as long as there are not too many wires congested in the same region.
Excessive congestion will result in a local shortage of the routing
resources.
24
Congestion Minimization (2) : Definition of Congestion Cost
  Global bin (bin) : partition a given chip into several rectilinear regions.
  Routing demand ($d_e$) : number of nets crossing edge $e$
  Routing supply of a global edge ($s_e$) : function of the length of $e$ (fixed value)
  Overflow ($d_e - s_e$) : amount by which the routing demand exceeds the supply
  Measures of congestion
    Total overflow of a placement
    Number of congested edges
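Both congestion measures follow directly from per-edge demand and supply. The sketch below assumes overflow is zero on edges where demand does not exceed supply; the edge names and values are hypothetical.

```python
def congestion_metrics(demand, supply):
    """Return (total overflow, number of congested edges)."""
    overflow = {e: max(0, demand[e] - supply[e]) for e in demand}
    total_overflow = sum(overflow.values())
    congested_edges = sum(1 for v in overflow.values() if v > 0)
    return total_overflow, congested_edges


# Hypothetical global edges with a uniform supply of 8 tracks.
demand = {"e1": 5, "e2": 9, "e3": 12}
supply = {"e1": 8, "e2": 8, "e3": 8}
metrics = congestion_metrics(demand, supply)
print(metrics)
```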
25
Congestion Minimization (3) : Consistent Model
  Congestion cost is router dependent.
  Congestion is dependent on the wiring cost.
“Consistent”
  Two routing models are defined to be “consistent” if the total weighted lengths of the routes are the same.
  A model for a net consists of a set of segments $S$. Each segment $s_i \in S$ has a length $l_i$ and a weight $w_i$.
  The total weighted length for the net is $L = \sum_{s_i \in S} w_i l_i$
  The total weighted length for all nets is the sum of $L = \sum_{s_i \in S} w_i l_i$ over all nets.
26
Congestion Minimization (4) : Congestion Distribution
Correlation between wirelength and congestion
  Total wirelength of a layout is equal to the total routing demand on all global edges : $\sum l = \sum_e d_e$
    $l$ : estimated length for a net
    $d_e$ : routing demand for global bin edge $e$
  Maximum routing demand is greater than or equal to the total wirelength divided by the number of global edges : $\max(d_e) \ge \frac{\sum l}{\#\text{edge}}$
    $\frac{\sum l}{\#\text{edge}}$ : average routing demand
27
Congestion Minimization (5) : Theoretical Congestion Distribution
  Expected number of wires crossing global edge $e$ : $\sum_i p_i(x_e)\, g(l_i)$
  $p_i(x_e) = \frac{x_e}{W - l_i}$ if $0 \le x_e \le l_i$
  $p_i(x_e) = \frac{l_i}{W - l_i}$ if $l_i \le x_e \le W - l_i$
  $p_i(x_e) = \frac{W - x_e}{W - l_i}$ if $W - l_i \le x_e \le W$
[Figure: theoretical analysis]
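The three-case crossing probability is a trapezoid in the edge position, so it can be written as a single `min`. The sketch below assumes a net length no larger than half the chip width, so the three cases do not overlap. The weighting function `g` is not defined in this section, so it is left as a caller-supplied parameter with a constant default, which is purely an assumption.

```python
def crossing_probability(x, length, width):
    """p_i(x_e) for a net of the given length on a chip of the given width.

    Assumes 0 < length <= width / 2 and 0 <= x <= width.
    """
    return min(x, length, width - x) / (width - length)


def expected_crossings(x, lengths, width, g=lambda length: 1.0):
    """Sum of p_i(x) * g(l_i); the default g is a placeholder assumption."""
    return sum(crossing_probability(x, l, width) * g(l) for l in lengths)


print(crossing_probability(1, 2, 10))  # rising edge of the trapezoid
print(crossing_probability(5, 2, 10))  # flat middle section
print(expected_crossings(5, [2, 2, 4], 10))
```

The flat middle section is what makes the center of the chip the most congested region in this model.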
28
Congestion Minimization (6) : Objective Function
An effective congestion objective should be sensitive to placement moves and directly related to the congestion cost
Objective function Suppose the routing demand of e is before a move and after the
move. Direct overflow cost of this move
Cost = 0 when Cost is close to when Don’t care(no congestion) when
Overflow with Look-ahead
ed ed
max( , ) max( , )e e e ed s d s
, e e e ed s d s
es ,e e e es d d s ,e e ed d s
max( , ) min( , )e e e ed s d s
: adjustable parameter
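One reading of the direct overflow cost is the change in max(demand, supply) on each affected edge, which equals the change in overflow. The sketch below follows that reading only; it does not attempt the look-ahead variant, and the function names and sample values are hypothetical.

```python
def direct_overflow_cost(d_before, d_after, supply):
    """Change in overflow on one edge, written as a difference of max terms."""
    return max(d_after, supply) - max(d_before, supply)


def move_cost(changes, supply):
    """Sum the direct overflow cost over every edge touched by a move.

    changes maps an edge to (demand before the move, demand after the move).
    """
    return sum(
        direct_overflow_cost(before, after, supply[e])
        for e, (before, after) in changes.items()
    )


supply = {"e1": 8, "e2": 8}
# The move relieves a congested edge e1 and loads an uncongested edge e2.
changes = {"e1": (10, 7), "e2": (3, 5)}
print(move_cost(changes, supply))
```

The cost is zero whenever both the old and the new demand stay at or below the supply, so this objective cannot tell apart moves that happen entirely inside uncongested regions.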
29
Control Logic Layout Synthesis (1) : Main Feature
  High-performance control logic is sometimes implemented via custom (manual) layout.
    Custom layout methods result in good electrical and area characteristics.
    Productivity is very poor.
    Using custom design for control logic is often a high-risk strategy because the reaction time to changes is long.
  Standard-cell and other fixed-library ASIC-like methods are often employed for control logic.
    Design turnaround time using these methods is very fast and top-down constraints are accommodated well.
    Overhead required to create the fixed cell library is substantial.
    A poor timing/area/power tradeoff can occur.
  C5M : a new layout system for high-performance control logic, which has been successfully used in the design of a 400 MHz IBM processor.
30
Control Logic Layout Synthesis (2) : C5M Approach
  C5M generates hierarchical row-based macros for static CMOS logic.
  Schematic independence and device-size tuning are accomplished via on-the-fly leaf-cell synthesis.
Flow
  The macro HDL description is compiled into a gate-level schematic via logic synthesis. The synthesis target library consists of parameterized gate schematics and delay rules (no layout data).
  Performance is optimized through manual or automatic device-size tuning.
  The tuned schematic is restructured for cell generation through gate combining and splitting.
  The leaf cells are synthesized to a macro-specific cell image.
  The macro is assembled according to the macro image.
31
Control Logic Layout Synthesis (3) : Leaf Cell Generation
  The leaf-cell schematic is converted into a symbolic layout using CCC (IBM cell compiler).
    CCC operates by first splitting the devices in the schematic according to the maximum finger size, using selectable split strategies.
    Placement engine accommodates multiple objectives like minimum diffusion breaks, maximum gate alignment, minimum wire length, minimum number of contacts, etc.
  The symbolic layout is converted into a physical layout using CC (IBM layout compactor).
    Uses the constraint-based, 1D model.
    Constraint-graph generation
    Critical-path analysis
    Wire-minimization
  Cell-image : the result of C5. It assures the cells can be readily assembled : cell boundaries are regularized to enable cell abutment, and cell wiring is controlled to facilitate macro wiring.
32
Control Logic Layout Synthesis (4) : Macro Assembly
  C5M uses a row-based macro assembly style.
  Placement is performed by IBM ChipPlace:Qplace (quadratic programming model).
    Not restricted to row-based models.
    Timing-driven placement support.
    A number of functions for controlling cell placement through constraints or objectives.
  Signal wiring is created using IBM LGWire (maze router).
  Macro Image
    Controls the top-level physical design.
    Specifies pin assignments, bussing structure, macro shape, macro wiring porosity, row structure and configuration of special sub-macros.
    Size and pin data are automatically imported from the floorplan, and the image is constructed by an automatic utility that is parameterized with respect to the mask levels.
    Bussing structure is a grid : Power/Ground (M1), Vertical Wire (M2), Horizontal Wire (M3).
33
PTL Logic / Layout Synthesis (1) : Introduction
YADD based layout technique
  Linearized, pseudo-symmetric binary decision diagram based synthesis of a function
  Can be directly mapped to pass transistor logic with very highly predictable delay and area
  Based on low granularity 2-phase pipelining
Advantage
  Routing by abutment
  Avoiding interconnect related parasitics
  Delay : cell delay
  Equalize the delays of the different paths to very small margins of spread
  Be able to “Wave Steer” the circuits
The obvious limitation
  The size of the layout can be more than that of the standard cell implementation
  In some cases the latency of our implementation can be more than that of the standard cell one, though the clocking frequency can still be high because of the coexistence of multiple data waves
  Will not be good for feedback systems
  Will be good for data path circuits
34
PTL Logic / Layout Synthesis (2) : Topology of Synthesized YADD Structure (1)
LBDD
  Defined as an Ordered BDD which grows linearly in the number of nodes per level
  C2 and C3 in the 3rd level can be merged
  Not every function can be represented by an LBDD
35
PTL Logic / Layout Synthesis (3) : Topology of Synthesized YADD Structure (2)
PSBDD(Pseudo-Symmetric BDD)
Allows for multiple levels labeled with the same variable
Created by repeated application of Shannon’s expansion
Merging adjacent non-conflict nodes and/or join operation
Has a regular structure and can be directly mapped to layout
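Repeated Shannon expansion can be sketched as a recursive cofactoring of a Boolean function along a variable order. The code below builds an ordinary decision tree and drops a test only when both cofactors are identical; it does not implement the PSBDD merge or join operations, and the function names and the example function are hypothetical.

```python
def expand(f, order, env=None):
    """Shannon-expand f along the variable order.

    Returns nested tuples (variable, low branch, high branch) with 0/1 leaves.
    f takes a dict mapping variable names to 0 or 1.
    """
    env = env or {}
    if not order:
        return f(env)
    var, rest = order[0], order[1:]
    low = expand(f, rest, {**env, var: 0})   # cofactor with var = 0
    high = expand(f, rest, {**env, var: 1})  # cofactor with var = 1
    # If both cofactors agree, the function does not depend on var here.
    return low if low == high else (var, low, high)


# Hypothetical function f = (a AND b) OR c.
f = lambda e: (e["a"] & e["b"]) | e["c"]
tree = expand(f, ["a", "b", "c"])
print(tree)
```

The subtree `("c", 0, 1)` appears twice in the result; sharing or merging such repeated nodes is what turns the tree into a decision diagram.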
36
PTL Logic / Layout Synthesis (4) : Topology of Synthesized YADD Structure (3)
YADD (Yet Another Decision Diagram)
  Generalization of the PSBDD and LBDD
  Unrestricted ordering of child nodes of a parent
  Two adjacent nodes in a level can be merged
  Any leaf node must be present only at the lowest level of the structure
  Exterior don’t cares : the process of joining cofactors and repeating variables creates don’t cares which can be useful in the subsequent level
37
PTL Logic / Layout Synthesis (5) : Topology of Synthesized YADD Structure (4)
Exterior don’t care
  Two adjacent nodes in a level are in conflict and any reordering of the parent nodes cannot merge them : not solvable
  Assign some care values to exterior don’t cares : merging is possible
Interior don’t care
  When both the parents are the same
  More powerful than exterior don’t care
38
PTL Logic / Layout Synthesis (6) : Topology of Synthesized YADD Structure (5)
Algorithm for generating YADDs
  Goal : generate a YADD from a logic specification
  Cost function : minimize the number of levels of the YADD
  Input : blif / Output : YADD
  Variable selected : maximize the number of don’t care minterm pairs after the merging
  During any joining operation, the algorithm tries to create more interior don’t cares
39
PTL Logic / Layout Synthesis (7) : VLSI Realization of the YADD
Implementation
  Regular two-dimensional structure of the YADD : the entire structure can be mapped directly to silicon by the simple expedient of replacing every node by a pass transistor logic MUX and an inverter
Why inverter? : an n-FET pass transistor degrades a logic-high input signal
  Have faster rise and fall times
  Carry out voltage restoration and improve noise margin
  Size them selectively to equalize the different path delays
Requirement for PTL circuits
  No output should remain floating for any combination of inputs
  There should be no sneak paths in the circuits
  To make ‘safe buffer insertion’, should not keep any internal node floating
40
PTL Logic / Layout Synthesis (8) : Physical Layout Details (1)
Why use 2 phase clock scheme?
  If inputs are clocked simultaneously at a higher frequency to make many waves coexist in the structure, data will be corrupted
41
PTL Logic / Layout Synthesis (9) : Physical Layout Details (2)
Why use D-FF?
  Delay logic values by an integer number of clock periods
  For a YADD of depth L, we have (L-1)/2 FFs at the root level and 0 at the lowest level
  The number of FFs increases by 1 every 2 levels
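The flip-flop count per level can be tabulated with integer division. The sketch assumes levels are numbered from 0 at the lowest level up to L-1 at the root level and that the count steps up on every even level; the slide fixes only the two end values and the rate, so the exact level at which each step occurs is an assumption, as is the function name.

```python
def flip_flops_per_level(depth):
    """FF count for each level, index 0 being the lowest level.

    Assumes one extra FF every two levels, starting from 0 at the lowest level.
    """
    return [level // 2 for level in range(depth)]


counts = flip_flops_per_level(7)
print(counts)        # per-level FF counts for a YADD of depth 7
print(counts[-1])    # root level: (7 - 1) / 2 flip-flops
```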
42
PTL Logic / Layout Synthesis (10) : Physical Layout Details (3)
FF Cell
  Skewing of input data to provide time alignment
  Compact, low power, dynamic shift register cell used
Driver
  Convert from dynamic to static logic
  Subsequent inverter and static CMOS inverter pair
43
References & Suggested Readings
[1] John A. Chandy, Prithviraj Banerjee. A Parallel Circuit-Partitioned Algorithm for Timing Driven Cell Placement. Proceedings of the 1997 IEEE International Conference on Computer Design : VLSI, 1997.
[2] Wilsin Gosti, Amit Narayan, Robert K. Brayton, Alberto L. Sangiovanni-Vincentelli. Wireplanning in Logic Synthesis. Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 26-33, 1998.
[3] Burns JL, Feldman JA. C5M - A Control-Logic Layout Synthesis System for High-Performance Microprocessors. IEEE Transactions on Computer-Aided Design of Integrated Circuits & Systems, V.17 N.1, 14-23, 1998.
[4] Peichen Pan. Performance-Driven Integration of Retiming and Resynthesis. Proceedings of the 36th ACM/IEEE Design Automation Conference, 1999.
[5] Chieh Changfan, Yu-Chin Hsu, Fur-Shing Tsai. Post-Routing Timing Optimization with Routing Characterization. Proceedings of the 1999 International Symposium on Physical Design, 1999.
[6] Maogang Wang, Majid Sarrafzadeh. On the Behavior of Congestion Minimization during Placement. Proceedings of the 1999 International Symposium on Physical Design, 1999.
[7] Kahng AB, Robins G, Singh A, Zelikovsky A. Filling Algorithms and Analyses for Layout Density Control. IEEE Transactions on Computer-Aided Design of Integrated Circuits & Systems, V.18 N.4, 445-462, 1999.
[8] Chang SC, Cheng KT, Woo NS, Marek-Sadowska M. Postlayout Logic Restructuring Using Alternative Wires. IEEE Transactions on Computer-Aided Design of Integrated Circuits & Systems, V.16 N.6, 587-596, 1997.
[9] Salek AH, Lou JN, Pedram M. An Integrated Logical and Physical Design Flow for Deep Submicron Circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits & Systems, V.18 N.9, 1305-1315, 1999.
[10] Gaj K, Herr QP, Adler V, Krasniewski A, Friedman EG, Feldman MJ. Tools for the Computer-Aided Design of Multigigahertz Superconducting Digital Circuits. IEEE Transactions on Applied Superconductivity, V.9 N.1, 18-38, 1999.