ASIC unit7


    ASIC CONSTRUCTION

    Agenda

    Physical Design

    CAD Tools Methods and Algorithms

    System Partitioning

    Floorplanning

    Physical Design

    The physical design of ASICs is divided into system

    partitioning, floorplanning, placement, and routing.

    A microelectronic system is the town and the ASICs are the

    buildings:

    System partitioning corresponds to town planning.

    ASIC floorplanning is the architect's job.

    Placement is done by the builder.

    Routing is done by the electrician.

    We shall design most, but not all, ASICs using these design

    steps.

    The steps may be performed in a slightly different order, iterated, or omitted depending on the type and size of the system and its ASICs.

    As the focus shifts from logic to interconnect, floorplanning assumes an important role.

    Each of the steps shown in the figure must be performed, and each depends on the previous step. However, the trend is toward completing these steps in a parallel fashion and iterating, rather than in a sequential manner.

    First apply system partitioning to divide a microelectronics system into separate ASICs.

    In floorplanning, estimate sizes and set the initial relative

    locations of the various blocks in our ASIC.

    Allocate space for clock and power wiring and decide on the

    location of the I/O and power pads.

    Placement defines the location of the logic cells within the

    flexible blocks and sets aside space for the interconnect to

    each logic cell:

    Placement for a gate-array or standard-cell design assigns each

    logic cell to a position in a row.

    For an FPGA, placement chooses which fixed logic resources on

    the chip are used for which logic cells.

    Floorplanning and placement are closely related and are

    sometimes combined in a single CAD tool.

    Routing makes the connections between logic cells.

    Routing is a hard problem by itself and is normally split into

    two distinct steps, called global and local routing.

    Global routing determines where the interconnections between

    the placed logic cells and blocks will be situated:

    Only the routes to be used by the interconnections are decided

    in this step, not the actual locations of the interconnections

    within the wiring areas.

    Global routing is called loose routing for this reason.

    Local routing joins the logic cells with interconnections:

    Information on which interconnection areas to use comes from

    the global router.

    Finally decide on the width, mask layer, and exact location of

    the interconnections.

    Local routing is also known as detailed routing.


    CAD Tools

    To develop a CAD tool it is necessary to convert each of the physical design steps to a problem with well-defined goals and objectives.

    The goals for each physical design step are the things we must achieve.

    The objectives for each step are things we would like to meet on the way to achieving the goals.

    Some examples of goals and objectives for each of the ASIC physical design steps are as follows:

    System partitioning:

    Goal: Partition a system into a number of ASICs.

    Objective: Minimize the number of external connections between the ASICs, and keep each ASIC smaller than a maximum size.

    Floorplanning:

    Goal: Calculate the sizes of all the blocks and assign them

    locations.

    Objective: Keep the highly connected blocks physically close to

    each other.

    Placement:

    Goal: Assign the interconnect areas and the location of all the

    logic cells within the flexible blocks.

    Objective: Minimize the ASIC area and the interconnect density.

    Global routing:

    Goal: Determine the location of all the interconnect.

    Objective: Minimize the total interconnect area used.

    Detailed routing:

    Goal: Completely route all the interconnect on the chip.

    Objective: Minimize the total interconnect length used.

    There is no magic recipe involved in the choice of the ASIC

    physical design steps.

    Floorplanning and placement are often thought of as one step

    and in some tools placement and routing are performed

    together.

    Methods and Algorithms

    A CAD tool needs methods or algorithms to generate a solution to each problem using a reasonable amount of computer time.

    Often it is not practical to find the best solution to a particular problem, and the tools must use heuristic algorithms, or rules of thumb, to try to find a good solution.

    To solve each of the ASIC physical design steps we require:

    a set of goals and objectives,

    a way to measure the goals and objectives,

    and an algorithm or method to find a solution that meets the goals and objectives.

    The term algorithm is usually reserved for a method that

    always gives a solution.

    We need to know how practical any algorithm is.

    We say the complexity of an algorithm is O(f(n)).

    The function f(n) is usually one of the following kinds:

    f(n) = constant

    f(n) = log n

    f(n) = n

    f(n) = n log n

    f(n) = n^2
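    As a rough illustration (not from the original slides), a few lines of Python show how quickly these functions grow apart, which is why the complexity class of a partitioning or placement algorithm matters for large netlists:

```python
import math

# Compare the growth of the common complexity functions for a few
# problem sizes n (e.g. number of logic cells or nets).
for n in (10, 1000, 100000):
    print(f"n={n:>6}  log n={math.log2(n):6.1f}  "
          f"n log n={n * math.log2(n):12.0f}  n^2={n * n:>12}")
```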

    As designers attempt to achieve a desired ASIC performance

    they make a continuous trade-off between speed, area, power,

    and several other factors.

    CAD tools are not smart enough to do this alone.

    Current CAD tools are only capable of finding a solution

    subject to a few, very simple, objectives.

    System Partitioning


    Agenda

    Introduction

    Measuring Connectivity

    A Simple Partitioning Example

    Partitioning Methods

    Constructive Partitioning

    Iterative Partitioning Improvement:

    The Kernighan Lin Algorithm

    The Fiduccia Mattheyses Algorithm

    The Ratio-Cut Algorithm

    The Look-ahead Algorithm

    Introduction

    Microelectronic systems typically consist of many functional

    blocks.

    If a functional block is too large to fit in one ASIC, we may

    have to split, or partition, the function into pieces using goals

    and objectives that we need to specify.

    Use CAD tools to help with this type of system partitioning.

    System partitioning requires goals and objectives, methods

    and algorithms to find solutions, and ways to evaluate these

    solutions.

    The goal of partitioning is to divide this part of the system so that each partition is a single ASIC.

    The objectives to be considered are:

    A maximum size for each ASIC

    A maximum number of ASICs

    A maximum number of connections for each ASIC

    A maximum number of total connections between all

    ASICs

    Measuring Connectivity

    To measure connectivity we use graph theory.

    Figure (a) shows a circuit schematic, netlist, or network that consists of circuit modules A-F.

    Equivalent terms for a circuit module are a cell, logic cell,

    macro, or a block.

    A cell or logic cell usually refers to a small logic gate, but can

    also be a collection of other cells.

    Macro refers to gate-array cells.

    Block is usually a collection of gates or cells.

    Each logic cell has electrical connections between the

    terminals (connectors or pins).

    Figure 1 shows Networks, graphs, and partitioning:

    (a) A network containing circuit logic cells and nets.

    (b) The equivalent graph with vertexes and edges. For example: logic cell D maps to node D in the graph; net 1 maps to

    the edge (A, B) in the graph.

    Net 3 (with three connections) maps to three edges in the graph: (B, C), (B, F), and (C, F).

    (c) Partitioning a network and its graph. A network with a net cut that cuts two nets.

    (d) The network graph showing the corresponding edge cut. The net cutset in (c) contains two nets, but the corresponding edge cutset in (d) contains four edges.

    This means a graph is not an exact model of a network for partitioning purposes.

    Figure 1:


    A graph contains vertexes (or vertices) A-F (also known as

    graph nodes or points) that are connected by edges.

    A graph vertex corresponds to a logic cell.

    An electrical connection (a net or a signal) between two logic

    cells corresponds to a graph edge.

    Figure (c) shows a network with nine logic cells A-I.

    A connection, for example between logic cells A and B in

    Figure (c), is written as net (A, B).

    Figure (d) shows a possible division, called a cutset.

    There is net cutset (for network) & an edge cutset (for graph).

    Connections between the two ASICs are external connections; the connections inside each ASIC are internal connections.

    The number of external connections is not modeled correctly by the network graph.

    When we divide the network into two by drawing a line across connections, we make net cuts.

    The resulting set of net cuts is the net cutset.

    The number of net cuts we make corresponds to the number of external connections between the two partitions.

    When we divide the network graph into the same partitions we make edge cuts and create the edge cutset.

    Nets and graph edges are not equivalent when a net has more than two terminals.

    The number of edge cuts made when we partition a graph into two is not necessarily equal to the number of net cuts in the network.

    The difference between nets and graph edges is important when we consider partitioning a network by partitioning its graph.
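    The distinction matters when we count external connections. A minimal Python sketch (the data structures and names are ours, loosely following the nets in the figure): a net is cut once if its terminals fall on both sides of the partition, regardless of how many graph edges it would map to.

```python
def count_net_cuts(nets, partition_a):
    """Count nets whose terminals lie in both partitions.

    nets        -- iterable of sets of logic-cell names, one set per net
    partition_a -- set of logic-cell names assigned to partition A;
                   all other cells are assumed to be in partition B
    """
    cuts = 0
    for net in nets:
        in_a = any(cell in partition_a for cell in net)
        in_b = any(cell not in partition_a for cell in net)
        if in_a and in_b:
            cuts += 1          # one net cut, even for a multi-terminal net
    return cuts

# Net 1 = (A, B) and net 3 = (B, C, F) as in the figure; the third net is invented.
nets = [{"A", "B"}, {"B", "C", "F"}, {"C", "D"}]
print(count_net_cuts(nets, partition_a={"A", "B"}))   # -> 1 (only net 3 is cut)
```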

    A Simple Partitioning Example

    Figure 2 (a) shows a simple network we need to partition.

    There are 12 logic cells, labeled A-L, connected by 12 nets.

    Each logic cell is a large circuit block and might be RAM, ROM, an ALU, and so on.

    Each net might also be a bus, but we assume that each net is a single connection and that all nets are weighted equally.

    The goal is to partition our simple network into ASICs.

    The objectives are: Use no more than three ASICs.

    Each ASIC is to contain no more than four logic cells.

    Use the minimum number of external connections for each ASIC.

    Use the minimum total number of external connections.

    Figure 2 (a): We wish to partition this network into three ASICs with no more than four logic cells per ASIC.

    Figure 2 (b) shows a partitioning with five external connections;

    two of the ASICs have three pins;

    the third has four pins.

    A partitioning with five external connections (nets 2, 4, 5, 6, and 8) is the minimum.


    Figure 2 (c): A constructed partition using logic cell C as a seed. It is difficult to get from this local minimum, with seven external connections (2, 3, 5, 7, 9, 11, 12), to the optimum solution of (b).

    Partitioning Methods

    Two types of algorithms are used:

    Constructive partitioning

    Iterative partitioning improvement

    Constructive partitioning, which uses a set of rules to

    find a solution.

    Iterative partitioning improvement (or iterative

    partitioning refinement), which takes an existing solution

    and tries to improve it.

    Often we apply iterative improvement to a constructive

    partitioning.

    Constructive Partitioning

    The most common constructive partitioning algorithms use seed growth or cluster growth.

    A simple seed-growth algorithm for constructive partitioning consists of the following steps:

    1. Start a new partition with a seed logic cell.

    2. Consider all the logic cells that are not yet in a partition. Select each of these logic cells in turn.

    3. Calculate a gain function g(m) that measures the benefit of adding logic cell m to the current partition.

    One measure of gain is the number of connections between

    logic cell m and the current partition.

    4. Add the logic cell with the highest gain g(m) to the

    current partition.

    5. Repeat the process from step 2. If you reach the limit

    of logic cells in a partition, start again at step 1.

    We may choose different gain functions according to the objectives.

    The algorithm starts with the choice of a seed logic cell (seed

    module, or just seed). The logic cell with the most nets is a good choice as the seed

    logic cell.

    We can also use a set of seed logic cells known as a cluster or clique, borrowed from graph theory.
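    A minimal Python sketch of the seed-growth steps above (the function and data-structure names are our own, not from the slides); the gain g(m) here is simply the number of nets connecting logic cell m to the current partition:

```python
def seed_growth_partition(cells, nets, max_cells_per_partition):
    """Constructive partitioning by seed (cluster) growth.

    cells -- iterable of logic-cell names
    nets  -- list of sets of cell names (one set per net)
    """
    unplaced = set(cells)
    partitions = []
    while unplaced:
        # Step 1: seed a new partition with the unplaced cell on the most nets.
        seed = max(unplaced, key=lambda c: sum(c in net for net in nets))
        partition = {seed}
        unplaced.discard(seed)
        while unplaced and len(partition) < max_cells_per_partition:
            # Steps 2-3: gain g(m) = number of nets connecting m to the partition.
            def gain(m):
                return sum(1 for net in nets if m in net and net & partition)
            # Step 4: add the cell with the highest gain.
            best = max(unplaced, key=gain)
            partition.add(best)
            unplaced.discard(best)
        partitions.append(partition)   # Step 5: start a new partition when full.
    return partitions
```

    Other gain functions can be plugged in to reflect different objectives, for example penalizing external connections of the growing partition.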

    Iterative Partitioning Improvement

    Iterative improvement algorithms are based on interchange and

    group migration.

    The process of interchanging (swapping) logic cells in an effort to improve the partition is an interchange method.

    If the swap improves the partition,

    we accept the trial interchange; otherwise

    we select a new set of logic cells to swap.

    There is a limit to what we can achieve with a partitioning

    algorithm based on simple interchange.

    Figure 2 (c) shows a partitioning of the network of part (a) using a constructive partitioning algorithm with logic cell C as the seed.

    To get from the solution shown in part (c) to the solution of

    part (b), which has a minimum number of external

    connections, requires a complicated swap.

    The three pairs: D and F, J and K, C and L need to be

    swapped all at the same time.

    It takes a long time to consider all possible swaps of this complexity.

    An interchange algorithm considers only one change and rejects it immediately if it is not an improvement.

    Algorithms of this type are greedy algorithms in the sense that

    they will accept a move only if it provides immediate benefit.

    Such shortsightedness leads an algorithm to a local minimum from which it cannot escape.


    Group migration consists of swapping groups of logic cells

    between partitions.

    Group migration algorithms are better than simple interchange

    methods at improving a solution but are more complex.

    All group migration methods are based on the Kernighan-Lin (KL) algorithm, which partitions a graph.

    The problem of dividing a graph into two pieces, minimizing the nets that are cut, is the min-cut problem, a very important one in VLSI design.

    The KL algorithm can be applied to many different problems

    in ASIC design.

    Examine the algorithm next and then see how to apply it to

    system partitioning.

    The Kernighan Lin Algorithm

    Consider a network with 2m nodes (where m is an integer),

    each of equal size.

    External edges cross between partitions, internal edges are

    contained inside a partition.

    If we assign a cost to each edge of the network graph, we can define the cost matrix C = (cij), where cij = cji and cii = 0.

    If all connections are equal in importance, the elements of the

    cost matrix are 1 or 0, and in this special case we usually call

    the matrix the connectivity matrix.

    Costs higher than 1 could represent the number of wires in a

    bus, multiple connections to a single logic cell, or nets that we

    need to keep close for timing reasons.

    Figure below illustrates some of the terms and definitions

    needed to describe the KL algorithm.

    (a) An example network graph

    (b) The connectivity matrix, C

    The columns and rows are labeled to show how the matrix entries correspond to the node numbers in the graph.

    For example, C17 (column 1, row 7) equals 1 because nodes 1

    and 7 are connected.

    In this example all edges have an equal weight of 1, but in

    general the edges may have different weights.

    Assume we have already split the network into two partitions, A and B, each with m nodes (using a constructive partitioning).

    The goal is to swap nodes between A and B with the objective

    of minimizing the number of external edges connecting the

    two partitions.

    Each external edge may be weighted by a cost, and the objective corresponds to minimizing a cost function that we shall call the total external cost, cut cost, or cut weight, W: the sum of the costs cab over all nodes a in A and b in B.

    In Figure (a) the cut weight is 4 (all the edges have weights of

    1).

    To simplify the measurement of the change in cut weight when

    we interchange nodes, we need some more definitions.

    First, for any node a in partition A, define an external edge cost Ea, the sum of cab over all nodes b in B, which measures the connections from node a to B.

    For example, in Figure (a) E1 = 1 and E3 = 0.

    Second, define an internal edge cost Ia, the sum of cab over all other nodes b in A, to measure the internal connections to a.

    In Figure (a), I1 = 0 and I3 = 2.

    Define the edge costs for partition B in a similar way (so E8 =

    2, and I8 = 1).

    The cost difference, Da = Ea - Ia, is the difference between the external and internal edge costs.

    In Figure (a) D1 = 1, D3 = -2, and D8 = 1.


    Now pick any node in A, and any node in B.

    Swap these nodes, a & b, measure the reduction in cut

    weight, which we call the gain, g.

    Express g in terms of the edge costs as follows: g = Da + Db - 2cab.

    The last term accounts for the fact that a and b may be connected.

    In figure (a), if we swap nodes 1 and 6, then g = 2.

    If we swap nodes 2 and 8, then g = 1.


    The KL algorithm finds a group of node pairs to swap that

    increases the gain even though swapping individual node

    pairs from that group might decrease the gain.

    First pretend to swap all of the nodes a pair at a time. Pretend swaps are like studying chess games when you make

    a series of trial moves in your head.

    The algorithm is:

    1. Find two nodes, ai from A, and bi from B, so that the gain

    from swapping them is a maximum, the gain gi is

    gi = Dai + Dbi - 2 caibi

    2. Next pretend swap ai and bi even if the gain gi is zero or

    negative, and do not consider ai and bi eligible for being

    swapped again.

    3. Repeat steps 1 and 2 a total of m times until all the nodes of

    A and B have been pretend swapped. We are back where we

    started, but we have ordered pairs of nodes in A and B

    according to the gain from interchanging those pairs.

    4. Now we can choose which nodes we shall actually swap.

    Suppose we only swap the first n pairs of nodes that we found in the preceding process. In other words, we swap nodes X = a1, a2, ..., an from A with nodes Y = b1, b2, ..., bn from B. The total gain would be Gn = g1 + g2 + ... + gn.

    5. Choose n corresponding to the maximum value of Gn.

    If the maximum value of Gn > 0, then swap the sets of nodes

    X and Y and thus reduce the cut weight by Gn.

    Use this new partitioning to start the process again at the first step.

    If the maximum value of Gn = 0, then we cannot improve the

    current partitioning and we stop.

    We have found a locally optimum solution.
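    The following is a compact, deliberately naive Python sketch of one KL pass over a connectivity matrix, using the definitions above (it recomputes the D values from scratch at every step, so it is O(m^3) per pass rather than an optimized implementation):

```python
def kl_pass(C, A, B):
    """One pass (steps 1-5) of the Kernighan-Lin algorithm.

    C    -- symmetric cost/connectivity matrix, C[i][j] = weight of edge (i, j)
    A, B -- lists of node indices, each of length m
    Returns (new_A, new_B, best_gain); best_gain == 0 means a local optimum.
    """
    a, b = list(A), list(B)              # working copies for the pretend swaps
    locked = set()
    swaps, gains = [], []

    def D(n, own, other):
        # cost difference D = external edge cost E - internal edge cost I
        E = sum(C[n][o] for o in other)
        I = sum(C[n][o] for o in own if o != n)
        return E - I

    for _ in range(len(a)):
        # Step 1: pick the unlocked pair with maximum gain g = Da + Db - 2*Cab.
        best = None
        for x in (n for n in a if n not in locked):
            for y in (n for n in b if n not in locked):
                g = D(x, a, b) + D(y, b, a) - 2 * C[x][y]
                if best is None or g > best[0]:
                    best = (g, x, y)
        g, x, y = best
        # Step 2: pretend swap the pair and lock it.
        gains.append(g)
        swaps.append((x, y))
        a[a.index(x)], b[b.index(y)] = y, x
        locked.update((x, y))

    # Steps 4-5: choose n maximizing Gn = g1 + ... + gn and apply those swaps.
    totals = [sum(gains[:i + 1]) for i in range(len(gains))]
    best_gain = max(totals)
    if best_gain <= 0:
        return list(A), list(B), 0        # cannot improve: locally optimum
    n = totals.index(best_gain) + 1
    new_A, new_B = list(A), list(B)
    for x, y in swaps[:n]:
        new_A[new_A.index(x)], new_B[new_B.index(y)] = y, x
    return new_A, new_B, best_gain
```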

    Figure below shows an example of partitioning a graph using

    the KL algorithm.

    Each completion of steps 1 through 5 is a pass through the

    algorithm.

    Kernighan and Lin found that typically 2 to 4 passes were required to reach a solution.

    FIGURE: Partitioning a graph using the KL algorithm.

    (a) Shows how swapping node 1 of partition A with node 6 of

    partition B results in a gain of g = 1.

    (b) A graph of the gain resulting from swapping pairs of nodes.

    (c) The total gain is equal to the sum of the gains obtained at

    each step.

    The most important feature of the KL algorithm is that we are

    prepared to consider moves even though they seem to make

    things worse.

    The KL algorithm works well for partitioning graphs.


    Following problems need to be addressed before

    applying the algorithm to network partitioning:

    It minimizes the number of edges cut, not the number of nets cut.

    It does not allow logic cells to be different sizes.

    It is expensive in computation time.

    It does not allow partitions to be unequal or find the

    optimum partition size.

    It does not allow for selected logic cells to be fixed in

    place.

    It does not directly allow for more than two partitions.

    The results are random.

    To implement a net-cut partitioning rather than an edge-cut

    partitioning, we can just keep track of the nets rather than

    the edges.

    We can no longer use a connectivity or cost matrix to represent connections, though.

    To represent nets with multiple terminals in a network

    accurately, we can extend the definition of a network graph.

    Figure next shows how a hypergraph with a special type of

    vertex, a star, and a hyperedge, represents a net with more

    than two terminals in a network.

    FIGURE: A hypergraph.

    (a) The network contains a net y with three terminals.

    (b) In the network hypergraph we can model net y by a single

    hyperedge (B, C, D) and a star node.

    Now there is a direct correspondence between wires or nets in

    the network and hyperedges in the graph.

    Summary

    In the KL algorithm, the internal and external edge costs have

    to be calculated for all the nodes before we can select the

    nodes to be swapped.

    Then we have to find the pair of nodes that give the largest

    gain when swapped.

    This requires an amount of computer time that grows as n^2 log n for a graph with 2n nodes.

    This n^2 dependency is a major problem for partitioning large networks.

    The Fiduccia Mattheyses Algorithm

    The FM algorithm is an extension to the KL algorithm that

    addresses the differences between nets and edges and also

    reduces the computational effort.

    The key features of this algorithm are the following:

    1. Only one logic cell, the base logic cell, moves at a time.

    To stop the algorithm from moving all the logic cells to one large

    partition, the base logic cell is chosen to maintain balance

    between partitions.

    The balance is the ratio of total logic cell size in one partition to

    the total logic cell size in the other.

    Altering the balance allows us to vary the sizes of the partitions.

    2. Critical nets are used to simplify the gain calculations:

    A net is a critical net if it has an attached logic cell that, when

    swapped, changes the number of nets cut.

    It is only necessary to recalculate the gains of logic cells on

    critical nets that are attached to the base logic cell.

    3. Logic cells that are free to move are stored in a doubly

    linked list:

    The lists are sorted according to gain.

    This allows the logic cells with maximum gain to be found

    quickly.

    These techniques reduce the computation time so that it

    increases only slightly more than linearly with the number of

    logic cells in the network, a very important improvement.
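    A small Python sketch of the bucket idea behind point 3 (our own simplification, not the complete FM data structure): free cells are grouped by gain so that a maximum-gain cell can be retrieved quickly, and only cells on critical nets attached to the moved base cell need their buckets updated.

```python
from collections import defaultdict, deque

class GainBuckets:
    """Free cells grouped by gain; one list (here a deque) per gain value."""

    def __init__(self):
        self.buckets = defaultdict(deque)   # gain -> cells with that gain
        self.gain_of = {}                   # cell -> its current gain
        self.max_gain = None

    def insert(self, cell, gain):
        self.buckets[gain].append(cell)
        self.gain_of[cell] = gain
        if self.max_gain is None or gain > self.max_gain:
            self.max_gain = gain

    def update(self, cell, new_gain):
        # Called when a critical net attached to the moved base cell changes
        # this cell's gain; only such cells ever need to be re-bucketed.
        self.buckets[self.gain_of[cell]].remove(cell)
        self.insert(cell, new_gain)

    def pop_max(self):
        # Return a free cell of maximum gain (the next candidate base cell).
        while self.max_gain is not None and not self.buckets[self.max_gain]:
            non_empty = [g for g, cells in self.buckets.items() if cells]
            self.max_gain = max(non_empty) if non_empty else None
        if self.max_gain is None:
            return None
        cell = self.buckets[self.max_gain].popleft()
        del self.gain_of[cell]
        return cell
```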


    Comparison between KL and FM Algorithms

    The Ratio-Cut Algorithm

    The ratio-cut algorithm removes the restriction of constant partition sizes.

    The cut weight W for a cut that divides a network into two partitions, A and B, is the sum of the costs cab over all nodes a in A and b in B.

    The KL algorithm minimizes W while keeping partitions A and B the same size.

    The ratio of a cut is defined as R = W / (|A| × |B|).

    In this equation |A| and |B| are the sizes of partitions A and B.

    The size of a partition is equal to the number of nodes it

    contains (also known as the set cardinality).

    The cut that minimizes R is called the ratio cut.

    The original description of the ratio-cut algorithm uses ratio

    cuts to partition a network into small, highly connected

    groups.

    Then you form a reduced network from these groups: each small group of logic cells forms a node in the reduced network.

    Finally, you use the FM algorithm to improve the reduced

    network.
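    Under these definitions the ratio of a cut is easy to evaluate; a short sketch (partition sizes taken as simple node counts, as in the text):

```python
def cut_ratio(C, A, B):
    """Ratio R = W / (|A| * |B|), with W the cut weight between A and B."""
    W = sum(C[a][b] for a in A for b in B)
    return W / (len(A) * len(B))
```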

    The Look-ahead Algorithm

    Both the KL and FM algorithms consider only the immediate gain to be made by moving a node.

    When there is a tie between nodes with equal gain, there is no mechanism to make the best choice.

    The next figure shows an example of two nodes that have equal gains, but moving one of the nodes will allow a move that has a higher gain later.

    The figure illustrates an example of network partitioning that shows the need to look ahead when selecting logic cells to be moved between partitions.

    Partitions (a), (b), and (c) show one sequence of moves.

    Partitions (d), (e), and (f) show a second sequence.

    The partitioning in (a) can be improved by moving node 2

    from A to B with a gain of 1.

    The result of this move is shown in (b).

    This partitioning can be improved by moving node 3 to B,

    again with a gain of 1.

    The partitioning shown in (d) is the same as (a).

    Move node 5 to B with a gain of 1 as shown in (e), but now we

    can move node 4 to B with a gain of 2.

    We call the gain for the initial move the first-level gain.

    Gains from subsequent moves are then second-level and

    higher gains.

    Define a gain vector that contains these gains.


    Using the gain vector allows us to use a look-ahead algorithm

    in the choice of nodes to be swapped.

    This reduces both the mean and variation in the number of cuts

    in the resulting partitions.

    If we wish to divide a system into more than two pieces, this

    can be done recursively by applying the algorithms.

    For example, to divide a system network into three pieces,

    apply the FM algorithm first, using a balance of 2:1, to

    generate two partitions, with one twice as large as the other.

    Then we apply the algorithm again to the larger of the two

    partitions, with a balance of 1:1, which will give us three

    partitions of roughly the same size.
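    As a sketch of this recursion (bipartition is a hypothetical stand-in for any two-way partitioner such as FM; its balance argument and return convention are assumptions of this illustration):

```python
def partition_into_three(network, bipartition):
    """Split a network into three roughly equal pieces with two 2-way cuts.

    bipartition(network, balance) is assumed to return (larger, smaller)
    pieces whose sizes respect the requested balance ratio.
    """
    # First cut with a 2:1 balance: one piece twice the size of the other.
    two_thirds, one_third = bipartition(network, balance=(2, 1))
    # Second cut splits the larger piece 1:1, giving three similar pieces.
    piece_a, piece_b = bipartition(two_thirds, balance=(1, 1))
    return piece_a, piece_b, one_third
```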

    Floorplanning

    Agenda

    Introduction

    Floorplanning Goals and Objectives

    Measurement of delay in floorplanning

    Floorplanning Tools

    Channel Definition

    I/O and Power Planning

    Clock Planning

    Introduction

    Floorplanning is the mapping between the logical description (the netlist) and the physical description (the floorplan).

    Floorplanning gives early feedback: thinking of layout at early

    stages may suggest valuable architectural modifications;

    floorplanning also aids in estimating delay due to wiring.

    Floorplanning fits well into a top-down design strategy, the stepwise refinement strategy also advocated in software design.

    Floorplanning precedes placement.

    The netlist is a logical description of the ASIC; the floorplan is

    a physical description of an ASIC.

    The output of the placement step is a set of directions for the

    routing tools.

    Inputs to the floorplanning problem:

    A set of blocks, hard or soft.

    Pin locations of hard blocks.

    A netlist describing the circuit blocks, the logic cells within the blocks, and their connections.

    Constraints

    Higher level Chip Layout

    Power

    Memory Modules, IP, Macros Placement

    IO Placement and Packaging

    Logical Grouping

    Die Size Estimation

    Core or IO limited

    Floorplanning Goals and Objectives

    The goals of floorplanning are to:

    arrange the blocks on a chip,

    decide the location of the I/O pads,

    decide the location and number of the power pads,

    decide the type of power distribution,

    decide the location and type of clock distribution.

    The objectives of floorplanning are to:

    minimize the chip area,

    minimize delay (reduce wire length for critical nets),

    maximize routability (minimize congestion).


    Measurement of delay in floorplanning

    In floorplanning we predict the interconnect delay before we

    complete any routing.

    Delay is dependent on resistance and capacitance.

    The parasitics associated with the interconnect are not yet known, i.e., the interconnect capacitance (wiring capacitance or routing capacitance) and the interconnect resistance.

    Only the fanout (FO) of a net and the size of the block are known.

    The interconnect capacitance is therefore estimated from predicted-capacitance tables (wire-load tables) indexed by fanout and block size.

    Predicted capacitance.

    (a) Interconnect lengths as a function of fanout (FO) and circuit-block size.

    (b) Wire-load table.

    There is only one capacitance value for each fanout

    (typically the average value).

    (c) The wire-load table predicts the capacitance and delay of a

    net (with a considerable error).

    Net A and net B both have a fanout of 1, both have the

    same predicted net delay, but net B in fact has a much

    greater delay than net A in the actual layout.
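    A sketch of how such a table might be consulted during floorplanning (the table values, block-size names, and the simple R*C delay model are invented placeholders, not real library data):

```python
# Hypothetical wire-load table: for a given circuit-block size, map the
# fanout of a net to an average predicted capacitance (pF). Real libraries
# provide such tables per technology and per block-size range.
WIRE_LOAD_TABLE = {
    "small_block": {1: 0.02, 2: 0.035, 4: 0.06},
    "large_block": {1: 0.05, 2: 0.08,  4: 0.13},
}

def predicted_net_delay(block_size, fanout, drive_resistance_kohm):
    """Estimate net delay (ns) as R * C using the predicted capacitance."""
    table = WIRE_LOAD_TABLE[block_size]
    # Use the largest tabulated fanout not exceeding the requested one.
    key = max(f for f in table if f <= fanout)
    capacitance_pf = table[key]
    return drive_resistance_kohm * capacitance_pf   # kohm * pF = ns

print(predicted_net_delay("small_block", fanout=1, drive_resistance_kohm=2.0))
```

    Note that, as in the text, two nets with the same fanout get the same predicted delay, however different their actual routed lengths turn out to be.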

    Floorplanning Tools

    Figure 1 (a) shows an initial random floorplan generated by a

    floorplanning tool.

    Two of the blocks, A and C, are standard-cell areas.

    These are flexible (variable) blocks because, although their

    total area is fixed, their shape and connector locations may be

    adjusted during the placement step.

    The dimensions and connector locations of the other fixed blocks (perhaps RAM, ROM, compiled cells, or megacells) can only be modified when they are created.

    Force logic cells to be in selected flexible blocks by seeding.

    Choose seed cells by name.

    Figure 1: Floorplanning a cell-based ASIC.

    (a) Initial floorplan generated by the floorplanning tool.

    Two of the blocks are flexible (A and C) and contain rows

    of standard cells (unplaced).

    A pop-up window shows the status of block A.

    (b) An estimated placement for flexible blocks A and C.

    The connector positions are known and a rat's nest display

    shows the heavy congestion below block B.

    (c) Moving blocks to improve the floorplan.

    (d) The updated display shows the reduced congestion after

    the changes.

    Figure 1


    For example, ram_control* would select all logic cells whose names start with ram_control to be placed in one flexible block.

    Seeding may be hard or soft:

    A hard seed is fixed and not allowed to move during the

    remaining floorplanning and placement steps.

    A soft seed is an initial suggestion only and can be altered if

    necessary by the floorplanner.

    Seed connectors can also be used within flexible blocks, forcing certain nets to appear in a specified order, or at a specified location, at the boundary of a flexible block.

    The floorplanner can complete an estimated placement to

    determine the positions of connectors at the boundaries of the

    flexible blocks.

    Figure (b) illustrates a rat's nest display of the connections

    between blocks.

    Connections are shown as bundles between the centers of

    blocks or as flight lines between connectors. Figure (c) and (d) show how we can move the blocks in a

    floorplanning tool to minimize routing congestion.

    Control the aspect ratio of our floorplan to fit our chip into the die cavity (a fixed-size hole) inside a package.

    Figure 2 (a) - (c) show how we can rearrange our chip to

    achieve a square aspect ratio.

    Figure (c) also shows a congestion map, another form of

    routability display.

    There is no standard measure of routability.

    FIGURE 2: Congestion analysis.

    (a) The initial floorplan with a 2:1.5 die aspect ratio.

    (b) Altering the floorplan to give a 1:1 chip aspect ratio.

    (c) A trial floorplan with a congestion map.

    Blocks A and C have been placed so that we know the terminal

    positions in the channels.

    Shading indicates the ratio of channel density to the channel

    capacity.

    Dark areas show regions that cannot be routed because the

    channel congestion exceeds the estimated capacity.

    (d) Resizing flexible blocks A and C alleviates congestion.

    Figure 2

    The interconnect (or wiring) channels have a certain channel capacity; they can handle only a fixed number of interconnects.

    One measure of congestion is the difference between the

    number of interconnects that we actually need, called the

    channel density, and the channel capacity.

    Another measure, shown in Figure (c), uses the ratio of

    channel density to the channel capacity.
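    A tiny sketch of that second measure (channel names and numbers below are made up): any channel whose density-to-capacity ratio exceeds 1.0 cannot be routed as floorplanned.

```python
def congestion_map(channels):
    """Map channel name -> density / capacity; ratios above 1.0 are unroutable."""
    return {name: density / capacity
            for name, (density, capacity) in channels.items()}

channels = {"below_B": (14, 10), "left_of_A": (6, 12)}   # hypothetical data
print(congestion_map(channels))   # below_B is congested (1.4 > 1.0)
```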

    With practice, we can create a good initial placement by

    floorplanning and a pictorial display.

    This is one area where the human ability to recognize patterns

    and spatial relations is currently superior to a computer

    programs ability.

    Channel Definition

    During the floorplanning step we assign the areas between

    blocks that are to be used for interconnect.

    This process is known as channel definition or channel

    allocation.

    Figure 3 shows a T-shaped junction between two rectangular

    channels and illustrates why we must route the stem (vertical)

    of the T before the bar.

    The general problem of choosing the order of rectangular channels to route is called channel ordering.


    Figure 3: Routing a T-junction between two channels in two-level

    metal. The dots represent logic cell pins.

    (a) Routing channel A (the stem of the T) first allows us to adjust

    the width of channel B.

    (b) If we route channel B first (the top of the T), this fixes the

    width of channel A.

    Route the stem of a T-junction before routing the top.

    Figure 4 shows a floorplan of a chip containing several blocks.

    Suppose we cut along the block boundaries, slicing the chip into two pieces (Figure a).

    If we can slice each of these pieces into two, and continue in this fashion until all the blocks are separated, we have a slicing floorplan (Figure b).

    Figure (c) shows how the sequence we use to slice the chip

    defines a hierarchy of the blocks.

    Reversing the slicing order ensures that we route the stems of

    all the channel T-junctions first.

    Figure 4

    Figure 5 shows a floorplan that is not a slicing structure.

    We cannot cut the chip all the way across with a knife without

    chopping a circuit block in two.

    This means we cannot route any of the channels in this

    floorplan without routing all of the other channels first.

    There is a cyclic constraint in this floorplan.

    There are two solutions to this problem:

    move the blocks until we obtain a slicing floorplan.

    allow the use of L-shaped, rather than rectangular, channels (or areas with fixed connectors on all sides, a switch box).

    An area-based router, rather than a channel router, is used to route L-shaped regions or switch boxes.

    Figure 5: Cyclic constraints.

    (a) A nonslicing floorplan with a cyclic constraint that prevents

    channel routing.

    (b) In this case it is difficult to find a slicing floorplan without

    increasing the chip area.

    (c) This floorplan may be sliced (with initial cuts 1 or 2) and has no

    cyclic constraints, but it is inefficient in area use and will be very

    difficult to route.

    Figure 6 (a) displays the floorplan of the ASIC.

    We can remove the cyclic constraint by moving the blocks

    again, but this increases the chip size.

    Figure (b) shows an alternative solution.

    Merge the flexible standard cell areas A and C.

    We can do this by selective flattening of the netlist.

    Flattening can reduce the routing area because routing

    between blocks is usually less efficient than routing inside

    the row-based blocks.

    Figure (b) shows the channel definition and routing order for

    our chip.


    Figure 6: Channel definition and ordering.

    (a) Cyclic constraint is eliminated by merging the blocks A and C.

    (b) A slicing structure.

    I/O and Power Planning

    A silicon chip or die is mounted on a chip carrier inside a chip

    package.

    Connections are made by bonding the chip pads to fingers on a metal lead frame that is part of the package.

    Metal lead-frame fingers connect to the package pins.

    A die consists of a logic core inside a pad ring.

    Figure 7 (a) shows a pad-limited die and Figure (b) shows a

    core-limited die.

    On a pad-limited die use tall, thin pad-limited pads, which

    maximize the number of pads we can fit around the outside of

    the chip.

    On a core-limited die use short, wide core-limited pads.

    Figure 7: Pad-limited and core-limited die.

    (a) A pad-limited die: the number of pads determines the die size.

    (b) A core-limited die: core logic determines the die size.

    (c) Using both pad-limited pads and core-limited pads for a square die.

    Figure (c) shows how we can use both types of pad to change

    the aspect ratio of a die to be different from that of the core.

    Special power pads are used for the positive supply, or VDD,

    power buses (or power rails) and the ground or negative

    supply, VSS or GND.

    One set of VDD/VSS pads supplies one power ring that runs

    around the pad ring and supplies power to the I/O pads only.

    The other set of VDD/VSS pads connects to a second power ring that supplies the logic core.

    I/O power is called dirty power since it has to supply large

    transient currents to the output transistors.

    Keep dirty power separate to avoid injecting noise into the

    internal-logic power (clean power).

    I/O pads also contain special circuits to protect against ESD (electrostatic discharge).

    These circuits can withstand very short high-voltage (several

    kilovolt) pulses that can be generated during human or

    machine handling.

    Depending on the package design, the type and positioning of

    down bonds may be fixed.

    This means we need to fix the position of the chip pad for

    down bonding using a pad seed.

    If we make an electrical connection between the substrate and

    a chip pad, or to a package pin, it must be to VDD (n-type substrate) or VSS (p-type substrate).

    This substrate connection (for the whole chip) employs a down

    bond (or drop bond) to the carrier.

    We have several options:

    Dedicate one (or more) chip pad(s) to down bond to the chip carrier.

    Make a connection from a chip pad to the lead frame and down bond from the chip pad to the chip carrier.

    Make a connection from a chip pad to the lead frame and down bond from the lead frame.

    Down bond from the lead frame without using a chip pad.

    Leave the substrate and/or chip carrier unconnected.

    A double bond connects two pads to one chip-carrier finger and one package pin.

    Do this to save package pins or to reduce the series inductance of bond wires by parallel connection of the pads.


    A multiple-signal pad or pad group is a set of pads.

    For example, an oscillator pad usually comprises a set of two

    adjacent pads that we connect to an external crystal.

    The oscillator circuit and the two signal pads form a single logic cell.

    Another common example is a clock pad.

    Some foundries allow a special form of corner pad (normal

    pads are edge pads) that squeezes two pads into the area at the

    corners of a chip using a special two-pad corner cell, to help

    meet bond-wire angle design rules (see Figure b and c).

    To reduce the series resistive and inductive impedance of

    power supply networks, it is normal to use multiple VDD and

    VSS pads.

    The output pads can easily consume most of the power on a

    CMOS ASIC, because the load on a pad is much larger than

    typical on-chip capacitive loads.

    Depending on the technology it may be necessary to provide dedicated VDD and VSS pads for every few SSOs (simultaneously switching outputs).

    Design rules set how many SSOs can be used per VDD/VSS

    pad pair.

    These dedicated VDD/VSS pads must follow groups of output

    pads as they are seeded or planned on the floorplan.

    With some chip packages this can become difficult because

    design rules limit the location of package pins that may be

    used for supplies (due to the differing series inductance of

    each pin).

    Figure 8 (a) & (b) represents the magnified views of southeast

    corner of example chip and show different types of I/O cells.

    Figure (c) shows a stagger-bond arrangement using two rows

    of I/O pads.

    In this case the design rules for bond wires (the spacing and

    the angle at which the bond wires leave the pads) become very

    important.

    Figure (d) shows an area-bump bonding arrangement (also

    known as flip-chip, solder-bump, or C4).

    Although bonding pads can then be placed over the face of the chip, the I/O circuits are often still located at the edges of the chip because of difficulties in power supply distribution and in integrating I/O circuits together with logic in the center of the die.

    FIGURE 8: Bonding pads.

    (a) This chip uses both pad-limited and core-limited pads.

    (b) A hybrid corner pad.


    (c) A chip with stagger-bonded pads.

    (d) An area-bump bonded chip (or flip-chip).

    The chip is turned upside down and solder bumps connect the pads

    to the lead frame.

    In an MGA (masked gate array) the pad spacing and I/O-cell spacing are fixed: each pad occupies a fixed pad slot (pad site).

    The properties of the pad I/O are also fixed but, if we need to,

    we can parallel adjacent output cells to increase the drive.

    To increase flexibility further the I/O cells can use a

    separation, the I/O-cell pitch, that is smaller than the pad pitch.

    For example, three 4 mA driver cells can occupy two pad

    slots.

    Then we can use two 4 mA output cells in parallel to drive one

    pad, forming an 8 mA output pad as shown in Figure 9.


    This arrangement also means the I/O pad cells can be changed

    without changing the base array.

    This is useful as bonding techniques improve and the pads can

    be moved closer together.

    FIGURE 9: Gate-array I/O pads.

    (a) Cell-based ASICs may contain pad cells of different sizes and

    widths.

    (b) A corner of a gate-array base.

    (c) A gate-array base with different I/O cell and pad pitches.

    Figure 9

    Figure 10 shows two possible power distribution schemes.

    The long direction of a rectangular channel is the channel

    spine.

    Some automatic routers may require that metal lines parallel to

    a channel spine use a preferred layer (either m1, m2, or m3).

    Since we can have both horizontal and vertical channels, we may have the situation shown in the figure, where we have to decide whether to use a preferred layer or the preferred direction for some channels.

    This may or may not be handled automatically by the routing software.

    Figure 10

    FIGURE 10: Power distribution.

    (a) Power distributed using m1 for VSS and m2 for VDD. This helps minimize the number of vias and layer crossings needed but

    causes problems in the routing channels.

    (b) In this floorplan m1 runs parallel to the longest side of all channels, the channel spine. This can make automatic routing easier but may increase the number of vias and layer crossings.

    (c) An expanded view of part of a channel (interconnect is shown as lines). If power runs on different layers along the spine of a channel, this forces signals to change layers.

    (d) A closeup of VDD and VSS buses as they cross. Changing layers requires a large number of via contacts to reduce

    resistance.

    Clock Planning

    Figure 11 (a) shows a clock spine routing scheme with all clock pins driven directly from the clock driver.

    MGAs and FPGAs often use this fishbone type of clock distribution scheme.

    Figure (b) shows a clock spine for a cell-based ASIC.

    Figure (c) shows the clock-driver cell, often part of a special clock-pad cell.

    Figure (d) illustrates clock skew and clock latency.

    When all clocked elements are driven from one net with a clock spine, skew is caused by differing interconnect lengths and loads.

    If the clock-driver delay is larger than the interconnect delays, a clock spine achieves minimum skew but with long latency.


    Figure 11

    FIGURE 11: Clock distribution.

    (a) A clock spine for a gate array.

    (b) A clock spine for a cell-based ASIC.

    (c) A clock spine is usually driven from one or more clock-driver cells.

    Delay in the driver cell is a function of the number of stages and the ratio of output to input capacitance for each stage (taper).

    (d) Clock latency and clock skew.

    We would like to minimize both latency and skew.

    Delay through a chain of CMOS gates is minimized when the ratio between the input capacitance C1 and the output capacitance C2 is about 3 (exactly e ≈ 2.7).

    The fastest way to drive a large load is to use a chain of buffers with their input and output loads chosen to maintain this ratio.

    We can design a tree of clock buffers so that the taper of each

    stage is e = 2.7 by using a fanout of three at each node, as

    shown in Figure 12 (a) and (b).
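    A quick sketch of the arithmetic behind this rule (our own illustration, not from the slides): given an input capacitance and a load, the number of buffer stages follows from the chosen taper.

```python
import math

def buffer_chain(c_in_pf, c_load_pf, taper=math.e):
    """Stages and effective per-stage taper for a geometrically sized chain."""
    stages = max(1, round(math.log(c_load_pf / c_in_pf, taper)))
    per_stage = (c_load_pf / c_in_pf) ** (1 / stages)   # taper actually used
    return stages, per_stage

# Driving a 20 pF clock load from a 0.05 pF input with taper ~ e (about 2.7):
print(buffer_chain(0.05, 20.0))   # about 6 stages with a taper close to 2.7
```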

    The clock tree, shown in Figure (c), uses the same number of

    stages as a clock spine, but with a lower peak current for the

    inverter buffers.

    Figure (c) illustrates that we now have another problem: we need to balance the delays through the tree carefully to minimize clock skew, that is, to balance the clock arrival times at all of the leaf nodes.

    Designing a clock tree that balances the rise and fall times at

    the leaf nodes has the beneficial side-effect of minimizing the

    effect of hot-electron wearout.

    (a) Minimum delay is achieved when the taper of successive stages

    is about 3.

    (b) Using a fanout of three at successive nodes.

    (c) A clock tree for a cell-based ASIC

    Figure 12

    Summary

    Floorplanning initializes the physical design process.

    Floorplanning is the center of ASIC design operations for all

    types of ASIC.

    There are many factors to be considered during floorplanning:

    minimizing connection length and signal delay between blocks,

    arranging fixed blocks and reshaping flexible blocks to occupy

    the minimum die area,

    organizing the interconnect areas between blocks,

    planning the power, clock, and I/O distribution.

    THANK YOU