ASIC unit7


    ASIC CONSTRUCTION

    Agenda

    Physical Design

    CAD Tools Methods and Algorithms

    System Partitioning

    Floorplanning

    Physical Design

    The physical design of ASICs is divided into system

    partitioning, floorplanning, placement, and routing.

    A microelectronic system is the town and the ASICs are the

    buildings:

    System partitioning corresponds to town planning.

    ASIC floorplanning is the architect's job.

    Placement is done by the builder.

    Routing is done by the electrician.

    We shall design most, but not all, ASICs using these design

    steps.

    The steps may be performed in a slightly different order, iterated, or omitted depending on the type and size of the system and its ASICs.

    As the focus shifts from logic to interconnect, floorplanning assumes an important role.

    Each of the steps shown in the figure must be performed, and each depends on the previous step. However, the trend is toward completing these steps in a parallel fashion and iterating, rather than in a sequential manner.

    First apply system partitioning to divide a microelectronics system into separate ASICs.

    In floorplanning, estimate sizes and set the initial relative

    locations of the various blocks in our ASIC.

    Allocate space for clock and power wiring and decide on the

    location of the I/O and power pads.

    Placement defines the location of the logic cells within the

    flexible blocks and sets aside space for the interconnect to

    each logic cell:

    Placement for a gate-array or standard-cell design assigns each

    logic cell to a position in a row.

    For an FPGA, placement chooses which fixed logic resources on

    the chip are used for which logic cells.

    Floorplanning and placement are closely related and are

    sometimes combined in a single CAD tool.

    Routing makes the connections between logic cells.

    Routing is a hard problem by itself and is normally split into

    two distinct steps, called global and local routing.

    Global routing determines where the interconnections between

    the placed logic cells and blocks will be situated:

    Only the routes to be used by the interconnections are decided

    in this step, not the actual locations of the interconnections

    within the wiring areas.

    Global routing is called loose routing for this reason.

    Local routing joins the logic cells with interconnections:

    Information on which interconnection areas to use comes from

    the global router.

    Finally decide on the width, mask layer, and exact location of

    the interconnections.

    Local routing is also known as detailed routing.


    CAD Tools

    To develop a CAD tool it is necessary to convert each of the physical design steps to a problem with well-defined goals and objectives.

    The goals for each physical design step are the things we must achieve.

    The objectives for each step are things we would like to meet on the way to achieving the goals.

    Some examples of goals and objectives for each of the ASIC physical design steps are as follows:

    System partitioning:

    Goal: Partition a system into a number of ASICs.

    Objective: Minimize the number of external connections between the ASICs, and keep each ASIC smaller than a maximum size.

    Floorplanning:

    Goal: Calculate the sizes of all the blocks and assign them

    locations.

    Objective: Keep the highly connected blocks physically close to

    each other.

    Placement:

    Goal: Assign the interconnect areas and the location of all the

    logic cells within the flexible blocks.

    Objective: Minimize the ASIC area and the interconnect density.

    Global routing:

    Goal: Determine the location of all the interconnect.

    Objective: Minimize the total interconnect area used.

    Detailed routing:

    Goal: Completely route all the interconnect on the chip.

    Objective: Minimize the total interconnect length used.

    There is no magic recipe involved in the choice of the ASIC

    physical design steps.

    Floorplanning and placement are often thought of as one step

    and in some tools placement and routing are performed

    together.

    Methods and Algorithms

    A CAD tool needs methods or algorithms to generate a solution to each problem using a reasonable amount of computer time.

    Often it is not practical to find the best solution to a particular problem, and the tools must use heuristic algorithms, or rules of thumb, to try to find a good solution.

    To solve each of the ASIC physical design steps we require:

    a set of goals and objectives,

    a way to measure the goals and objectives,

    and an algorithm or method to find a solution that meets the goals and objectives.

    The term algorithm is usually reserved for a method that

    always gives a solution.

    We need to know how practical any algorithm is.

    We say the complexity of an algorithm is O(f(n)).

    The function f(n) is usually one of the following kinds:

    f(n) = constant

    f(n) = log n

    f(n) = n

    f(n) = n log n

    f(n) = n^2
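    As a rough illustration (not from the original slides), a few lines of Python show how quickly these functions grow apart, which is why the complexity class of a partitioning or placement algorithm matters for large netlists:

```python
import math

# Compare the growth of the common complexity functions for a few
# problem sizes n (e.g. number of logic cells or nets).
for n in (10, 1000, 100000):
    print(f"n={n:>6}  log n={math.log2(n):6.1f}  "
          f"n log n={n * math.log2(n):12.0f}  n^2={n * n:>12}")
```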

    As designers attempt to achieve a desired ASIC performance

    they make a continuous trade-off between speed, area, power,

    and several other factors.

    CAD tools are not smart enough to do this alone.

    Current CAD tools are only capable of finding a solution

    subject to a few, very simple, objectives.

    System Partitioning


    Agenda

    Introduction

    Measuring Connectivity

    A Simple Partitioning Example

    Partitioning Methods

    Constructive Partitioning

    Iterative Partitioning Improvement:

    The Kernighan Lin Algorithm

    The Fiduccia Mattheyses Algorithm

    The Ratio-Cut Algorithm

    The Look-ahead Algorithm

    Introduction

    Microelectronic systems typically consist of many functional

    blocks.

    If a functional block is too large to fit in one ASIC, we may

    have to split, or partition, the function into pieces using goals

    and objectives that we need to specify.

    Use CAD tools to help with this type of system partitioning.

    System partitioning requires goals and objectives, methods

    and algorithms to find solutions, and ways to evaluate these

    solutions.

    The goal of partitioning is to divide this part of the system so that each partition is a single ASIC.

    The objectives to be considered are:

    A maximum size for each ASIC

    A maximum number of ASICs

    A maximum number of connections for each ASIC

    A maximum number of total connections between all

    ASICs

    Measuring Connectivity

    To measure connectivity we use graph theory.

    Figure (a) shows a circuit schematic, netlist, or network that consists of circuit modules A-F.

    Equivalent terms for a circuit module are a cell, logic cell,

    macro, or a block.

    A cell or logic cell usually refers to a small logic gate, but can

    also be a collection of other cells.

    Macro refers to gate-array cells.

    Block is usually a collection of gates or cells.

    Each logic cell has electrical connections between the

    terminals (connectors or pins).

    Figure 1 shows Networks, graphs, and partitioning:

    (a) A network containing circuit logic cells and nets.

    (b) The equivalent graph with vertexes and edges. For example: logic cell D maps to node D in the graph; net 1 maps to

    the edge (A, B) in the graph.

    Net 3 (with three connections) maps to three edges in the graph: (B, C), (B, F), and (C, F).

    (c) Partitioning a network and its graph. A network with a net cut that cuts two nets.

    (d) The network graph showing the corresponding edge cut. The net cutset in (c) contains two nets, but the corresponding edge cutset in (d) contains four edges.

    This means a graph is not an exact model of a network for partitioning purposes.

    Figure 1:


    A graph contains vertexes (or vertices) A-F (also known as

    graph nodes or points) that are connected by edges.

    A graph vertex corresponds to a logic cell.

    An electrical connection (a net or a signal) between two logic

    cells corresponds to a graph edge.

    Figure (c) shows a network with nine logic cells A-I.

    A connection, for example between logic cells A and B in

    Figure (c), is written as net (A, B).

    Figure (d) shows a possible division, called a cutset.

    There is net cutset (for network) & an edge cutset (for graph).

    Connections between the two ASICs are external connections; the connections inside each ASIC are internal connections.

    The number of external connections is not modeled correctly by the network graph.

    When we divide the network into two by drawing a line across connections, we make net cuts.

    The resulting set of net cuts is the net cutset.

    The number of net cuts we make corresponds to the number of external connections between the two partitions.

    When we divide the network graph into the same partitions we make edge cuts and create the edge cutset.

    Nets and graph edges are not equivalent when a net has more than two terminals.

    The number of edge cuts made when we partition a graph into two is not necessarily equal to the number of net cuts in the network.

    The difference between nets and graph edges is important when we consider partitioning a network by partitioning its graph.
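    The distinction matters when we count external connections. A minimal Python sketch (the data structures and names are ours, loosely following the nets in the figure): a net is cut once if its terminals fall on both sides of the partition, regardless of how many graph edges it would map to.

```python
def count_net_cuts(nets, partition_a):
    """Count nets whose terminals lie in both partitions.

    nets        -- iterable of sets of logic-cell names, one set per net
    partition_a -- set of logic-cell names assigned to partition A;
                   all other cells are assumed to be in partition B
    """
    cuts = 0
    for net in nets:
        in_a = any(cell in partition_a for cell in net)
        in_b = any(cell not in partition_a for cell in net)
        if in_a and in_b:
            cuts += 1          # one net cut, even for a multi-terminal net
    return cuts

# Net 1 = (A, B) and net 3 = (B, C, F) as in the figure; the third net is invented.
nets = [{"A", "B"}, {"B", "C", "F"}, {"C", "D"}]
print(count_net_cuts(nets, partition_a={"A", "B"}))   # -> 1 (only net 3 is cut)
```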

    A Simple Partitioning Example

    Figure 2 (a) shows a simple network we need to partition.

    There are 12 logic cells, labeled A-L, connected by 12 nets.

    Each logic cell is a large circuit block and might be RAM, ROM, an ALU, and so on.

    Each net might also be a bus, but we assume that each net is a single connection and that all nets are weighted equally.

    The goal is to partition our simple network into ASICs.

    The objectives are: Use no more than three ASICs.

    Each ASIC is to contain no more than four logic cells.

    Use the minimum number of external connections for each ASIC.

    Use the minimum total number of external connections.

    Figure 2 (a): We wish to partition this network into three ASICs with no more than four logic cells per ASIC.

    Figure 2 (b) shows a partitioning with five external connections;

    two of the ASICs have three pins;

    the third has four pins.

    A partitioning with five external connections (nets 2, 4, 5, 6, and 8) is the minimum.


    Figure 2 (c): A constructed partition using logic cell C as a seed. It is difficult to get from this local minimum, with seven external connections (2, 3, 5, 7, 9, 11, 12), to the optimum solution of (b).

    Partitioning Methods

    Two types of algorithms are used:

    Constructive partitioning

    Iterative partitioning improvement

    Constructive partitioning, which uses a set of rules to

    find a solution.

    Iterative partitioning improvement (or iterative

    partitioning refinement), which takes an existing solution

    and tries to improve it.

    Often we apply iterative improvement to a constructive

    partitioning.

    Constructive Partitioning

    The most common constructive partitioning algorithms use seed growth or cluster growth.

    A simple seed-growth algorithm for constructive partitioning consists of the following steps:

    1. Start a new partition with a seed logic cell.

    2. Consider all the logic cells that are not yet in a partition. Select each of these logic cells in turn.

    3. Calculate a gain function g(m) that measures the benefit of adding logic cell m to the current partition.

    One measure of gain is the number of connections between

    logic cell m and the current partition.

    4. Add the logic cell with the highest gain g(m) to the

    current partition.

    5. Repeat the process from step 2. If you reach the limit

    of logic cells in a partition, start again at step 1.

    We may choose different gain functions according to the objectives.

    The algorithm starts with the choice of a seed logic cell (seed

    module, or just seed). The logic cell with the most nets is a good choice as the seed

    logic cell.

    We can also use a set of seed logic cells known as a cluster or clique, borrowed from graph theory.
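    A minimal Python sketch of the seed-growth steps above (the function and data-structure names are our own, not from the slides); the gain g(m) here is simply the number of nets connecting logic cell m to the current partition:

```python
def seed_growth_partition(cells, nets, max_cells_per_partition):
    """Constructive partitioning by seed (cluster) growth.

    cells -- iterable of logic-cell names
    nets  -- list of sets of cell names (one set per net)
    """
    unplaced = set(cells)
    partitions = []
    while unplaced:
        # Step 1: seed a new partition with the unplaced cell on the most nets.
        seed = max(unplaced, key=lambda c: sum(c in net for net in nets))
        partition = {seed}
        unplaced.discard(seed)
        while unplaced and len(partition) < max_cells_per_partition:
            # Steps 2-3: gain g(m) = number of nets connecting m to the partition.
            def gain(m):
                return sum(1 for net in nets if m in net and net & partition)
            # Step 4: add the cell with the highest gain.
            best = max(unplaced, key=gain)
            partition.add(best)
            unplaced.discard(best)
        partitions.append(partition)   # Step 5: start a new partition when full.
    return partitions
```

    Other gain functions can be plugged in to reflect different objectives, for example penalizing external connections of the growing partition.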

    Iterative Partitioning Improvement

    Iterative improvement algorithms are based on interchange and

    group migration.

    The process of interchanging (swapping) logic cells in an effort to improve the partition is an interchange method.

    If the swap improves the partition,

    we accept the trial interchange; otherwise

    we select a new set of logic cells to swap.

    There is a limit to what we can achieve with a partitioning

    algorithm based on simple interchange.

    Figure 2 (c) shows a partitioning of the network of part (a) using a constructive partitioning algorithm with logic cell C as the seed.

    To get from the solution shown in part (c) to the solution of

    part (b), which has a minimum number of external

    connections, requires a complicated swap.

    The three pairs: D and F, J and K, C and L need to be

    swapped all at the same time.

    It takes a long time to consider all possible swaps of this complexity.

    An interchange algorithm considers only one change and rejects it immediately if it is not an improvement.

    Algorithms of this type are greedy algorithms in the sense that

    they will accept a move only if it provides immediate benefit.

    Such shortsightedness leads an algorithm to a local minimum from which it cannot escape.


    Group migration consists of swapping groups of logic cells

    between partitions.

    Group migration algorithms are better than simple interchange

    methods at improving a solution but are more complex.

    All group migration methods are based on the Kernighan-Lin (KL) algorithm, which partitions a graph.

    The problem of dividing a graph into two pieces, minimizing the nets that are cut, is the min-cut problem, a very important one in VLSI design.

    The KL algorithm can be applied to many different problems

    in ASIC design.

    Examine the algorithm next and then see how to apply it to

    system partitioning.

    The Kernighan Lin Algorithm

    Consider a network with 2m nodes (where m is an integer),

    each of equal size.

    External edges cross between partitions, internal edges are

    contained inside a partition.

    If we assign a cost to each edge of the network graph, we can define the cost matrix C = (cij), where cij = cji and cii = 0.

    If all connections are equal in importance, the elements of the

    cost matrix are 1 or 0, and in this special case we usually call

    the matrix the connectivity matrix.

    Costs higher than 1 could represent the number of wires in a

    bus, multiple connections to a single logic cell, or nets that we

    need to keep close for timing reasons.

    Figure below illustrates some of the terms and definitions

    needed to describe the KL algorithm.

    (a) An example network graph

    (b) The connectivity matrix, C

    The columns and rows are labeled to show how the matrix entries correspond to the node numbers in the graph.

    For example, C17 (column 1, row 7) equals 1 because nodes 1

    and 7 are connected.

    In this example all edges have an equal weight of 1, but in

    general the edges may have different weights.

    Assume we have already split the network into two partitions, A and B, each with m nodes (using a constructive partitioning).

    The goal is to swap nodes between A and B with the objective

    of minimizing the number of external edges connecting the

    two partitions.

    Each external edge may be weighted by a cost, and the objective corresponds to minimizing a cost function that we shall call the total external cost, cut cost, or cut weight, W: the sum of the costs cab over all nodes a in A and b in B.

    In Figure (a) the cut weight is 4 (all the edges have weights of

    1).

    To simplify the measurement of the change in cut weight when

    we interchange nodes, we need some more definitions.

    First, for any node a in partition A, define an external edge cost Ea, the sum of cab over all nodes b in B, which measures the connections from node a to B.

    For example, in Figure (a) E1 = 1 and E3 = 0.

    Second, define an internal edge cost Ia, the sum of cab over all other nodes b in A, to measure the internal connections to a.

    In Figure (a), I1 = 0 and I3 = 2.

    Define the edge costs for partition B in a similar way (so E8 =

    2, and I8 = 1).

    The cost difference, Da = Ea - Ia, is the difference between the external and internal edge costs.

    In Figure (a) D1 = 1, D3 = -2, and D8 = 1.


    Now pick any node in A, and any node in B.

    Swap these nodes, a & b, measure the reduction in cut

    weight, which we call the gain, g.

    Express g in terms of the edge costs as follows: g = Da + Db - 2cab.

    The last term accounts for the fact that a and b may be connected.

    In figure (a), if we swap nodes 1 and 6, then g = 2.

    If we swap nodes 2 and 8, then g = 1.


    The KL algorithm finds a group of node pairs to swap that

    increases the gain even though swapping individual node

    pairs from that group might decrease the gain.

    First pretend to swap all of the nodes a pair at a time. Pretend swaps are like studying chess games when you make

    a series of trial moves in your head.

    The algorithm is:

    1. Find two nodes, ai from A, and bi from B, so that the gain

    from swapping them is a maximum, the gain gi is

    gi = Dai + Dbi - 2 caibi

    2. Next pretend swap ai and bi even if the gain gi is zero or

    negative, and do not consider ai and bi eligible for being

    swapped again.

    3. Repeat steps 1 and 2 a total of m times until all the nodes of

    A and B have been pretend swapped. We are back where we

    started, but we have ordered pairs of nodes in A and B

    according to the gain from interchanging those pairs.

    4. Now we can choose which nodes we shall actually swap.

    Suppose we only swap the first n pairs of nodes that we found in the preceding process. In other words, we swap nodes X = a1, a2, ..., an from A with nodes Y = b1, b2, ..., bn from B. The total gain would be Gn = g1 + g2 + ... + gn.

    5. Choose n corresponding to the maximum value of Gn.

    If the maximum value of Gn > 0, then swap the sets of nodes

    X and Y and thus reduce the cut weight by Gn.

    Use this new partitioning to start the process again at the first step.

    If the maximum value of Gn = 0, then we cannot improve the

    current partitioning and we stop.

    We have found a locally optimum solution.
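    The following is a compact, deliberately naive Python sketch of one KL pass over a connectivity matrix, using the definitions above (it recomputes the D values from scratch at every step, so it is O(m^3) per pass rather than an optimized implementation):

```python
def kl_pass(C, A, B):
    """One pass (steps 1-5) of the Kernighan-Lin algorithm.

    C    -- symmetric cost/connectivity matrix, C[i][j] = weight of edge (i, j)
    A, B -- lists of node indices, each of length m
    Returns (new_A, new_B, best_gain); best_gain == 0 means a local optimum.
    """
    a, b = list(A), list(B)              # working copies for the pretend swaps
    locked = set()
    swaps, gains = [], []

    def D(n, own, other):
        # cost difference D = external edge cost E - internal edge cost I
        E = sum(C[n][o] for o in other)
        I = sum(C[n][o] for o in own if o != n)
        return E - I

    for _ in range(len(a)):
        # Step 1: pick the unlocked pair with maximum gain g = Da + Db - 2*Cab.
        best = None
        for x in (n for n in a if n not in locked):
            for y in (n for n in b if n not in locked):
                g = D(x, a, b) + D(y, b, a) - 2 * C[x][y]
                if best is None or g > best[0]:
                    best = (g, x, y)
        g, x, y = best
        # Step 2: pretend swap the pair and lock it.
        gains.append(g)
        swaps.append((x, y))
        a[a.index(x)], b[b.index(y)] = y, x
        locked.update((x, y))

    # Steps 4-5: choose n maximizing Gn = g1 + ... + gn and apply those swaps.
    totals = [sum(gains[:i + 1]) for i in range(len(gains))]
    best_gain = max(totals)
    if best_gain <= 0:
        return list(A), list(B), 0        # cannot improve: locally optimum
    n = totals.index(best_gain) + 1
    new_A, new_B = list(A), list(B)
    for x, y in swaps[:n]:
        new_A[new_A.index(x)], new_B[new_B.index(y)] = y, x
    return new_A, new_B, best_gain
```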

    Figure below shows an example of partitioning a graph using

    the KL algorithm.

    Each completion of steps 1 through 5 is a pass through the

    algorithm.

    Kernighan and Lin found that typically 2 to 4 passes were required to reach a solution.

    FIGURE: Partitioning a graph using the KL algorithm.

    (a) Shows how swapping node 1 of partition A with node 6 of

    partition B results in a gain of g = 1.

    (b) A graph of the gain resulting from swapping pairs of nodes.

    (c) The total gain is equal to the sum of the gains obtained at

    each step.

    The most important feature of the KL algorithm is that we are

    prepared to consider moves even though they seem to make

    things worse.

    The KL algorithm works well for partitioning graphs.


    Following problems need to be addressed before

    applying the algorithm to network partitioning:

    It minimizes the number of edges cut, not the number of nets cut.

    It does not allow logic cells to be different sizes.

    It is expensive in computation time.

    It does not allow partitions to be unequal or find the

    optimum partition size.

    It does not allow for selected logic cells to be fixed in

    place.

    It does not directly allow for more than two partitions.

    The results are random.

    To implement a net-cut partitioning rather than an edge-cut

    partitioning, we can just keep track of the nets rather than

    the edges.

    We can no longer use a connectivity or cost matrix to represent connections, though.

    To represent nets with multiple terminals in a network

    accurately, we can extend the definition of a network graph.

    Figure next shows how a hypergraph with a special type of

    vertex, a star, and a hyperedge, represents a net with more

    than two terminals in a network.

    FIGURE: A hypergraph.

    (a) The network contains a net y with three terminals.

    (b) In the network hypergraph we can model net y by a single

    hyperedge (B, C, D) and a star node.

    Now there is a direct correspondence between wires or nets in

    the network and hyperedges in the graph.

    Summary

    In the KL algorithm, the internal and external edge costs have

    to be calculated for all the nodes before we can select the

    nodes to be swapped.

    Then we have to find the pair of nodes that give the largest

    gain when swapped.

    This requires an amount of computer time that grows as n^2 log n for a graph with 2n nodes.

    This n^2 dependency is a major problem for partitioning large networks.

    The Fiduccia Mattheyses Algorithm

    The FM algorithm is an extension to the KL algorithm that

    addresses the differences between nets and edges and also

    reduces the computational effort.

    The key features of this algorithm are the following:

    1. Only one logic cell, the base logic cell, moves at a time.

    To stop the algorithm from moving all the logic cells to one large

    partition, the base logic cell is chosen to maintain balance

    between partitions.

    The balance is the ratio of total logic cell size in one partition to

    the total logic cell size in the other.

    Altering the balance allows us to vary the sizes of the partitions.

    2. Critical nets are used to simplify the gain calculations:

    A net is a critical net if it has an attached logic cell that, when

    swapped, changes the number of nets cut.

    It is only necessary to recalculate the gains of logic cells on

    critical nets that are attached to the base logic cell.

    3. Logic cells that are free to move are stored in a doubly

    linked list:

    The lists are sorted according to gain.

    This allows the logic cells with maximum gain to be found

    quickly.

    These techniques reduce the computation time so that it

    increases only slightly more than linearly with the number of

    logic cells in the network, a very important improvement.
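    A small Python sketch of the bucket idea behind point 3 (our own simplification, not the complete FM data structure): free cells are grouped by gain so that a maximum-gain cell can be retrieved quickly, and only cells on critical nets attached to the moved base cell need their buckets updated.

```python
from collections import defaultdict, deque

class GainBuckets:
    """Free cells grouped by gain; one list (here a deque) per gain value."""

    def __init__(self):
        self.buckets = defaultdict(deque)   # gain -> cells with that gain
        self.gain_of = {}                   # cell -> its current gain
        self.max_gain = None

    def insert(self, cell, gain):
        self.buckets[gain].append(cell)
        self.gain_of[cell] = gain
        if self.max_gain is None or gain > self.max_gain:
            self.max_gain = gain

    def update(self, cell, new_gain):
        # Called when a critical net attached to the moved base cell changes
        # this cell's gain; only such cells ever need to be re-bucketed.
        self.buckets[self.gain_of[cell]].remove(cell)
        self.insert(cell, new_gain)

    def pop_max(self):
        # Return a free cell of maximum gain (the next candidate base cell).
        while self.max_gain is not None and not self.buckets[self.max_gain]:
            non_empty = [g for g, cells in self.buckets.items() if cells]
            self.max_gain = max(non_empty) if non_empty else None
        if self.max_gain is None:
            return None
        cell = self.buckets[self.max_gain].popleft()
        del self.gain_of[cell]
        return cell
```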


    Comparison between KL and FM Algorithms

    The Ratio-Cut Algorithm

    The ratio-cut algorithm removes the restriction of constant partition sizes.

    The cut weight W for a cut that divides a network into two partitions, A and B, is the sum of the costs cab over all nodes a in A and b in B.

    The KL algorithm minimizes W while keeping partitions A and B the same size.

    The ratio of a cut is defined as R = W / (|A| × |B|).

    In this equation |A| and |B| are the sizes of partitions A and B.

    The size of a partition is equal to the number of nodes it

    contains (also known as the set cardinality).

    The cut that minimizes R is called the ratio cut.

    The original description of the ratio-cut algorithm uses ratio

    cuts to partition a network into small, highly connected

    groups.

    Then you form a reduced network from these groups: each small group of logic cells forms a node in the reduced network.

    Finally, you use the FM algorithm to improve the reduced

    network.
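    Under these definitions the ratio of a cut is easy to evaluate; a short sketch (partition sizes taken as simple node counts, as in the text):

```python
def cut_ratio(C, A, B):
    """Ratio R = W / (|A| * |B|), with W the cut weight between A and B."""
    W = sum(C[a][b] for a in A for b in B)
    return W / (len(A) * len(B))
```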

    The Look-ahead Algorithm

    Both the KL and FM algorithms consider only the immediate gain to be made by moving a node.

    When there is a tie between nodes with equal gain, there is no mechanism to make the best choice.

    The next figure shows an example of two nodes that have equal gains, but moving one of the nodes will allow a move that has a higher gain later.

    The figure illustrates an example of network partitioning that shows the need to look ahead when selecting logic cells to be moved between partitions.

    Partitions (a), (b), and (c) show one sequence of moves.

    Partitions (d), (e), and (f) show a second sequence.

    The partitioning in (a) can be improved by moving node 2

    from A to B with a gain of 1.

    The result of this move is shown in (b).

    This partitioning can be improved by moving node 3 to B,

    again with a gain of 1.

    The partitioning shown in (d) is the same as (a).

    Move node 5 to B with a gain of 1 as shown in (e), but now we

    can move node 4 to B with a gain of 2.

    We call the gain for the initial move the first-level gain.

    Gains from subsequent moves are then second-level and

    higher gains.

    Define a gain vector that contains these gains.


    Using the gain vector allows us to use a look-ahead algorithm

    in the choice of nodes to be swapped.

    This reduces both the mean and variation in the number of cuts

    in the resulting partitions.

    If we wish to divide a system into more than two pieces, this

    can be done recursively by applying the algorithms.

    For example, to divide a system network into three pieces,

    apply the FM algorithm first, using a balance of 2:1, to

    generate two partitions, with one twice as large as the other.

    Then we apply the algorithm again to the larger of the two

    partitions, with a balance of 1:1, which will give us three

    partitions of roughly the same size.
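    As a sketch of this recursion (bipartition is a hypothetical stand-in for any two-way partitioner such as FM; its balance argument and return convention are assumptions of this illustration):

```python
def partition_into_three(network, bipartition):
    """Split a network into three roughly equal pieces with two 2-way cuts.

    bipartition(network, balance) is assumed to return (larger, smaller)
    pieces whose sizes respect the requested balance ratio.
    """
    # First cut with a 2:1 balance: one piece twice the size of the other.
    two_thirds, one_third = bipartition(network, balance=(2, 1))
    # Second cut splits the larger piece 1:1, giving three similar pieces.
    piece_a, piece_b = bipartition(two_thirds, balance=(1, 1))
    return piece_a, piece_b, one_third
```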

    Floorplanning

    Agenda

    Introduction

    Floorplanning Goals and Objectives

    Measurement of delay in floorplanning

    Floorplanning Tools

    Channel Definition

    I/O and Power Planning

    Clock Planning

    Introduction

    Floorplanning is the mapping between the logical description (the netlist) and the physical description (the floorplan).

    Floorplanning gives early feedback: thinking of layout at early

    stages may suggest valuable architectural modifications;

    floorplanning also aids in estimating delay due to wiring.

    Floorplanning fits well into a top-down design strategy, the stepwise refinement strategy also advocated in software design.

    Floorplanning precedes placement.

    The netlist is a logical description of the ASIC; the floorplan is

    a physical description of an ASIC.

    The output of the placement step is a set of directions for the

    routing tools.

    Inputs to the floorplanning problem:

    A set of blocks, hard or soft.

    Pin locations of hard blocks.

    A netlist describing the circuit blocks, the logic cells within the blocks, and their connections.

    Constraints

    Higher level Chip Layout

    Power

    Memory Modules, IP, Macros Placement

    IO Placement and Packaging

    Logical Grouping

    Die Size Estimation

    Core or IO limited

    Floorplanning Goals and Objectives

    The goals of floorplanning are to:

    arrange the blocks on a chip,

    decide the location of the I/O pads,

    decide the location and number of the power pads,

    decide the type of power distribution,

    decide the location and type of clock distribution.

    The objectives of floorplanning are to:

    minimize the chip area,

    minimize delay (reduce wire length for critical nets),

    maximize routability (minimize congestion).


    Measurement of delay in floorplanning

    In floorplanning we predict the interconnect delay before we

    complete any routing.

    Delay is dependent on resistance and capacitance.

    The parasitics associated with the interconnect are not yet known, i.e., the interconnect capacitance (wiring capacitance or routing capacitance) and the interconnect resistance.

    Only the fanout (FO) of a net and the size of the block are known.

    The interconnect capacitance is therefore estimated from predicted-capacitance tables (wire-load tables) indexed by fanout and block size.

    Predicted capacitance.

    (a) Interconnect lengths as a function of fanout (FO) and circuit-block size.

    (b) Wire-load table.

    There is only one capacitance value for each fanout

    (typically the average value).

    (c) The wire-load table predicts the capacitance and delay of a

    net (with a considerable error).

    Net A and net B both have a fanout of 1, both have the

    same predicted net delay, but net B in fact has a much

    greater delay than net A in the actual layout.
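    A sketch of how such a table might be consulted during floorplanning (the table values, block-size names, and the simple R*C delay model are invented placeholders, not real library data):

```python
# Hypothetical wire-load table: for a given circuit-block size, map the
# fanout of a net to an average predicted capacitance (pF). Real libraries
# provide such tables per technology and per block-size range.
WIRE_LOAD_TABLE = {
    "small_block": {1: 0.02, 2: 0.035, 4: 0.06},
    "large_block": {1: 0.05, 2: 0.08,  4: 0.13},
}

def predicted_net_delay(block_size, fanout, drive_resistance_kohm):
    """Estimate net delay (ns) as R * C using the predicted capacitance."""
    table = WIRE_LOAD_TABLE[block_size]
    # Use the largest tabulated fanout not exceeding the requested one.
    key = max(f for f in table if f <= fanout)
    capacitance_pf = table[key]
    return drive_resistance_kohm * capacitance_pf   # kohm * pF = ns

print(predicted_net_delay("small_block", fanout=1, drive_resistance_kohm=2.0))
```

    Note that, as in the text, two nets with the same fanout get the same predicted delay, however different their actual routed lengths turn out to be.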

    Floorplanning Tools

    Figure 1 (a) shows an initial random floorplan generated by a

    floorplanning tool.

    Two of the blocks, A and C, are standard-cell areas.

    These are flexible (variable) blocks because, although their

    total area is fixed, their shape and connector locations may be

    adjusted during the placement step.

    The dimensions and connector locations of the other fixed blocks (perhaps RAM, ROM, compiled cells, or megacells) can only be modified when they are created.

    Force logic cells to be in selected flexible blocks by seeding.

    Choose seed cells by name.

    Figure 1: Floorplanning a cell-based ASIC.

    (a) Initial floorplan generated by the floorplanning tool.

    Two of the blocks are flexible (A and C) and contain rows

    of standard cells (unplaced).

    A pop-up window shows the status of block A.

    (b) An estimated placement for flexible blocks A and C.

    The connector positions are known and a rat's nest display

    shows the heavy congestion below block B.

    (c) Moving blocks to improve the floorplan.

    (d) The updated display shows the reduced congestion after

    the changes.

    Figure 1


    For example, ram_control* would select all logic cells whose names start with ram_control to be placed in one flexible block.

    Seeding may be hard or soft:

    A hard seed is fixed and not allowed to move during the

    remaining floorplanning and placement steps.

    A soft seed is an initial suggestion only and can be altered if

    necessary by the floorplanner.

    Seed connectors can also be used within flexible blocks, forcing certain nets to appear in a specified order, or at a specified location, at the boundary of a flexible block.

    The floorplanner can complete an estimated placement to

    determine the positions of connectors at the boundaries of the

    flexible blocks.

    Figure (b) illustrates a rat's nest display of the connections

    between blocks.

    Connections are shown as bundles between the centers of

    blocks or as flight lines between connectors. Figure (c) and (d) show how we can move the blocks in a

    floorplanning tool to minimize routing congestion.

    Control the aspect ratio of our floorplan to fit our chip into the die cavity (a fixed-size hole) inside a package.

    Figure 2 (a) - (c) show how we can rearrange our chip to

    achieve a square aspect ratio.

    Figure (c) also shows a congestion map, another form of

    routability display.

    There is no standard measure of routability.

    FIGURE 2: Congestion analysis.

    (a) The initial floorplan with a 2:1.5 die aspect ratio.

    (b) Altering the floorplan to give a 1:1 chip aspect ratio.

    (c) A trial floorplan with a congestion map.

    Blocks A and C have been placed so that we know the terminal

    positions in the channels.

    Shading indicates the ratio of channel density to the channel

    capacity.

    Dark areas show regions that cannot be routed because the

    channel congestion exceeds the estimated capacity.

    (d) Resizing flexible blocks A and C alleviates congestion.

    Figure 2

    The interconnect (or wiring) channels have a certain channel capacity; they can handle only a fixed number of interconnects.

    One measure of congestion is the difference between the

    number of interconnects that we actually need, called the

    channel density, and the channel capacity.

    Another measure, shown in Figure (c), uses the ratio of

    channel density to the channel capacity.
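    A tiny sketch of that second measure (channel names and numbers below are made up): any channel whose density-to-capacity ratio exceeds 1.0 cannot be routed as floorplanned.

```python
def congestion_map(channels):
    """Map channel name -> density / capacity; ratios above 1.0 are unroutable."""
    return {name: density / capacity
            for name, (density, capacity) in channels.items()}

channels = {"below_B": (14, 10), "left_of_A": (6, 12)}   # hypothetical data
print(congestion_map(channels))   # below_B is congested (1.4 > 1.0)
```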

    With practice, we can create a good initial placement by

    floorplanning and a pictorial display.

    This is one area where the human ability to recognize patterns

    and spatial relations is currently superior to a computer

    programs ability.

    Channel Definition

    During the floorplanning step we assign the areas between

    blocks that are to be used for interconnect.

    This process is known as channel definition or channel

    allocation.

    Figure 3 shows a T-shaped junction between two rectangular

    channels and illustrates why we must route the stem (vertical)

    of the T before the bar.

    The general problem of choosing the order of rectangular channels to route is called channel ordering.


    Figure 3: Routing a T-junction between two channels in two-level

    metal. The dots represent logic cell pins.

    (a) Routing channel A (the stem of the T) first allows us to adjust

    the width of channel B.

    (b) If we route channel B first (the top of the T), this fixes the

    width of channel A.

    Route the stem of a T-junction before routing the top.

    Figure 4 shows a floorplan of a chip containing several blocks.

    Suppose we cut along the block boundaries, slicing the chip into two pieces (Figure a).

    If we can slice each of these pieces into two, and continue in this fashion until all the blocks are separated, we have a slicing floorplan (Figure b).

    Figure (c) shows how the sequence we use to slice the chip

    defines a hierarchy of the blocks.

    Reversing the slicing order ensures that we route the stems of

    all the channel T-junctions first.

    Figure 4

    Figure 5 shows a floorplan that is not a slicing structure.

    We cannot cut the chip all the way across with a knife without

    chopping a circuit block in two.

    This means we cannot route any of the channels in this

    floorplan without routing all of the other channels first.

    There is a cyclic constraint in this floorplan.

    There are two solutions to this problem:

    move the blocks until we obtain a slicing floorplan.

    allow the use of L-shaped, rather than rectangular, channels (or areas with fixed connectors on all sides, a switch box).

    An area-based router, rather than a channel router, is used to route L-shaped regions or switch boxes.

    Figure 5: Cyclic constraints.

    (a) A nonslicing floorplan with a cyclic constraint that prevents

    channel routing.

    (b) In this case it is difficult to find a slicing floorplan without

    increasing the chip area.

    (c) This floorplan may be sliced (with initial cuts 1 or 2) and has no

    cyclic constraints, but it is inefficient in area use and will be very

    difficult to route.

    Figure 6 (a) displays the floorplan of the ASIC.

    We can remove the cyclic constraint by moving the blocks

    again, but this increases the chip size.

    Figure (b) shows an alternative solution.

    Merge the flexible standard cell areas A and C.

    We can do this by selective flattening of the netlist.

    Flattening can reduce the routing area because routing

    between blocks is usually less efficient than routing inside

    the row-based blocks.

    Figure (b) shows the channel definition and routing order for

    our chip.


    Figure 6: Channel definition and ordering.

    (a) Cyclic constraint is eliminated by merging the blocks A and C.

    (b) A slicing structure.

    I/O and Power Planning

    A silicon chip or die is mounted on a chip carrier inside a chip

    package.

    Connections are made by bonding the chip pads to fingers on a metal lead frame that is part of the package.

    Metal lead-frame fingers connect to the package pins.

    A die consists of a logic core inside a pad ring.

    Figure 7 (a) shows a pad-limited die and Figure (b) shows a

    core-limited die.

    On a pad-limited die use tall, thin pad-limited pads, which

    maximize the number of pads we can fit around the outside of

    the chip.

    On a core-limited die use short, wide core-limited pads.

    Figure 7: Pad-limited and core-limited die.

    (a) A pad-limited die: the number of pads determines the die size.

    (b) A core-limited die: core logic determines the die size.

    (c) Using both pad-limited pads and core-limited pads for a square die.

    Figure (c) shows how we can use both types of pad to change

    the aspect ratio of a die to be different from that of the core.

    Special power pads are used for the positive supply, or VDD,

    power buses (or power rails) and the ground or negative

    supply, VSS or GND.

    One set of VDD/VSS pads supplies one power ring that runs

    around the pad ring and supplies power to the I/O pads only.

    The other set of VDD/VSS pads connects to a second power ring that supplies the logic core.

    I/O power is called dirty power since it has to supply large

    transient currents to the output transistors.

    Keep dirty power separate to avoid injecting noise into the

    internal-logic power (clean power).

    I/O pads also contain special circuits to protect against ESD (electrostatic discharge).

    These circuits can withstand very short high-voltage (several

    kilovolt) pulses that can be generated during human or

    machine handling.

    Depending on the package design, the type and positioning of

    down bonds may be fixed.

    This means we need to fix the position of the chip pad for

    down bonding using a pad seed.

    If we make an electrical connection between the substrate and

    a chip pad, or to a package pin, it must be to VDD (n-type substrate) or VSS (p-type substrate).

    This substrate connection (for the whole chip) employs a down

    bond (or drop bond) to the carrier.

    We have several options:

    Dedicate one (or more) chip pad(s) to down bond to the chip carrier.

    Make a connection from a chip pad to the lead frame and down bond from the chip pad to the chip carrier.

    Make a connection from a chip pad to the lead frame and down bond from the lead frame.

    Down bond from the lead frame without using a chip pad.

    Leave the substrate and/or chip carrier unconnected.

    A double bond connects two pads to one chip-carrier finger and one package pin.

    Do this to save package pins or to reduce the series inductance of bond wires by parallel connection of the pads.


    A multiple-signal pad or pad group is a set of pads.

    For example, an oscillator pad usually comprises a set of two

    adjacent pads that we connect to an external crystal.

    The oscillator circuit and the two signal pads form a single logic cell.

    Another common example is a clock pad.

    Some foundries allow a special form of corner pad (normal

    pads are edge pads) that squeezes two pads into the area at the

    corners of a chip using a special two-pad corner cell, to help

    meet bond-wire angle design rules (see Figure b and c).

    To reduce the series resistive and inductive impedance of

    power supply networks, it is normal to use multiple VDD and

    VSS pads.

    The output pads can easily consume most of the power on a

    CMOS ASIC, because the load on a pad is much larger than

    typical on-chip capacitive loads.

    Depending on the technology it may be necessary to provide dedicated VDD and VSS pads for every few SSOs (simultaneously switching outputs).

    Design rules set how many SSOs can be used per VDD/VSS

    pad pair.

    These dedicated VDD/VSS pads must follow groups of output

    pads as they are seeded or planned on the floorplan.

    With some chip packages this can become difficult because

    design rules limit the location of package pins that may be

    used for supplies (due to the differing series inductance of

    each pin).

    Figure 8 (a) & (b) represents the magnified views of southeast

    corner of example chip and show different types of I/O cells.

    Figure (c) shows a stagger-bond arrangement using two rows

    of I/O pads.

    In this case the design rules for bond wires (the spacing and

    the angle at which the bond wires leave the pads) become very

    important.

    Figure (d) shows an area-bump bonding arrangement (also

    known as flip-chip, solder-bump, or C4).

    Although bonding pads can then be placed over the face of the chip, the I/O circuits are often still located at the edges of the chip because of difficulties in power supply distribution and in integrating I/O circuits together with logic in the center of the die.

    FIGURE 8: Bonding pads.

    (a) This chip uses both pad-limited and core-limited pads.

    (b) A hybrid corner pad.


    (c) A chip with stagger-bonded pads.

    (d) An area-bump bonded chip (or flip-chip).

    The chip is turned upside down and solder bumps connect the pads

    to the lead frame.

    In an MGA (masked gate array) the pad spacing and I/O-cell spacing are fixed: each pad occupies a fixed pad slot (pad site).

    The properties of the pad I/O are also fixed but, if we need to,

    we can parallel adjacent output cells to increase the drive.

    To increase flexibility further the I/O cells can use a

    separation, the I/O-cell pitch, that is smaller than the pad pitch.

    For example, three 4 mA driver cells can occupy two pad

    slots.

    Then we can use two 4 mA output cells in parallel to drive one

    pad, forming an 8 mA output pad as shown in Figure 9.


    This arrangement also means the I/O pad cells can be changed

    without changing the base array.

    This is useful as bonding techniques improve and the pads can

    be moved closer together.

    FIGURE 9: Gate-array I/O pads.

    (a) Cell-based ASICs may contain pad cells of different sizes and

    widths.

    (b) A corner of a gate-array base.

    (c) A gate-array base with different I/O cell and pad pitches.

    Figure 9

    Figure 10 shows two possible power distribution schemes.

    The long direction of a rectangular channel is the channel

    spine.

    Some automatic routers may require that metal lines parallel to

    a channel spine use a preferred layer (either m1, m2, or m3).

    Since we can have both horizontal and vertical channels, we may have the situation shown in the figure, where we have to decide whether to use a preferred layer or the preferred direction for some channels.

    This may or may not be handled automatically by the routing software.

    Figure 10

    FIGURE 10: Power distribution.

    (a) Power distributed using m1 for VSS and m2 for VDD. This helps minimize the number of vias and layer crossings needed but

    causes problems in the routing channels.

    (b) In this floorplan m1 runs parallel to the longest side of all channels, the channel spine. This can make automatic routing easier but may increase the number of vias and layer crossings.

    (c) An expanded view of part of a channel (interconnect is shown as lines). If power runs on different layers along the spine of a channel, this forces signals to change layers.

    (d) A closeup of VDD and VSS buses as they cross. Changing layers requires a large number of via contacts to reduce

    resistance.

    Clock Planning

    Figure 11 (a) shows a clock spine routing scheme with all clock pins driven directly from the clock driver.

    MGAs and FPGAs often use this fishbone type of clock distribution scheme.

    Figure (b) shows a clock spine for a cell-based ASIC.

    Figure (c) shows the clock-driver cell, often part of a special clock-pad cell.

    Figure (d) illustrates clock skew and clock latency.

    When all clocked elements are driven from one net with a clock spine, skew is caused by differing interconnect lengths and loads.

    If the clock-driver delay is larger than the interconnect delays, a clock spine achieves minimum skew but with long latency.


    Figure 11

    FIGURE 11: Clock distribution.

    (a) A clock spine for a gate array.

    (b) A clock spine for a cell-based ASIC.

    (c) A clock spine is usually driven from one or more clock-driver cells.

    Delay in the driver cell is a function of the number of stages and the ratio of output to input capacitance for each stage (taper).

    (d) Clock latency and clock skew.

    We would like to minimize both latency and skew.

    Delay through a chain of CMOS gates is minimized when the ratio between the input capacitance C1 and the output capacitance C2 is about 3 (exactly e ≈ 2.7).

    The fastest way to drive a large load is to use a chain of buffers with their input and output loads chosen to maintain this ratio.

    We can design a tree of clock buffers so that the taper of each

    stage is e = 2.7 by using a fanout of three at each node, as

    shown in Figure 12 (a) and (b).
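    A quick sketch of the arithmetic behind this rule (our own illustration, not from the slides): given an input capacitance and a load, the number of buffer stages follows from the chosen taper.

```python
import math

def buffer_chain(c_in_pf, c_load_pf, taper=math.e):
    """Stages and effective per-stage taper for a geometrically sized chain."""
    stages = max(1, round(math.log(c_load_pf / c_in_pf, taper)))
    per_stage = (c_load_pf / c_in_pf) ** (1 / stages)   # taper actually used
    return stages, per_stage

# Driving a 20 pF clock load from a 0.05 pF input with taper ~ e (about 2.7):
print(buffer_chain(0.05, 20.0))   # about 6 stages with a taper close to 2.7
```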

    The clock tree, shown in Figure (c), uses the same number of

    stages as a clock spine, but with a lower peak current for the

    inverter buffers.

    Figure (c) illustrates that we now have another problem: we need to balance the delays through the tree carefully to minimize clock skew, that is, to balance the clock arrival times at all of the leaf nodes.

    Designing a clock tree that balances the rise and fall times at

    the leaf nodes has the beneficial side-effect of minimizing the

    effect of hot-electron wearout.

    (a) Minimum delay is achieved when the taper of successive stages

    is about 3.

    (b) Using a fanout of three at successive nodes.

    (c) A clock tree for a cell-based ASIC

    Figure 12

    Summary

    Floorplanning initializes the physical design process.

    Floorplanning is the center of ASIC design operations for all

    types of ASIC.

    There are many factors to be considered during floorplanning:

    minimizing connection length and signal delay between blocks,

    arranging fixed blocks and reshaping flexible blocks to occupy

    the minimum die area,

    organizing the interconnect areas between blocks,

    planning the power, clock, and I/O distribution.

    THANK YOU