Combination Alto Processor Unit 1

  • 8/8/2019 Combination Alto Processor Unit 1

    Combinational to Sequential Circuits to Simple Processors

    What did we cover at Friday's meeting?

    1. Design of SOP circuits from K-maps. Prime implicants and covering.

    2. Design of POS circuits from K-maps. Prime implicates and covering.

    3. Design of ESOP circuits from K-maps. Algebraic rules for AND/EXOR logic.

    4. Design using NAND and NOR gates. De Morgan's rules.

    5. Factorization.

    6. Multiplexers.

    7. Iterative circuits and their types.

    8. Using state machines to design one-directional iterative circuits.

    9. Predicates.

    10. Oracles.

    11. SAT oracles.

    12. Graph coloring oracles and distributed processors.

    13. The SEND+MORE=MONEY problem and its oracle.

    14. The idea of constraint satisfaction and distributed software/hardware for it.

    Direct questions to Mr. Parasa and Mr. Mathias Sunardi, who actively participated.

    Reminder: Embedded Systems

    Outline

    Introduction

    Combinational logic

    Sequential logic

    FSM design

    Custom single-purpose processor design

    RT-level custom single-purpose processor design

    Increasing abstraction level in design specification

    A higher abstraction level is the focus of hardware/software design evolution

    Description is smaller and easier to capture

    E.g., one line of sequential program code can translate to 1000 gates

    Many more possible implementations are available

    (a) Like a flashlight: the higher above the ground, the more ground is illuminated

    Sequential program designs may differ in performance/transistor count by orders of magnitude

    Logic-level designs may differ by only a power of 2

    (b) The design process proceeds to lower abstraction levels, narrowing in on a single implementation

    [Figure, panels (a) and (b): abstraction levels from idea to implementation: back-of-the-envelope, sequential program, register-transfers, logic. Moving toward implementation, modeling cost increases and opportunities decrease.]

    What is synthesis?

    Automatically converting a system's behavioral description to a structural implementation

    Complex whole formed by parts

    Structural implementation must optimize design metrics

    More expensive and more complex than compilers

    Cost = $100s to $10,000s

    User controls 100s of synthesis options

    Optimization is critical

    Otherwise could use software

    Optimizations differ for each user

    Run time = hours, days

    Gajski's Y-chart

    Each axis represents a type of description

    Behavioral

    Defines outputs as a function of inputs

    Algorithms but no implementation

    Structural

    Implements behavior by connecting components with known behavior

    Physical

    Gives size/locations of components and wires on chip/board

    Synthesis converts behavior at a given level to structure at the same level or lower. E.g.,

    FSM -> gates, flip-flops (same level)

    FSM -> transistors (lower level)

    FSM -X-> registers, FUs (higher level)

    FSM -X-> processors, memories (higher level)

    [Figure: Y-chart with three axes, each listed from the highest to the lowest abstraction level.
    Behavior: sequential programs; register transfers; logic equations/FSM; transfer functions.
    Structural: processors, memories; registers, FUs, MUXs; gates, flip-flops; transistors.
    Physical: boards; chips; modules; cell layout.]

    FU = functional unit

    FSM = finite state machine

    Introduction

    Processor

    Digital circuit that performs computation tasks

    Controller and datapath

    General-purpose: variety of computation tasks

    Single-purpose: one particular computation task

    Custom single-purpose: non-standard task

    A custom single-purpose processor may be

    Fast, small, low power

    But high NRE, longer time-to-market, less flexible

    [Figure: digital camera chip behind a lens and CCD. Blocks: CCD preprocessor, pixel coprocessor, A2D, D2A, JPEG codec, microcontroller, multiplier/accumulator, DMA controller, memory controller, ISA bus interface, UART, LCD controller, display controller.]

    CMOS transistor on silicon

    Transistor

    The basic electrical component in digital systems

    Acts as an on/off switch

    Voltage at the "gate" controls whether current flows from source to drain

    Don't confuse this "gate" with a logic gate

    [Figure: IC package and IC; transistor cross-section showing source, drain, gate, oxide, channel, and silicon substrate; transistor symbol labeled "conducts if gate=1".]

    CMOS transistor implementations

    Complementary Metal Oxide Semiconductor

    We refer to logic levels

    Typically 0 is 0V, 1 is 5V

    Two basic CMOS types

    nMOS conducts if gate=1

    pMOS conducts if gate=0

    Hence "complementary"

    Basic gates: inverter, NAND, NOR

    Inverter: F = x'

    NAND gate: F = (xy)'

    NOR gate: F = (x+y)'

    [Figure: transistor-level schematics of the inverter, NAND gate, and NOR gate between the 1 and 0 rails; nMOS symbol (conducts if gate=1) and pMOS symbol (conducts if gate=0), each with gate, source, and drain.]

    Basic logic gates

    Driver:   F = x

    Inverter: F = x'

    AND:      F = x y

    NAND:     F = (x y)'

    OR:       F = x + y

    NOR:      F = (x + y)'

    XOR:      F = x XOR y

    XNOR:     F = (x XOR y)'

    Truth tables, two-input gates:

    x y | AND  OR  XOR  XNOR  NAND  NOR
    0 0 |  0    0   0    1     1     1
    0 1 |  0    1   1    0     1     0
    1 0 |  0    1   1    0     1     0
    1 1 |  1    1   0    1     0     0

    Truth tables, one-input gates:

    x | Driver  Inverter
    0 |   0        1
    1 |   1        0

    Combinational logic design

    A) Problem description

    y is 1 if a is 1, or b and c are 1. z is 1 if b or c is 1, but not both, or if all are 1.

    B) Truth table

    Inputs  | Outputs
    a b c   | y z
    0 0 0   | 0 0
    0 0 1   | 0 1
    0 1 0   | 0 1
    0 1 1   | 1 0
    1 0 0   | 1 0
    1 0 1   | 1 1
    1 1 0   | 1 1
    1 1 1   | 1 1

    C) Output equations

    y = a'bc + ab'c' + ab'c + abc' + abc

    z = a'b'c + a'bc' + ab'c + abc' + abc

    D) Minimized output equations

    K-map for y (rows a, columns bc):

    a\bc | 00 01 11 10
    0    |  0  0  1  0
    1    |  1  1  1  1

    y = a + bc

    K-map for z (rows a, columns bc):

    a\bc | 00 01 11 10
    0    |  0  1  0  1
    1    |  0  1  1  1

    z = ab + b'c + bc'

    E) Logic gates

    [Figure: gate-level circuit with inputs a, b, c and outputs y, z.]
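    The minimization in step D can be checked by brute force. The following is a minimal sketch, not part of the original slides: it evaluates the canonical equations of step C and the minimized equations of step D for all eight input combinations. The function names are hypothetical, chosen for illustration.

```c
#include <stdbool.h>

/* Canonical (unminimized) output equations from step C. */
bool y_canonical(bool a, bool b, bool c) {
    return (!a && b && c) || (a && !b && !c) || (a && !b && c) ||
           (a && b && !c) || (a && b && c);
}

bool z_canonical(bool a, bool b, bool c) {
    return (!a && !b && c) || (!a && b && !c) || (a && !b && c) ||
           (a && b && !c) || (a && b && c);
}

/* Minimized equations from step D: y = a + bc, z = ab + b'c + bc'. */
bool y_min(bool a, bool b, bool c) { return a || (b && c); }

bool z_min(bool a, bool b, bool c) {
    return (a && b) || (!b && c) || (b && !c);
}

/* Returns true when both forms agree on all 8 input combinations. */
bool minimized_matches_canonical(void) {
    for (int i = 0; i < 8; i++) {
        bool a = (i >> 2) & 1, b = (i >> 1) & 1, c = i & 1;
        if (y_canonical(a, b, c) != y_min(a, b, c)) return false;
        if (z_canonical(a, b, c) != z_min(a, b, c)) return false;
    }
    return true;
}
```

    Exhaustive checking is practical here because three inputs give only eight rows; the same idea stops scaling once the input count grows.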

    Combinational components

    n-bit, m x 1 multiplexor

    Inputs I(m-1) ... I1, I0 (n bits each), select S0 ... S(log m); output O (n bits)

    O = I0 if S=0..00
        I1 if S=0..01
        ...
        I(m-1) if S=1..11

    log n x n decoder

    Inputs I0 ... I(log n - 1); outputs O(n-1) ... O1, O0

    O0 = 1 if I=0..00
    O1 = 1 if I=0..01
    ...
    O(n-1) = 1 if I=1..11

    With enable input e: all O's are 0 if e=0

    n-bit adder

    Inputs A, B (n bits each); outputs sum (n bits) and carry

    sum = A+B (first n bits)

    carry = (n+1)th bit of A+B

    With carry-in input Ci: sum = A + B + Ci

    n-bit comparator

    Inputs A, B (n bits each); outputs less, equal, greater

    less = 1 if A<B

    equal = 1 if A=B

    greater = 1 if A>B

    n-bit, m-function ALU

    Inputs A, B (n bits each), select S0 ... S(log m); output O (n bits)

    O = A op B, op determined by S

    May have status outputs: carry, zero, etc.
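    The behavior of these components can be sketched in software. The following is an illustrative model under stated assumptions, not a hardware description: the width is fixed at n = 8 bits, the decoder is fixed at 3-to-8, and all function and type names are hypothetical.

```c
#include <stdint.h>

/* m x 1 multiplexor: the output is the input selected by s (assumes s < m). */
uint8_t mux(const uint8_t inputs[], unsigned s) { return inputs[s]; }

/* 3-to-8 decoder with enable: exactly one output bit is 1 when e = 1,
   and all outputs are 0 when e = 0. */
uint8_t decoder3to8(unsigned i, unsigned e) {
    return e ? (uint8_t)(1u << (i & 7u)) : 0;
}

typedef struct { uint8_t sum; unsigned carry; } AdderOut;

/* 8-bit adder with carry-in: sum keeps the first 8 bits of A + B + Ci,
   carry is the ninth bit. */
AdderOut adder8(uint8_t a, uint8_t b, unsigned ci) {
    unsigned full = (unsigned)a + (unsigned)b + (ci & 1u);
    AdderOut out = { (uint8_t)(full & 0xFFu), (full >> 8) & 1u };
    return out;
}

typedef struct { unsigned less, equal, greater; } CmpOut;

/* 8-bit comparator: exactly one of the three outputs is 1. */
CmpOut comparator8(uint8_t a, uint8_t b) {
    CmpOut out = { a < b, a == b, a > b };
    return out;
}
```

    For example, adding 200 and 100 with a carry-in of 1 gives 301, which does not fit in 8 bits, so the model reports sum = 45 and carry = 1.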

    Logic synthesis

    Logic-level behavior to structural implementation

    Logic equations and/or FSM to connected gates

    Combinational logic synthesis

    Two-level minimization (sum of products/product of sums)

    Best possible performance

    Longest path = 2 gates (AND gate + OR gate, or OR gate + AND gate)

    Minimize size

    Minimum cover

    Minimum cover that is prime

    Heuristics

    Multilevel minimization

    Trade performance for size

    Pareto-optimal solution

    Heuristics

    FSM synthesis

    State minimization

    State encoding

    Two-level minimization

    Represent logic function as sum of products (or product of sums)

    AND gate for each product

    OR gate for each sum

    Gives best possible performance

    At most 2 gate delays

    Goal: minimize size

    Minimum cover

    Minimum # of AND gates (sum of products)

    Minimum cover that is prime

    Minimum # of inputs to each AND gate (sum of products)

    Sum of products: F = abc'd' + a'b'cd + a'bcd + ab'cd

    Direct implementation: four 4-input AND gates and one 4-input OR gate

    40 transistors

    [Figure: direct implementation with inputs a, b, c, d and output F.]

    Minimum cover

    Minimum # of AND gates (sum of products)

    Literal: variable or its complement

    a or a', b or b', etc.

    Minterm: product of literals

    Each variable appears exactly once, as a literal

    abc'd', a'b'cd, a'bcd, etc.

    Implicant: product of literals

    Each variable appears no more than once

    a'bcd, a'cd, etc.

    Covers 1 or more minterms

    a'cd covers a'bcd and a'b'cd

    Cover: set of implicants that covers all minterms of the function

    Minimum cover: cover with the minimum # of implicants

    Minimum cover: K-map approach

    Karnaugh map (K-map)

    1 represents a minterm

    Circle represents an implicant

    Minimum cover

    Covering all 1s with the min # of circles

    Example: direct vs. min cover

    Fewer gates: 4 vs. 5

    Fewer transistors: 28 vs. 40

    K-map for F (rows ab, columns cd); the sum-of-products map circles each 1 separately, the minimum-cover map circles the two 1s of a'cd together:

    ab\cd | 00 01 11 10
    00    |  0  0  1  0
    01    |  0  0  1  0
    11    |  1  0  0  0
    10    |  0  0  1  0

    Minimum cover: F = abc'd' + a'cd + ab'cd

    Minimum cover implementation: two 4-input AND gates, one 3-input AND gate, one 3-input OR gate

    28 transistors

    [Figure: minimum cover implementation with inputs a, b, c, d and output F.]

    Minimum cover that is prime

    Minimum # of inputs to AND gates

    Prime implicant

    Implicant not covered by any other implicant

    Max-sized circle in K-map

    Minimum cover that is prime

    Covering with the min # of prime implicants

    Min # of max-sized circles

    Example: prime cover vs. min cover

    Same # of gates: 4 vs. 4

    Fewer transistors: 26 vs. 28

    K-map for F (rows ab, columns cd); the prime cover circles a'cd (rows 00 and 01) and b'cd (rows 00 and 10) in column 11:

    ab\cd | 00 01 11 10
    00    |  0  0  1  0
    01    |  0  0  1  0
    11    |  1  0  0  0
    10    |  0  0  1  0

    Minimum cover that is prime: F = abc'd' + a'cd + b'cd

    Implementation: one 4-input AND gate, two 3-input AND gates, one 3-input OR gate

    26 transistors

    [Figure: implementation with inputs a, b, c, d and output F.]
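    The three forms of F (direct sum of products, minimum cover, minimum cover that is prime) should compute the same function. The sketch below checks this over all 16 inputs and also reproduces the transistor counts of 40, 28, and 26. The cost model of 2 transistors per gate input, with inverters not counted, is an assumption inferred from those numbers, and the function names are hypothetical.

```c
#include <stdbool.h>

/* Direct sum of products: F = abc'd' + a'b'cd + a'bcd + ab'cd */
bool f_direct(bool a, bool b, bool c, bool d) {
    return (a && b && !c && !d) || (!a && !b && c && d) ||
           (!a && b && c && d) || (a && !b && c && d);
}

/* Minimum cover: F = abc'd' + a'cd + ab'cd */
bool f_min_cover(bool a, bool b, bool c, bool d) {
    return (a && b && !c && !d) || (!a && c && d) || (a && !b && c && d);
}

/* Minimum cover that is prime: F = abc'd' + a'cd + b'cd */
bool f_prime_cover(bool a, bool b, bool c, bool d) {
    return (a && b && !c && !d) || (!a && c && d) || (!b && c && d);
}

/* Returns true when all three forms agree on all 16 input combinations. */
bool covers_are_equivalent(void) {
    for (int i = 0; i < 16; i++) {
        bool a = (i >> 3) & 1, b = (i >> 2) & 1, c = (i >> 1) & 1, d = i & 1;
        bool ref = f_direct(a, b, c, d);
        if (f_min_cover(a, b, c, d) != ref) return false;
        if (f_prime_cover(a, b, c, d) != ref) return false;
    }
    return true;
}

/* Assumed cost model: 2 transistors per gate input, inverters not counted.
   gate_inputs[i] is the number of inputs of gate i. */
int transistor_count(const int gate_inputs[], int num_gates) {
    int total = 0;
    for (int i = 0; i < num_gates; i++) total += 2 * gate_inputs[i];
    return total;
}
```

    Under this model the direct form (five 4-input gates) costs 40, the minimum cover (inputs 4, 4, 3, 3) costs 28, and the prime cover (inputs 4, 3, 3, 3) costs 26.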

    Minimum cover: heuristics

    K-maps give the optimal solution every time

    But functions with > 6 inputs are too complicated

    Use a computer-based tabular method

    Finds all prime implicants

    Finds a min cover that is prime

    Also gives the optimal solution every time

    Problem: 2^n minterms for n inputs

    32 inputs = 4 billion minterms

    Exponential complexity

    Heuristic

    Solution technique where the optimal solution is not guaranteed

    Hopefully comes close

    Heuristics: iterative improvement

    Start with an initial solution, i.e., the original logic equation

    Repeatedly make modifications toward a better solution

    Common modifications

    Expand

    Replace each nonprime implicant with a prime implicant covering it

    Delete all implicants covered by the new prime implicant

    Reduce

    Opposite of expand

    Reshape

    Expands one implicant while reducing another

    Maintains total # of implicants

    Irredundant

    Selects the min # of implicants that cover from the existing implicants

    Synthesis tools differ in the modifications used and the order in which they are used

    Multilevel logic minimization

    Trade performance for size

    Increase delay for a lower # of gates

    Gray area represents all possible solutions

    Circle with X represents the ideal solution

    Generally not possible

    2-level gives best performance

    Max delay = 2 gates

    Solve for smallest size

    Multilevel gives a Pareto-optimal solution

    Minimum delay for a given size

    Minimum size for a given delay

    [Figure: size vs. delay plot showing the region of possible solutions, with curves labeled "2-level minim." and "multi-level minim."]

    FSM synthesis

    FSM to gates

    State minimization

    Reduce # of states

    Identify and merge equivalent states

    Outputs and next states are the same for all possible inputs

    Tabular method gives exact solution

    Table of all possible state pairs

    If n states, n^2 table entries

    Thus, heuristics used with large # of states

    State encoding

    Unique bit sequence for each state

    If n states, log2(n) bits (rounded up)

    n! possible encodings

    Thus, heuristics common
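    The state-encoding numbers can be made concrete with a short calculation. This is a sketch with hypothetical function names. It counts the assignments of distinct codes to n states when 2^bits codes are available; that count equals the n! quoted above when n is a power of two, and is larger otherwise.

```c
/* Minimum number of state-register bits for n states: ceil(log2(n)). */
unsigned state_bits(unsigned n) {
    unsigned bits = 0;
    while ((1u << bits) < n) bits++;
    return bits;
}

/* Number of ways to give n states distinct codes chosen from 2^bits codes.
   Equals n! when n is a power of two. Overflows 64 bits for large n. */
unsigned long long encoding_count(unsigned n) {
    unsigned long long codes = 1ull << state_bits(n);
    unsigned long long count = 1;
    for (unsigned i = 0; i < n; i++) count *= (codes - i);
    return count;
}
```

    For instance, 4 states need 2 bits and allow 24 encodings, while a 13-state controller needs a 4-bit state register. The rapid growth of this count is why heuristics are common.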

    Sequential components

    n-bit register

    Inputs I (n bits), load, clear; output Q (n bits)

    Q = 0 if clear=1,
        I if load=1 and clock=1,
        Q(previous) otherwise.

    n-bit shift register

    Inputs I, shift; output Q

    Q = lsb

    - Content shifted

    - I stored in msb

    n-bit counter

    Inputs count, clear; output Q (n bits)

    Q = 0 if clear=1,
        Q(prev)+1 if count=1 and clock=1.

    A reversible shifter shifts left and right

    A reversible counter counts up and down

    Reading is an operation available in most registers: generalized registers.
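    The register, counter, and shift-register equations above can be sketched as next-value functions, each call standing for one active clock edge. This is an illustrative model only: it assumes 8-bit components, treats clear as sampled on the clock edge (the slide does not say whether clear is synchronous), and uses hypothetical function names.

```c
#include <stdint.h>

/* One clock edge of an 8-bit register with clear and load. */
uint8_t register_step(uint8_t q, uint8_t in, unsigned clear, unsigned load) {
    if (clear) return 0;
    if (load) return in;
    return q;                               /* hold the previous value */
}

/* One clock edge of an 8-bit up counter with clear and count enable. */
uint8_t counter_step(uint8_t q, unsigned clear, unsigned count) {
    if (clear) return 0;
    if (count) return (uint8_t)(q + 1);     /* wraps from 255 to 0 */
    return q;
}

/* One clock edge of an 8-bit right-shift register: the content shifts toward
   the lsb and the serial input is stored in the msb. */
uint8_t shift_step(uint8_t q, unsigned in, unsigned shift) {
    if (!shift) return q;
    return (uint8_t)((q >> 1) | ((in & 1u) << 7));
}
```

    The serial output of the shift register is simply the lsb of the stored value, so no separate function is needed for it.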

    Sequential logic design

    A) Problem description

    You want to construct a clock divider. Slow down your pre-existing clock so that you output a 1 for every four clock cycles.

    B) State diagram

    [Figure: four states 0, 1, 2, 3 in a ring. Output x=0 in states 0, 1, and 2; x=1 in state 3. Each state advances to the next when a=1 and stays where it is when a=0.]

    C) Implementation model

    [Figure: combinational logic with inputs a, Q1, Q0 and outputs x, I1, I0; a state register stores I1, I0 and feeds back Q1, Q0.]

    D) State table (Moore-type)

    Inputs     | Outputs
    Q1 Q0 a    | I1 I0 | x
    0  0  0    | 0  0  | 0
    0  0  1    | 0  1  | 0
    0  1  0    | 0  1  | 0
    0  1  1    | 1  0  | 0
    1  0  0    | 1  0  | 0
    1  0  1    | 1  1  | 0
    1  1  0    | 1  1  | 1
    1  1  1    | 0  0  | 1

    Given this implementation model, sequential logic design quickly reduces to combinational logic design.

    Sequential logic design (cont.)

    E) Minimized output equations

    K-map for I1 (rows a, columns Q1Q0):

    a\Q1Q0 | 00 01 11 10
    0      |  0  0  1  1
    1      |  0  1  0  1

    I1 = Q1'Q0a + Q1a' + Q1Q0'

    K-map for I0 (rows a, columns Q1Q0):

    a\Q1Q0 | 00 01 11 10
    0      |  0  1  1  0
    1      |  1  0  0  1

    I0 = Q0a' + Q0'a

    K-map for x (rows a, columns Q1Q0):

    a\Q1Q0 | 00 01 11 10
    0      |  0  0  1  0
    1      |  0  0  1  0

    x = Q1Q0

    F) Combinational logic

    [Figure: gate-level circuit with inputs a, Q1, Q0 and outputs x, I1, I0.]
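    The minimized equations can be exercised by simulating the clock divider cycle by cycle. The following is a minimal sketch with hypothetical names: it applies I1 = Q1'Q0a + Q1a' + Q1Q0', I0 = Q0a' + Q0'a, and x = Q1Q0 starting from state 00, with input a held at 1.

```c
#include <stdbool.h>

typedef struct { bool q1, q0; } DividerState;

/* Moore output: x = Q1Q0, so it depends on the state only. */
bool divider_output(DividerState s) { return s.q1 && s.q0; }

/* Next-state logic: I1 = Q1'Q0a + Q1a' + Q1Q0', I0 = Q0a' + Q0'a. */
DividerState divider_next(DividerState s, bool a) {
    DividerState n;
    n.q1 = (!s.q1 && s.q0 && a) || (s.q1 && !a) || (s.q1 && !s.q0);
    n.q0 = (s.q0 && !a) || (!s.q0 && a);
    return n;
}

/* Counts how many of `cycles` clock cycles have x = 1, with a held at 1. */
int count_pulses(int cycles) {
    DividerState s = { false, false };
    int pulses = 0;
    for (int i = 0; i < cycles; i++) {
        if (divider_output(s)) pulses++;
        s = divider_next(s, true);   /* state register loads I1, I0 */
    }
    return pulses;
}
```

    Over eight cycles the model visits states 0, 1, 2, 3, 0, 1, 2, 3, so x is 1 in two of them, which is one pulse for every four clock cycles.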

    Custom single-purpose processor basic model

    [Figure: controller and datapath. The controller has external control inputs and external control outputs; the datapath has external data inputs and external data outputs. The controller drives the datapath control inputs and reads the datapath control outputs.]

    [Figure: a view inside the controller and datapath. The controller contains a state register plus next-state and control logic; the datapath contains registers and functional units.]

    Example: greatest common divisor

    (a) black-box view

    [Figure: GCD block with inputs go_i, x_i, y_i and output d_o.]

    (b) desired functionality

    0: int x, y;
    1: while (1) {
    2:   while (!go_i);
    3:   x = x_i;
    4:   y = y_i;
    5:   while (x != y) {
    6:     if (x < y)
    7:       y = y - x;
           else
    8:       x = x - y;
         }
    9:   d_o = x;
       }

    (c) state diagram

    [Figure: FSMD whose states are labeled with the program line numbers, e.g. 5: with transitions x!=y and !(x!=y), 7: y = y - x, 8: x = x - y, plus join states such as 6-J:. The rest of the diagram is truncated in this extract.]
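    The desired functionality can be turned into a compilable function. The sketch below keeps the computation of lines 3 to 9 and drops the outer while (1) loop and the go_i handshake, since those describe hardware start-up rather than the arithmetic. The function name is hypothetical, and positive inputs are assumed.

```c
/* Subtraction-based GCD, following lines 3-9 of the slide program.
   Assumes x_i > 0 and y_i > 0; with a zero input the loop would never end. */
unsigned gcd_subtract(unsigned x_i, unsigned y_i) {
    unsigned x = x_i;            /* line 3 */
    unsigned y = y_i;            /* line 4 */
    while (x != y) {             /* line 5 */
        if (x < y)               /* line 6 */
            y = y - x;           /* line 7 */
        else
            x = x - y;           /* line 8 */
    }
    return x;                    /* line 9: d_o = x */
}
```

    Calling gcd_subtract(42, 8) returns 2, matching the worked example later in these slides.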

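    State diagram templates

    Assignment statement

    a = b
    next statement

    [Template: a state with action a = b, followed by the state for the next statement.]

    Loop statement

    while (cond) {
      loop-body-statements
    }
    next statement

    [Template: a condition state C: with a transition labeled cond into the loop-body statements and a transition labeled !cond to the next statement; the loop body ends in a join state J: that returns to C:.]

    Branch statement

    if (c1)
      c1 stmts
    else if c2
      c2 stmts
    else
      other stmts
    next statement

    [Template: a condition state C: with three transitions labeled c1, !c1*c2, and !c1*!c2 leading to the c1 stmts, c2 stmts, and other stmts; all three meet in a join state J: before the next statement.]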

    Creating the datapath

    Create a register for any declared variable

    Create a functional unit for each arithmetic operation

    Connect the ports, registers, and functional units

    Based on reads and writes

    Use multiplexors for multiple sources

    Create a unique identifier for each datapath component control input and output

    [Figure: the GCD state diagram next to the resulting datapath; truncated in this extract.]

    Creating the controller's FSM

    Same structure as the FSMD

    Replace complex actions/conditions with datapath configurations

    [Figure: the GCD state diagram next to the controller FSM; truncated in this extract.]

    Splitting into a controller and datapath

    Controller FSM states (encoding, label, action or transition condition):

    0000  1:
    0001  2:    !go_i goes to 2-J:, !(!go_i) goes to 3:
    0010  2-J:
    0011  3:    x_sel = 0, x_ld = 1
    0100  4:    y_sel = 0, y_ld = 1
    0101  5:    x_neq_y=1 goes to 6:, x_neq_y=0 goes to 9:
    0110  6:    x_lt_y=1 goes to 7:, x_lt_y=0 goes to 8:
    0111  7:    y_sel = 1, y_ld = 1
    1000  8:    x_sel = 1, x_ld = 1
    1001  6-J:
    1010  5-J:
    1011  9:    d_ld = 1
    1100  1-J:

    Controller implementation model

    [Figure: combinational logic with inputs go_i, x_neq_y, x_lt_y, Q3..Q0 and outputs x_sel, y_sel, x_ld, y_ld, d_ld, I3..I0; a state register stores I3..I0 and feeds back Q3..Q0.]

    Datapath

    [Figure: registers x, y, d with multiplexors selected by x_sel and y_sel; two subtractors (7: y-x, 8: x-y) and two comparators (5: x!=y, 6: x<y).]

    Controller state table for the GCD example

    Inputs: Q3 Q2 Q1 Q0 x_neq_y x_lt_y go_i. Outputs: I3 I2 I1 I0 x_sel y_sel x_ld y_ld d_ld. (* = any input value, X = don't-care output)

    Q3 Q2 Q1 Q0 x_neq_y x_lt_y go_i I3 I2 I1 I0 x_sel y_sel x_ld y_ld d_ld

    0 0 0 0 * * * 0 0 0 1 X X 0 0 0

    0 0 0 1 * * 0 0 0 1 0 X X 0 0 0

    0 0 0 1 * * 1 0 0 1 1 X X 0 0 0

    0 0 1 0 * * * 0 0 0 1 X X 0 0 0

    0 0 1 1 * * * 0 1 0 0 0 X 1 0 0

    0 1 0 0 * * * 0 1 0 1 X 0 0 1 0

    0 1 0 1 0 * * 1 0 1 1 X X 0 0 0

    0 1 0 1 1 * * 0 1 1 0 X X 0 0 0

    0 1 1 0 * 0 * 1 0 0 0 X X 0 0 0

    0 1 1 0 * 1 * 0 1 1 1 X X 0 0 0

    0 1 1 1 * * * 1 0 0 1 X 1 0 1 0

    1 0 0 0 * * * 1 0 0 1 1 X 1 0 0

    1 0 0 1 * * * 1 0 1 0 X X 0 0 0

    1 0 1 0 * * * 0 1 0 1 X X 0 0 0

    1 0 1 1 * * * 1 1 0 0 X X 0 0 1

    1 1 0 0 * * * 0 0 0 0 X X 0 0 0

    1 1 0 1 * * * 0 0 0 0 X X 0 0 0

    1 1 1 0 * * * 0 0 0 0 X X 0 0 0

    1 1 1 1 * * * 0 0 0 0 X X 0 0 0
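    The state table can be exercised together with a model of the datapath. The following is a sketch under stated assumptions, not the slides' implementation: the datapath (registers x, y, d, two multiplexors, two subtractors, two comparators) is modeled from the control signal names, go_i is held at 1, inputs are assumed positive, and all C identifiers are hypothetical. State codes follow the Q3..Q0 column of the table.

```c
#include <stdbool.h>

typedef struct { unsigned x, y, d; } Datapath;   /* registers x, y and output d */
typedef struct { bool x_sel, y_sel, x_ld, y_ld, d_ld; } Control;

/* One controller step: fills in the control outputs of the current state and
   returns the next state, row by row as in the state table. */
static unsigned controller(unsigned state, bool go_i, bool x_neq_y, bool x_lt_y,
                           Control *c) {
    Control off = { false, false, false, false, false };
    *c = off;
    switch (state) {
    case 0x0: return 0x1;                              /* 1:   */
    case 0x1: return go_i ? 0x3 : 0x2;                 /* 2:   */
    case 0x2: return 0x1;                              /* 2-J: */
    case 0x3: c->x_ld = true; return 0x4;              /* 3: x = x_i (x_sel = 0) */
    case 0x4: c->y_ld = true; return 0x5;              /* 4: y = y_i (y_sel = 0) */
    case 0x5: return x_neq_y ? 0x6 : 0xB;              /* 5:   */
    case 0x6: return x_lt_y ? 0x7 : 0x8;               /* 6:   */
    case 0x7: c->y_sel = c->y_ld = true; return 0x9;   /* 7: y = y - x */
    case 0x8: c->x_sel = c->x_ld = true; return 0x9;   /* 8: x = x - y */
    case 0x9: return 0xA;                              /* 6-J: */
    case 0xA: return 0x5;                              /* 5-J: */
    case 0xB: c->d_ld = true; return 0xC;              /* 9: d_o = x */
    default:  return 0x0;                              /* 1-J: and unused codes */
    }
}

/* One clock edge of the datapath: a register changes only when its _ld is 1. */
static void datapath_clock(Datapath *dp, const Control *c,
                           unsigned x_i, unsigned y_i) {
    unsigned next_x = c->x_sel ? dp->x - dp->y : x_i;  /* x multiplexor */
    unsigned next_y = c->y_sel ? dp->y - dp->x : y_i;  /* y multiplexor */
    if (c->d_ld) dp->d = dp->x;
    if (c->x_ld) dp->x = next_x;
    if (c->y_ld) dp->y = next_y;
}

/* Runs controller and datapath until d is loaded (state 1011).
   Assumes x_i > 0 and y_i > 0, otherwise the loop does not terminate. */
unsigned gcd_processor(unsigned x_i, unsigned y_i) {
    Datapath dp = { 0, 0, 0 };
    unsigned state = 0x0;
    for (;;) {
        Control c;
        unsigned next = controller(state, true, dp.x != dp.y, dp.x < dp.y, &c);
        datapath_clock(&dp, &c, x_i, y_i);
        if (c.d_ld) return dp.d;
        state = next;
    }
}
```

    The split mirrors the slides: controller() only sees the status bits x_neq_y and x_lt_y, and datapath_clock() only sees the select and load signals, so neither function needs to know the algorithm as a whole.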

    Completing the GCD custom single-purpose processor design

    We finished the datapath

    We have a state table for the next-state and control logic

    All that's left is combinational logic design

    This is not an optimized design, but we see the basic steps

    [Figure: a view inside the controller and datapath. The controller contains a state register plus next-state and control logic; the datapath contains registers and functional units.]

    You may be asked in homework, exams, or projects to optimize the design with respect to some metric such as area, speed, power, or testability.

    RT-level custom single-purpose processor design: example bus bridge

    We often start with a state machine rather than an algorithm

    Cycle timing is often too central to functionality

    Example: bus bridge that converts a 4-bit bus to an 8-bit bus

    Start with an FSMD

    Known as register-transfer (RT) level

    Exercise: complete the design

    Problem specification

    Bridge: a single-purpose processor that converts two 4-bit inputs, arriving one at a time over data_in along with a rdy_in pulse, into one 8-bit output on data_out along with a rdy_out pulse.

    [Figure: Sender drives data_in(4) and rdy_in into the Bridge; the Bridge drives data_out(8) and rdy_out into the Receiver; clock.]

    FSMD (Bridge)

    Inputs
    rdy_in: bit; data_in: bit[4];

    Outputs
    rdy_out: bit; data_out: bit[8]

    Variables
    data_lo, data_hi: bit[4];

    States and actions

    WaitFirst4
    RecFirst4Start: data_lo=data_in
    RecFirst4End
    WaitSecond4
    RecSecond4Start: data_hi=data_in
    RecSecond4End
    Send8Start: data_out=data_hi & data_lo, rdy_out=1
    Send8End: rdy_out=0

    Transitions between the waiting and receiving states are labeled rdy_in=1 or rdy_in=0.

    RT-level custom single-purpose processor design (cont.): example bus bridge

    (a) Controller

    States and actions

    WaitFirst4
    RecFirst4Start: data_lo_ld=1
    RecFirst4End
    WaitSecond4
    RecSecond4Start: data_hi_ld=1
    RecSecond4End
    Send8Start: data_out_ld=1, rdy_out=1
    Send8End: rdy_out=0

    Transitions between the waiting and receiving states are labeled rdy_in=1 or rdy_in=0. Controller input: rdy_in. Controller output: rdy_out.

    (b) Datapath

    [Figure: data_in(4) feeds registers data_lo and data_hi, loaded by data_lo_ld and data_hi_ld; both feed the 8-bit register data_out, loaded by data_out_ld; clk goes to all registers.]
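    The bridge FSMD can be sketched as a clocked state machine. This is an illustrative model with hypothetical C names. The state names and actions come from the slides; the exact transition conditions are an assumption, because the diagram is only partly recoverable here: each Wait state is assumed to leave on rdy_in=1, each ...End state on rdy_in=0, and the Start and Send states to advance unconditionally. The & in data_hi & data_lo is read as concatenation.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    WAIT_FIRST4, REC_FIRST4_START, REC_FIRST4_END, WAIT_SECOND4,
    REC_SECOND4_START, REC_SECOND4_END, SEND8_START, SEND8_END
} BridgeState;

typedef struct {
    BridgeState state;
    uint8_t data_lo, data_hi;   /* 4-bit variables */
    uint8_t data_out;           /* 8-bit output */
    bool rdy_out;
} Bridge;

/* Advances the bridge by one clock cycle given the sampled inputs.
   Transition conditions are assumed, see the lead-in. */
void bridge_clock(Bridge *b, bool rdy_in, uint8_t data_in) {
    switch (b->state) {
    case WAIT_FIRST4:
        if (rdy_in) b->state = REC_FIRST4_START;
        break;
    case REC_FIRST4_START:
        b->data_lo = data_in & 0x0F;            /* data_lo = data_in */
        b->state = REC_FIRST4_END;
        break;
    case REC_FIRST4_END:
        if (!rdy_in) b->state = WAIT_SECOND4;   /* wait for the pulse to end */
        break;
    case WAIT_SECOND4:
        if (rdy_in) b->state = REC_SECOND4_START;
        break;
    case REC_SECOND4_START:
        b->data_hi = data_in & 0x0F;            /* data_hi = data_in */
        b->state = REC_SECOND4_END;
        break;
    case REC_SECOND4_END:
        if (!rdy_in) b->state = SEND8_START;
        break;
    case SEND8_START:
        b->data_out = (uint8_t)((b->data_hi << 4) | b->data_lo);
        b->rdy_out = true;
        b->state = SEND8_END;
        break;
    case SEND8_END:
        b->rdy_out = false;
        b->state = WAIT_FIRST4;
        break;
    }
}
```

    Feeding the nibble 0xA and then the nibble 0x3, each with a rdy_in pulse held for two cycles, produces data_out = 0x3A with rdy_out high for one cycle.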

    Optimizing single-purpose processors

    Optimization is the task of making design metric values the best possible

    Optimization opportunities

    original program

    FSMD

    datapath

    FSM

    Optimizing the original program

    Analyze program attributes and look for areas of possible improvement

    number of computations

    size of variable

    time and space complexity

    operations used

    multiplication and division are very expensive

    Optimizing the original program (cont.)

    Original program

    0: int x, y;
    1: while (1) {
    2:   while (!go_i);
    3:   x = x_i;
    4:   y = y_i;
    5:   while (x != y) {
    6:     if (x < y)
    7:       y = y - x;
           else
    8:       x = x - y;
         }
    9:   d_o = x;
       }

    GCD(42, 8): 9 iterations to complete the loop

    x and y values evaluated as follows: (42, 8), (34, 8), (26, 8), (18, 8), (10, 8), (2, 8), (2, 6), (2, 4), (2, 2).

    Optimized program

    int x, y, r;
    while (1) {
      while (!go_i);
      // x must be the larger number
      if (x_i >= y_i) {
        x = x_i;
        y = y_i;
      }
      else {
        x = y_i;
        y = x_i;
      }
      while (y != 0) {
        r = x % y;
        x = y;
        y = r;
      }
      d_o = x;
    }

    Replace the subtraction operation(s) with the modulo operation in order to speed up the program

    GCD(42, 8): 3 iterations to complete the loop

    x and y values evaluated as follows: (42, 8), (8, 2), (2, 0).
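    The iteration counts quoted for GCD(42, 8) can be reproduced by instrumenting both loops. The sketch below counts the (x, y) pairs that each loop condition evaluates, which is the sense in which the slide counts 9 versus 3 iterations; the function names and the pairs parameter are hypothetical additions.

```c
/* Subtraction version: returns the GCD and stores in *pairs how many (x, y)
   pairs the loop condition evaluates. Assumes x > 0 and y > 0. */
unsigned gcd_subtract_counted(unsigned x, unsigned y, unsigned *pairs) {
    *pairs = 1;                               /* the initial pair */
    while (x != y) {
        if (x < y) y = y - x; else x = x - y;
        (*pairs)++;
    }
    return x;
}

/* Modulo version with the same counting. */
unsigned gcd_modulo_counted(unsigned x_i, unsigned y_i, unsigned *pairs) {
    unsigned x = (x_i >= y_i) ? x_i : y_i;    /* x must be the larger number */
    unsigned y = (x_i >= y_i) ? y_i : x_i;
    *pairs = 1;                               /* the initial pair */
    while (y != 0) {
        unsigned r = x % y;
        x = y;
        y = r;
        (*pairs)++;
    }
    return x;
}
```

    Both functions return 2 for the inputs 42 and 8; the first reports 9 pairs and the second 3. Whether the modulo version is faster in hardware also depends on the cost of the modulo unit, since the slides note that division is very expensive.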

    Optimizing the FSMD

    Areas of possible improvements

    merge states

    states with constants on transitions can be eliminated, transition

    taken is already known

    states with independent operations can be merged

    separate states

    states which require complex operations (a*b*c*d) can be broken

    into smaller states to reduce hardware size

    scheduling

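    Optimizing the FSMD (cont.)

    [Figure: optimized FSMD with declaration int x, y; and states 2: (transitions go_i and !go_i), 3: x = x_i, y = y_i, 5: (transitions comparing x and y), 7: y = y - x, 8: x = x - y, 9: d_o = x.]

    [Figure: original FSMD for comparison, with states such as 5: (transitions x!=y and !(x!=y)), 7: y = y - x, 8: x = x - y, and join state 6-J:; truncated in this extract.]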

    Optimizing the datapath

    Sharing of functional units

    one-to-one mapping, as done previously, is not necessary

    if the same operation occurs in different states, those states can share a single functional unit

    Multi-functional units

    an ALU supports a variety of operations, so it can be shared among operations occurring in different states

    Optimizing the FSM

    State encoding

    task of assigning a unique bit pattern to each state in an FSM

    size of state register and combinational logic vary

    can be treated as an ordering problem

    State minimization

    task of merging equivalent states into a single state

    two states are equivalent if, for all possible input combinations, they generate the same outputs and transition to the same next state

    Technology mapping

    Library of gates available for implementation

    Simple

    only 2-input AND, OR gates

    Complex

    various-input AND, OR, NAND, NOR, etc. gates

    Efficiently implemented meta-gates (e.g., AND-OR-INVERT, MUX)

    Final structure consists of the specified library's components only

    If technology mapping is integrated with logic synthesis

    More efficient circuit

    More complex problem

    Heuristics required

    Complexity impact on user

    As complexity grows, heuristics are used

    Heuristics differ tremendously among synthesis tools

    Computationally expensive heuristics

    Higher quality results

    Variable optimization effort settings

    Long run times (hours, days)

    Require huge amounts of memory

    Typically need to run on servers, workstations

    Fast heuristics

    Lower quality results

    Shorter run times (minutes, hours)

    Smaller amount of memory required

    Could run on a PC

    Super-linear-time (e.g., n^3) heuristics usually used

    User can partition large systems to reduce run times/size

    100^3 > 50^3 + 50^3 (1,000,000 > 250,000)

    Integrating logic design and physical design

    Past

    Gate delay much greater than wire delay

    Thus, performance evaluated as # of levels of gates only

    Today

    Gate delay shrinking as feature size shrinks

    Wire delay increasing

    Performance evaluation needs wire length

    Transistor placement (needed for wire length) is the domain of physical design

    Thus, simultaneous logic synthesis and physical design are required for efficient circuits

    [Figure: delay vs. reduced feature size, with one curve for wire delay and one for transistor delay.]

    Embedded Systems Case Study

    Elevator Controller

    Elevator System

    CRC cards are a well-known method for analyzing a system and developing an architecture.

    CRC

    Classes: logical groupings of data and functionality

    Responsibilities: describe what the class does

    Collaborators: other classes with which a given class works

    Elevator Control Classes

    Elevator car, Passenger, Floor control, Car control, Car sensors, etc.

    Architectural Classes

    Car state, Floor control reader, Car control reader, Car control sender, Scheduler

    F floors

    N hoistways

    Physical Interfaces

    [Figure: the elevator control classes and architectural classes listed above, shown with their physical interfaces.]

    Architecture

    Computation and I/O occur at:

    Floor control panels/displays

    Elevator cars

    System controller

    Panels controller

    reads buttons and sends events to the system controller

    Car controller

    reads sensor inputs and sends them to the system controller

    System Controller

    Must take inputs from many sources

    Must control cars to hard real-time deadlines

    User interface and scheduling are soft deadlines

    Testing

    Build an elevator simulator using SystemC, Verilog, VHDL, and an FPGA

    Simulate multiple elevators

    Simulate real-time control demands

    Homework 2

    The simplest possible custom single-purpose processor

    Design a processor to multiply two numbers. The initial data are in registers/counters A and B. The result should be in register/counter C.

    You have only reversible counters (with reading) to be used in the data path.

    The counters perform the following operations:

    Add one

    Subtract one

    Read new value

    Invent the algorithm for multiplication. Use minimum number of counters

    Design the reversible counter by hand using logic gates and D FFs.

    Design the control unit

    Design the data path

    Draw the timing diagram of the whole system.

    You can use VHDL or Verilog to help you, but I need your design by hand.

    Questions to Exams (1)

    1. What are the main methods of combinational logic design?

    2. What is a Mealy FSM (Finite State Machine)?

    3. What is a Moore state machine?

    4. Think about a robot controller as a sequential logic circuit. What are the blocks and their roles?

    5. Role of abstraction in FSM design. Give examples.

    6. Explain the concepts from Gajski's chart in a custom single-purpose processor design.

    7. RT-level custom single-purpose processor design. Explain briefly all design stages from the bottom of the design hierarchy (layout) to the top (system design of a GCD processor as an example).

    8. List and explain logic gates.

    9. List and explain combinational blocks.

    10. List and explain sequential blocks.

    11. List and explain sensors to be used with embedded systems of FSM type.

    12. List and explain actuators to be used with such embedded systems.

    Questions to Exams (2)

    1. What are the main synthesis processes and CAD tools in combinational logic design?

    2. What are the methods to solve the covering problem?

    3. Explain the concept of search and give examples.

    4. Explain the concept of a heuristic in search and give examples. SOP minimization can be a very useful example. Also ESOP.

    5. Explain design tradeoffs and Pareto optimization on one practical example.

    6. Explain in detail, on an example, the basic synthesis method for a Mealy FSM from specification to a circuit built from D-type flip-flops (FFs) and logic gates.

    7. Explain and illustrate how D, T, and JK flip-flops work.

    8. What is the difference between a register with enable, a register without enable, and a reversible register?

    9. Draw the schematic of the FSMD.

    10. Explain Euclid's GCD algorithm on examples.

    11. Without looking at the slides, convert the GCD algorithm to an FSMD.

    12. How can we optimize GCD?

    13. Apply these ideas to the Least Common Multiple algorithm and FSMD for two numbers.

    Questions to Exams (3)

    1. The role of GO-TO commands in FSMD design. Are they good or bad? Give examples. The role of structured design of an FSMD.

    2. How is the data path created from an FSMD? This is one of the main topics for this whole class. You have to know it well.

    3. How is the CU (Control Unit) created from an FSMD? This is one of the main topics for this whole class. You have to know it well.

    4. Compare state graph, state transition table, and flow-chart. Why do we need all of them?

    5. In this class we are not optimizing combinational logic or FSMs too much. But if you have taken the ECE 572 or ECE 573 classes you know many methods to optimize on these levels. Can you give practical examples of these optimizations in GCD or another similar system?

    6. Complete the bus bridge FSMD that converts a 4-bit bus to an 8-bit bus and is given in these slides.

    7. Discuss optimizing single-purpose processors. Give examples. Explain levels of optimization, such as the original program, the FSMD, the data path, the CU, the register, the combinational logic, and finally the technology mapping.

    8. Design the complete elevator system for a villa of a crazy millionaire artist from Hollywood. Cost does not count. You have to amaze his guests.

    Sources

    Slides from S. Mohammadi

    Vahid, Givargis, and Marwedel

    Siamak Mohammadi

    EECE 353-1 Real-Time Systems

    T. John Koo

    Embedded Computing Systems Laboratory

    Institute for Software Integrated Systems

    Department of Electrical Engineering and Computer Science

    Vanderbilt University

    5306 Stevenson Center

    January 16, 2006