36
VLSI-1 Class Notes Lecture 12: Datapath Design Mark McDermott Electrical and Computer Engineering The University of Texas at Austin 10/1/18

Lecture 12: Datapath Design - University of Texas at Austinusers.ece.utexas.edu/~mcdermot/vlsi1/main/lectures/...sssssss s sssss s sss s s PP0 PP 1 PP2 PP3 PP 4 PP5 PP6 PP 7 PP8 10/1/18

  • Upload
    others

  • View
    2

  • Download
    1

Embed Size (px)

Citation preview

  • VLSI-1 Class Notes

    Lecture 12:Datapath Design

    Mark McDermottElectrical and Computer Engineering

    The University of Texas at Austin

    10/1/18

  • VLSI-1 Class Notes

    Datapath Design

    § Comparators§ Shifters§ Adders§ Multipliers§ Registers

    Page 210/1/18

  • VLSI-1 Class Notes

    ARM Datapath

    Page 310/1/18

  • VLSI-1 Class Notes

    Comparators

    § 0 s detector: A = 00…000§ 1 s detector: A = 11…111§ Equality comparator: A = B§ Magnitude comparator: A < B

    10/1/18 Page 4

  • VLSI-1 Class Notes

    1 s & 0 s Detectors

    § 1 s detector: N-input AND gate§ 0 s detector: NOTs + 1 s detector (N-input NOR)

    A0A1

    A2A3

    A4A5

    A6A7

    allones

    A0A1

    A2A3

    allzeros

    allones

    A1

    A2A3

    A4A5

    A6A7

    A0

    10/1/18 Page 5

  • VLSI-1 Class Notes

    Equality Comparator

    § Check if each bit is equal (XNOR, or equality gate )§ 1 s detect on bitwise equality

    A[0]B[0]

    A = B

    A[1]B[1]A[2]B[2]A[3]B[3]

    10/1/18 Page 6

  • VLSI-1 Class Notes

    Magnitude Comparator

    § Compute B-A and look at sign§ B-A = B + ~A + 1§ For unsigned numbers, carry out is sign bit

    A0

    B0A1

    B1A2

    B2A3

    B3

    A = BZ

    C

    A B£

    N A B³

    10/1/18 Page 7

  • VLSI-1 Class Notes

    Signed vs. Unsigned

    § For signed numbers, comparison is harder– C: carry out– Z: zero (all bits of A-B are 0)–N: negative (MSB of result)– V: overflow (inputs had different signs, output sign ¹¹ B)

    10/1/18 Page 8

  • VLSI-1 Class Notes

    § Logical Shift:– Shifts number left or right and fills with 0 s

    • 1011 LSR 1 = 0101 1011 LSL1 = 0110

    Shifters

    Logical Shift Left (LSL)

    DestinationCF 0

    Destination CF

    Logical Shift Right (LSR)

    ...0

    zero shifted in

    10/1/18 Page 9

    zero shifted in

  • VLSI-1 Class Notes

    Shifters

    § Arithmetic Shift:– Shifts number left or right. Right shift sign extends

    1011 ASR1 = 1101 1011 ASL1 = 0110

    Destination CF

    Arithmetic Shift Right

    Sign bit shifted in

    Arithmetic Shift Right (ASR) Shifts right (divides by powers of two) and preserves the sign bit, for 2's complement operations. e.g.

    ASR #5 = divide by 32

    10/1/18 Page 10

  • VLSI-1 Class Notes

    Shifters

    § Barrel Shift (Rotate):– Shifts number left or right and fills with lost bits

    1011 ROR1 = 1101 1011 ROL1 = 0111

    Destination CF

    Rotate Right

    10/1/18 Page 11

    Destination CF

    Rotate Right through Carry

    Rotate Right (ROR)Similar to an ASR but the bits wrap around as they leave the LSB and appear as the MSB.e.g. ROR #5Note the last bit rotated is also used as the Carry Out.

    Rotate Right Extended (RRX)This operation uses the CPSR C flag as a 33rd bit. Rotates right by 1 bit. Encoded as ROR #0

  • VLSI-1 Class Notes

    ARM: Using the Barrel Shifter: The Second Operand

    1210/1/18

    Register, optionally with shift operation applied.Shift value can be either be:

    5 bit unsigned integerSpecified in bottom byte of another register.

    Immediate value8 bit numberCan be rotated right through an even number of positions.Assembler will calculate rotate for you from constant.

    Operand 1

    Result

    ALU

    Barrel Shifter

    Operand 2

  • VLSI-1 Class Notes

    Second Operand: Using a Shifted Register

    § Using a multiplication instruction to multiply by a constant means first loading the constant into a register and then waiting a number of internal cycles for the instruction to complete.

    § A more optimum solution can often be found by using some combination of MOVs, ADDs, SUBs and RSBs with shifts.– Multiplications by a constant equal to a ((power of 2) 1) can be done in

    one cycle.

    MOV R2, R0, LSL #2 ; Shift R0 left by 2, write to R2, (R2=R0x4)ADD R9, R5, R5, LSL #3 ; R9 = R5 + R5 x 8 or R9 = R5 x 9RSB R9, R5, R5, LSL #3 ; R9 = R5 x 8 - R5 or R9 = R5 x 7SUB R10, R9, R8, LSR #4 ; R10 = R9 - R8 / 16MOV R12, R4, ROR R3 ; R12 = R4 rotated right by value of R3

    1310/1/18

  • VLSI-1 Class Notes

    Funnel Shifter

    § A funnel shifter can do all six types of shifts§ Selects N-bit field Y from 2N-bit input

    – Shift by k bits (0 ££ k < N)

    B C

    offsetoffset + N-1

    0N-12N-1

    Y

    10/1/18 Page 14

  • VLSI-1 Class Notes

    Funnel Shifter Operation

    10/1/18 Page 15

  • VLSI-1 Class Notes

    Simplified Funnel Shifter

    § Optimize down to 2N-1 bit input

    10/1/18 Page 16

  • VLSI-1 Class Notes

    Funnel Shifter Design 1

    § N N-input multiplexers– Use 1-of-N hot select signals for shift amount– nMOS pass transistor design (Vt drops!)

    k[1:0]

    s0s1s2s3Y3

    Y2

    Y1

    Y0

    Z0Z1Z2Z3Z4

    Z5

    Z6

    left Inverters & Decoder

    10/1/18 Page 17

  • VLSI-1 Class Notes

    Funnel Shifter Design 2

    § Log N stages of 2-input MUXes– No select decoding needed

    Y3

    Y2

    Y1

    Y0Z0

    Z1

    Z2

    Z3

    Z4

    Z5

    Z6

    k0k1left

    10/1/18 Page 18

  • VLSI-1 Class Notes

    Logarithmic Barrel Shifter

    Right shift only

    Right/Left shift

    Right/Left Shift & Rotate

  • VLSI-1 Class Notes

    32-bit Logarithmic Barrel

    § Datapath never wider than 32 bits§ First stage preshifts by 1 to handle left shifts

    20

  • VLSI-1 Class Notes

    Multi-input Adders

    § Suppose we want to add k N-bit words– Ex: 0001 + 0111 + 1101 + 0010 = 10111

    § Straightforward solution: k-1 N-input CPAs– Large and slow

    +

    +

    0001 0111

    +

    1101 0010

    10101

    10111

    10/1/18 Page 21

  • VLSI-1 Class Notes

    Carry Save Addition

    § Full adder sums 3 inputs, produces 2 outputs– Carry output has twice weight of sum output

    § N full adders in parallel: carry save adder– Produce N sums and N carry outs

    Z4Y4X4

    S4C4

    Z3Y3X3

    S3C3

    Z2Y2X2

    S2C2

    Z1Y1X1

    S1C1XN...1 YN...1 ZN...1

    SN...1CN...1

    n-bit CSA

    10/1/18 Page 22

  • VLSI-1 Class Notes

    CSA Application

    § Use k-2 stages of CSAs– Keep result in carry-save redundant form

    § Final CPA computes actual result

    4-bit CSA

    5-bit CSA

    0001 0111 1101 0010

    +

    10110101_

    01010_ 00011

    0001 0111+1101 10110101_

    XYZSC

    0101_ 1011 +0010 0001101010_

    XYZSC

    01010_+ 00011 10111

    ABS

    10111

    10/1/18 Page 23

  • VLSI-1 Class Notes

    Multiplication

    § Example:

    § M x N-bit multiplication– Produce N M-bit partial products– Sum these to produce M+N-bit product

    1100 : 1210 0101 : 510 1100 0000 1100 000000111100 : 6010

    multipliermultiplicand

    partialproducts

    product

    10/1/18 Page 24

  • VLSI-1 Class Notes

    General Form

    § Multiplicand: Y = (yM-1, yM-2, …, y1, y0)§ Multiplier: X = (xN-1, xN-2, …, x1, x0)

    § Product: 1 1 1 1

    0 0 0 02 2 2

    M N N Mj i i j

    j i i jj i i j

    P y x x y- - - -

    +

    = = = =

    æ öæ ö= =ç ÷ç ÷è øè ø

    å å åå

    x0y5 x0y4 x0y3 x0y2 x0y1 x0y0

    y5 y4 y3 y2 y1 y0x5 x4 x3 x2 x1 x0

    x1y5 x1y4 x1y3 x1y2 x1y1 x1y0x2y5 x2y4 x2y3 x2y2 x2y1 x2y0

    x3y5 x3y4 x3y3 x3y2 x3y1 x3y0x4y5 x4y4 x4y3 x4y2 x4y1 x4y0

    x5y5 x5y4 x5y3 x5y2 x5y1 x5y0p0p1p2p3p4p5p6p7p8p9p10p11

    multipliermultiplicand

    partialproducts

    product

    10/1/18 Page 25

  • VLSI-1 Class Notes

    Dot Diagram

    § Each dot represents a bit

    partial products

    multiplier x

    x0

    x15

    10/1/18 Page 26

  • VLSI-1 Class Notes

    Array Multipliery0y1y2y3

    x0

    x1

    x2

    x3

    p0p1p2p3p4p5p6p7

    B

    ASin Cin

    SoutCout

    BA

    CinCout

    Sout

    Sin=

    CSAArray

    CPA

    critical path BA

    Sout

    Cout CinCout

    Sout

    =Cin

    BA

    10/1/18 Page 27

  • VLSI-1 Class Notes

    Rectangular Array

    § Squash array to fit rectangular floorplan

    y0y1y2y3

    x0

    x1

    x2

    x3

    p0

    p1

    p2

    p3

    p4p5p6p7

    10/1/18 Page 28

  • VLSI-1 Class Notes

    Fewer Partial Products

    § Array multiplier requires N partial products§ If we looked at groups of r bits, we could form N/r partial

    products.– Faster and smaller?– Called radix-2r encoding

    § Ex: r = 2: look at pairs of bits– Form partial products of 0, Y, 2Y, 3Y– First three are easy, but 3Y requires adder LL

    10/1/18 Page 29

  • VLSI-1 Class Notes

    Booth Encoding

    § Instead of 3Y, try –Y, then increment next partial product to add 4Y

    § Similarly, for 2Y, try –2Y + 4Y in next partial product

    10/1/18 Page 30

  • VLSI-1 Class Notes

    Booth Hardware

    § Booth encoder generates control lines for each PP– Booth selectors choose PP bits

    Mi

    yj

    Xi

    yj-1

    2Xi

    PPij

    BoothSelector

    BoothEncoder

    x2i+1

    x2ix2i-1

    10/1/18 Page 31

  • VLSI-1 Class Notes

    Booth hardware

    Page 3210/1/18

  • VLSI-1 Class Notes

    Sign Extension

    § Partial products can be negative– Require sign extension, which is cumbersome– High fanout on most significant bit

    multiplier x

    x0

    x15

    0

    00

    x-1

    x16x17

    ssssssssssssssss

    ssssssssssssss

    ssssssssssss

    ssssssssss

    ssssssss

    ssssss

    ssss

    ss

    PP0PP1

    PP2

    PP3PP4

    PP5PP6

    PP7PP8

    10/1/18 Page 33

  • VLSI-1 Class Notes

    Simplified Sign Extension

    § Sign bits are either all 0 s or all 1’s– Note that all 0 s is all 1’s + 1 in proper column– Use this to reduce loading on MSB

    s111111111111111s

    s1111111111111s

    s11111111111s

    s111111111s

    s1111111s

    s11111s

    s111s

    s1s

    PP0

    PP1

    PP2

    PP3

    PP4

    PP5

    PP6

    PP7

    PP8

    10/1/18 Page 34

  • VLSI-1 Class Notes

    Even Simpler Sign Extension

    § No need to add all the 1 s in hardware– Precompute the answer!

    ssss

    ss1

    ss1

    ss1

    ss1

    ss1

    ss1

    ss

    PP0PP1PP2PP3PP4PP5PP6PP7PP8

    10/1/18 Page 35

  • VLSI-1 Class Notes

    Advanced Multiplication

    § Signed vs. unsigned inputs§ Higher radix Booth encoding§ Array vs. tree CSA networks

    10/1/18 Page 36