Institute of Applied Microelectronics & Computer Engineering Selected Topics of VLSI Design Prof. Dr.-Ing. Dirk Timmermann [email protected]



Please note this name change

3/31/2019 Selected Topics of VLSI Design 2

Module: Advanced VLSI Design | Selected Topics of VLSI Design

Until 2016
  Short name: "Chip project" | "HW-Alg."
  Semester: summer | winter
  SWS: 1 | 1/1/1
  Content: hardware algorithms | VLSI chip project

Starting 2017
  Short name: "Chip project" | "HW-Alg."
  Semester: winter | summer
  SWS: 1 | 1/1/1
  ECTS: 6 | 6
  Content: VLSI chip project | hardware algorithms

Organization

● Lecture: Hardware-oriented arithmetic algorithms and cryptography
● Exercise: Algorithms, building blocks, VHDL coding
● Lab: during project week
● Schedule:
  o lecture, Monday 15:xx – 16:yy
  o exercise, replaces lectures beginning with xx.y.
  o mandatory lab with attendance list: 11.6.-12.6. 9:00
● Location: Warnemuende, building 1, R 1226

Literature

Textbooks
● Parhami, B.: Computer Arithmetic, Algorithms and Hardware Designs, 2nd edition, Oxford University Press, New York, 2010
● Koren, I.: Computer Arithmetic Algorithms, 2002
● Muller, J.M.: Elementary Functions, Algorithms and Implementation, 2nd ed., 2006
● Klar, H., Noll, T.: Integrierte Digitale Schaltungen, Springer 2015, free access from URO network
● Pirsch, P.: Architekturen der digitalen Signalverarbeitung, B.G. Teubner, Stuttgart, 1996

Courses and Websites
● Koren, I.: Computer Arithmetic Simulator
● Ercegovac, M.: Course Digital Arithmetic
● Guyot, A.: Educational Applets
● Strey, A.: Course Computer-Arithmetik


Part 1: Number Systems


Outline

● 1.1 Positional / Place-Value Notation of Numbers
  o Representation of Integer Numbers, Real Numbers and Radix Selection
● 1.2 Signed Number Representations
  o Sign Magnitude, (r-1)-Complement, r-Complement and Redundant Binary
● 1.3 Rounding
  o via Truncation, Round-to-Nearest and Round-to-Nearest-Even
● 1.4 Overflows
  o in (r-1)-Complement, Carry-Save and Signed Redundant Binary Numbers
  o Overflow Detection and Handling
● 1.5 Basic Operations
● 1.6 Cost/Performance Estimation Basics

1.1 Positional / Place-Value Notation of Numbers

● The number A is represented by n digits $a_i$ and a defined base/radix r: $A = (a_{n-1} a_{n-2} \dots a_1 a_0)_r$
  o Binary: r = 2
  o Ternary: r = 3
  o Octal: r = 8
  o Decimal: r = 10
  o Hexadecimal: r = 16
● The value V(A) of the number A is given by the sum of the n partial products $p_i$ for each of its positions: $V(A) = \sum_{i=0}^{n-1} p_i = \sum_{i=0}^{n-1} a_i \cdot r^i$
● The partial product $p_i = a_i \cdot r^i$ results from the multiplication of the digit $a_i$ with its weight $r^i$, which is a power of the radix r determined by the position index i

● Integer number A with n digits: $A = (a_{n-1} a_{n-2} \dots a_1 a_0)_r$ with value $V(A) = \sum_{i=0}^{n-1} a_i \cdot r^i$
● A positive integer number A has a range of: $0 \le V(A) \le r^n - 1$
● Real numbers contain n digits for the integer part and m digits for the fractional part
● Real number A with n+m digits: $A = (a_{n-1} \dots a_1 a_0 . a_{-1} \dots a_{-m})_r$ with value $V(A) = \sum_{i=-m}^{n-1} a_i \cdot r^i$
● A positive real number A has a range of: $0 \le V(A) \le r^n - r^{-m}$
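The place-value sums above can be checked with a few lines of Python. This is only a minimal sketch: the function name `positional_value` and the digit-list interface (most significant digit first) are illustrative assumptions, not part of the lecture material.

```python
from fractions import Fraction


def positional_value(int_digits, frac_digits=(), r=2):
    """Evaluate V(A) = sum(a_i * r**i) for digits given most significant first.

    int_digits  holds a_{n-1} ... a_0, frac_digits holds a_{-1} ... a_{-m}.
    Fractions keep the result exact for any radix.
    """
    n = len(int_digits)
    value = Fraction(0)
    for k, digit in enumerate(int_digits):        # position index i = n-1-k
        if not 0 <= digit < r:
            raise ValueError("digit outside 0 .. r-1")
        value += digit * Fraction(r) ** (n - 1 - k)
    for k, digit in enumerate(frac_digits):       # position index i = -(k+1)
        if not 0 <= digit < r:
            raise ValueError("digit outside 0 .. r-1")
        value += digit * Fraction(r) ** -(k + 1)
    return value
```

For example, `positional_value([1, 1, 1], [1, 1], r=2)` yields the largest value for n = 3 and m = 2, which matches the upper bound $r^n - r^{-m} = 7.75$.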

● In computation, two formats exist for the approximation of real numbers
● Fixed point numbers: the number of significant digits before and after the decimal point is fixed (as seen above)
  o The decimal point is fixed and never explicitly represented in hardware
  o Its position is defined during design and must be known to interpret the number
● Floating point numbers: the number of significant digits before and after the decimal point depends on the exponent
  o $V(A) = (-1)^{S} \cdot \text{mantissa} \cdot r^{\text{exponent}}$
● Floating point numbers in modern computers follow the IEEE 754 standard
  o Binary half precision: 16 bit data words (= 1 + 10 + 5 bit)
  o Binary single precision: 32 bit data words (= 1 + 23 + 8 bit)
  o Binary double precision: 64 bit data words (= 1 + 52 + 11 bit)
  o …
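To make the 1 + 23 + 8 bit split of single precision concrete, the following sketch extracts the sign, exponent and fraction fields from a Python float using the standard `struct` module. The helper names are illustrative assumptions, and the decoder only covers normalized numbers (no zeros, subnormals, infinities or NaNs).

```python
import struct


def single_precision_fields(x):
    """Split x (rounded to IEEE 754 single precision) into (sign, exponent, fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bit, biased by 127
    fraction = bits & 0x7FFFFF        # 23 bit, hidden leading 1 not stored
    return sign, exponent, fraction


def decode_normalized(sign, exponent, fraction):
    """Rebuild the value of a normalized single precision number from its fields."""
    return (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)
```

As a usage example, -6.5 = -1.625 · 2² is stored with sign 1, biased exponent 129 and fraction 0x500000.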

● Radix selection (continued from fixed point numbers)
  o Computations are performed in circuits
  o Binary representation (r = 2) is the best match for the physical signal levels in most logic
    - Voltage U: { 0 , 1 } → { VSS , VDD }
    - Current I: { 0 , 1 } → { IMIN , IMAX }
● Efficiency: How many bits do we need in a bit-oriented (r = 2) memory to store a positive number V?

1.2 Signed Number Representations

● Positional notation as discussed above only covers positive numbers
● For negative numbers, different signed number representation (SNR) options exist
● SNR #1: Sign Magnitude (SM)
  o Insert a sign bit at $a_{n-1}$ before the magnitude of the number
  o Positive number: $a_{n-1} = 0$; negative number: $a_{n-1} = 1$
  o $+A = 0\, a_{n-2} \dots a_1 a_0$
  o $-A = (r-1)\, a_{n-2} \dots a_1 a_0$
● A signed integer number A has a range of: $-(r^{n-1} - 1) \le V(A) \le r^{n-1} - 1$
● + Symmetrical range
● - Double representation of zero; requires different treatment of positive and negative numbers in arithmetic circuits

● SNR #2: (r-1)-complement → 1's complement
  o A negative number results from complementing each digit $a_i$ according to: $\bar{a}_i = (r - 1) - a_i$
  o In a binary representation (r = 2) this procedure equals a bitwise inversion ("bit flipping"): 01010101 → 10101010
  o $A = 0\, a_{n-2} \dots a_1 a_0$
  o $-A = \bar{A} = (r-1)\, \bar{a}_{n-2} \dots \bar{a}_1 \bar{a}_0$
● A signed integer number A has a range of: $-(r^{n-1} - 1) \le V(A) \le r^{n-1} - 1$
● Same pros and cons as SM

● SNR #3: r-complement → 2's complement
  o Start with the (r-1)-complement and add 1 to the Least Significant Digit (LSD)
  o The binary format (r = 2) is most commonly used in digital circuits
  o $A = 0\, a_{n-2} \dots a_1 a_0$
  o $-A = \bar{A} + 1 = (r-1)\, \bar{a}_{n-2} \dots \bar{a}_1 \bar{a}_0 + 1$
● A signed integer number A has a range of: $-r^{n-1} \le V(A) \le r^{n-1} - 1$
● + Identical treatment of positive and negative numbers in arithmetic circuits, e.g., adders; unique representation of zero
● - Asymmetrical range
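The three non-redundant signed representations can be compared side by side for r = 2. The sketch below encodes a Python integer into an n-bit word for each format; the function names are hypothetical and chosen only for this illustration.

```python
def sign_magnitude(value, n):
    """n-bit sign magnitude word: sign bit a_{n-1} followed by |value|."""
    if abs(value) > (1 << (n - 1)) - 1:
        raise OverflowError("value outside the symmetrical SM range")
    sign = 1 if value < 0 else 0
    return (sign << (n - 1)) | abs(value)


def ones_complement(value, n):
    """n-bit 1's complement word: negative numbers are the bitwise inversion."""
    if abs(value) > (1 << (n - 1)) - 1:
        raise OverflowError("value outside the symmetrical 1's complement range")
    mask = (1 << n) - 1
    return value if value >= 0 else ~(-value) & mask


def twos_complement(value, n):
    """n-bit 2's complement word: 1's complement plus 1 at the LSD."""
    if not -(1 << (n - 1)) <= value <= (1 << (n - 1)) - 1:
        raise OverflowError("value outside the asymmetrical 2's complement range")
    return value & ((1 << n) - 1)
```

With n = 8, the value -85 reproduces the bit flipping example from the slides: 01010101 becomes 10101010 in 1's complement, and adding 1 gives 10101011 in 2's complement.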

● SNR #4: Redundant Representations (RR)
  o Allow multiple (redundant) representations for the same number value V(A)
  o Also true for SM and (r-1)-complement due to the double zero representation, but typically RR means the following:
● RR #1: Signed Digit Representation (SD)
  o In SD numbers each digit has its own sign → one extra bit per digit
  o Digit set: $a_i \in \{-\alpha, \dots, -1, 0, 1, \dots, \beta\}$
  o α and β must cover at least half of the interval defined by the radix

● The SD number system is symmetrical for α = β, else asymmetrical
● Maximum or minimum redundancy for a symmetrical SD number system:
  o Maximum redundancy: $\alpha = r - 1$
  o Minimum redundancy: the smallest admissible α (for r = 4 this is α = 2, see the table)
● Examples:

| Radix r | Digit values $a_i$ | Comment |
|---|---|---|
| 2 | {-1, 0, 1} | |
| 3 | {-2, -1, 0, 1} | |
| 3 | {-1, 0, 1, 2} | |
| 3 | {-2, -1, 0, 1, 2} | |
| 4 | {-2, -1, 0, 1, 2} | minimum redundancy |
| 4 | {-3, -2, -1, 0, 1} | not allowed! α, β bounds violated |
| 4 | {-3, -2, -1, 0, 1, 2, 3} | maximum redundancy |

● Only SD numbers with r = 2 (redundant binary (RB) numbers) are considered in the following sections: $a_i \in \{-1, 0, 1\}$

● An SD number with n digits $a_i \in \{-1, 0, 1\}$ has $3^n$ different representations, but only $2^{n+1} - 1$ different values can be represented
  o Example: $3_{10}$ = 011 = 101̄ = 11̄1 (1̄ denotes the digit -1)
● Question: Which RB representation contains the smallest number of non-zeros ('1' or '-1')?
  o Answer: use arithmetic conversions of non-zero bit strings
  o Example: …001111… → …010001̄…, because a string of k adjacent ones starting at position i satisfies
    $2^{i+k-1} + 2^{i+k-2} + \dots + 2^{i+1} + 2^{i} = 2^{i+k} - 2^{i}$
● Such RBRs are called Canonical Signed Digits (CSD) and the conversion strategy is CSD recoding

● Definition: A CSD recoded number is an n digit SD number that has a minimum number of non-zeros ('1' and '-1') and no adjacent non-zero digits:
  $\sum_{i=0}^{n-1} |a_i| \overset{\text{def}}{=} \min$ with $a_i \cdot a_{i-1} \overset{\text{def}}{=} 0$ for $1 \le i \le n-1$
● CSD recoding operates as an iterative and sequential algorithm. Step by step, the number is parsed from the least to the most significant digit/bit ("right to left") to detect strings of adjacent non-zeros, which are converted immediately. The algorithm terminates when the condition $a_i \cdot a_{i-1} = 0$ is met for all positions.
● Examples (1̄ denotes the digit -1):
  o $366_{10}$
      0001 0110 1110
    = 0001 0111 001̄0
    = 0001 1001̄ 001̄0
    = 0010 1̄001̄ 001̄0 (CSD)
  o $-213_{10}$ (2's complement input)
      1111 0010 1011
    = 1111 0010 1101̄
    = 1111 0011 01̄01̄
    = 1111 0101̄ 01̄01̄
    = 0001̄ 0101̄ 01̄01̄ (CSD)
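The two worked examples can be reproduced in software. The sketch below does not replay the string replacement of the slides; it uses the equivalent arithmetic formulation (inspect the value modulo 4, emit a digit, halve), which yields the same canonical form. Function names are assumptions made for this illustration.

```python
def csd_recode(value):
    """Return the canonical signed-digit form of an integer, least significant digit first."""
    digits = []
    while value != 0:
        if value % 2:                   # odd: a non-zero digit is required here
            digit = 2 - (value % 4)     # +1 if value % 4 == 1, -1 if value % 4 == 3
            value -= digit              # remainder is divisible by 4, so the next digit is 0
        else:
            digit = 0
        digits.append(digit)
        value //= 2
    return digits


def sd_value(digits):
    """Value of a signed-digit vector given least significant digit first."""
    return sum(d * 2 ** i for i, d in enumerate(digits))
```

Reading the result of `csd_recode(366)` from the most significant digit gives 1 0 -1 0 0 -1 0 0 -1 0, i.e. 512 - 128 - 16 - 2, in agreement with the last line of the first example.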

● Lookup table for CSD recoding
  o Possible 1 or 1̄ carries from lower positions must be considered ($c_{i+1} \to c_i$ in the next step)

| $a_{i+1}$ | $a_i$ | $c_i$ | $a_i^*$ | $c_{i+1}$ | Comment |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | String of zeros |
| 0 | 1 | 0 | 1 | 0 | Singular non-zero |
| 1 | 0 | 0 | 0 | 0 | String of zeros |
| 1 | 1 | 0 | -1 | 1 | Begin of non-zero string |
| 0 | 0 | 1 | 1 | 0 | End of non-zero string |
| 0 | 1 | 1 | 0 | 1 | String of non-zeros |
| 1 | 0 | 1 | -1 | 1 | Singular zero |
| 1 | 1 | 1 | 0 | 1 | String of non-zeros |

  The columns $a_{i+1}$, $a_i$, $c_i$ describe the binary number; $a_i^*$ and $c_{i+1}$ describe the CSD recoded SD number.

  o CSD recoding yields a minimum | average | maximum number of non-zeros of 0 (trivial!) | $\sim n/3$ | $(n+1)/2$

● The number-dependent variable timing and the sequential nature of CSD recoding prohibit its efficient application at run-time. But it is excellent for the recoding of constant values or coefficients at design-time.
  o Each eliminated non-zero saves hardware and speeds up specific arithmetic circuits (e.g. multipliers)
● Alternatively, the parallel algorithms Booth and modified Booth work faster for non-zero recoding at run-time.
  o However, the Booth algorithm does not find the minimal form in each case (see example)
  o Isolated non-zeros "010" are not considered by this version

| $a_i$ | $a_{i-1}$ | $a_i^*$ | Comment |
|---|---|---|---|
| 0 | 0 | 0 | String of zeros |
| 1 | 1 | 0 | String of non-zeros |
| 1 | 0 | -1 | Begin of non-zero string |
| 0 | 1 | 1 | End of non-zero string |

  The columns $a_i$, $a_{i-1}$ describe the binary number (with $a_{-1} = 0$); $a_i^*$ is the Booth recoded SD digit.
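Every row of the Booth table satisfies $a_i^* = a_{i-1} - a_i$, so each output digit depends only on two neighbouring input bits and all digits can be produced in parallel. A minimal Python sketch of this rule follows; the function name and the LSD-first list convention are assumptions made for illustration.

```python
def booth_recode(value, n):
    """Booth recode an n-bit 2's complement value into digits {-1, 0, 1}, LSD first."""
    bits = [(value >> i) & 1 for i in range(n)]   # a_0 ... a_{n-1}
    digits = []
    previous = 0                                  # a_{-1} = 0
    for bit in bits:
        digits.append(previous - bit)             # a_i* = a_{i-1} - a_i
        previous = bit
    return digits
```

For the 4-bit input 0101 (5) this returns 1 -1 1 -1 when read from the most significant digit, which has four non-zeros instead of two and so illustrates that Booth recoding does not always find the minimal form.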

● Modified Booth improves on the standard Booth algorithm by overlapped scanning of 3-bit strings
  o By considering isolated non-zeros "010", the maximum number of non-zeros after conversion is n/2 (for even n)

| $a_i$ | $a_{i-1}$ | $a_{i-2}$ | r = 2: $a_i^*$ | r = 2: $a_{i-1}^*$ | r = 4 digit | Comment |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | String of zeros |
| 0 | 1 | 0 | 0 | 1 | 1 | Single non-zero |
| 1 | 0 | 0 | -1 | 0 | -2 | Begin of non-zero string |
| 1 | 1 | 0 | 0 | -1 | -1 | Begin of non-zero string |
| 0 | 0 | 1 | 0 | 1 | 1 | End of non-zero string |
| 0 | 1 | 1 | 1 | 0 | 2 | End of non-zero string |
| 1 | 0 | 1 | 0 | -1 | -1 | Single zero |
| 1 | 1 | 1 | 0 | 0 | 0 | String of non-zeros |

  The columns $a_i$, $a_{i-1}$, $a_{i-2}$ describe the binary number; the remaining columns give the modified Booth recoded SD digits for i = 1, 3, 5, …
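Each radix-4 entry of the table equals $-2 a_i + a_{i-1} + a_{i-2}$. The sketch below applies this rule to every overlapping bit triple of an n-bit 2's complement value (n even); the function name and the LSD-first output order are illustrative assumptions.

```python
def modified_booth_recode(value, n):
    """Radix-4 modified Booth digits in {-2, ..., 2} of an n-bit 2's complement value, LSD first."""
    if n % 2:
        raise ValueError("n must be even")
    bits = [(value >> i) & 1 for i in range(n)]   # a_0 ... a_{n-1}
    digits = []
    below = 0                                     # a_{i-2}, starting with a_{-1} = 0
    for i in range(1, n, 2):                      # i = 1, 3, 5, ...
        digits.append(-2 * bits[i] + bits[i - 1] + below)
        below = bits[i]                           # overlap: a_i is a_{i-2} of the next triple
    return digits
```

For the 12-bit value 366 the digits, read from the most significant one, are 0 1 2 -1 0 -2, so at most n/2 = 6 partial products remain.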

● Example: Modified Booth recoding for a 12 digit number (n = 12, 1̄ denotes the digit -1)
  o Binary: 0001 0110 1110 (0) = $366_{10}$
  o r = 2: 0 0 0 1 1 0 0 1̄ 0 0 1̄ 0
  o r = 4: 0 1 2 -1 0 -2
  o Result is no CSD, but acceptable!

● Comparison of recoding methods

| Method | Algorithm | Min # of non-zeros | Average # of non-zeros | Max # of non-zeros | Comment |
|---|---|---|---|---|---|
| CSD | Sequential | 0 | $\sim n/3$ | $(n+1)/2$ | Yields minimum # of non-zeros for constant values at design-time |
| Booth | Parallel | 0 | ? | $\sim n$ | 2's complement to signed digit |
| Modified Booth | Parallel | 0 | ? | $\sim (n+1)/2$ | For run-time recoding (multiplier); potential to save half of the chip area |

● Conversion from 2's complement to SD numbers ($A_r \to A_{SD}$)
  o $A_r = a_{n-1} a_{n-2} \dots a_1 a_0 \;\to\; A_{SD} = (-a_{n-1})\, a_{n-2} \dots a_1 a_0$ (only the sign digit changes)
  o Fast: can be done in parallel within one gate delay
  o $5_{10}$ = 0101₂ → 0101 (SD)
  o $-5_{10}$ = 1011₂ → 1̄011 (SD), where 1̄ denotes the digit -1
● Conversion from SD numbers to 2's complement ($A_{SD} \to A_r$)
  o Split the SD number into a positive and a negative part: $A_{SD} \to D^+$ and $D^-$
  o $-13_{10}$ = 01̄011̄1 (SD) → $D^+$ = 000101 and $D^-$ = 010010
  o 2's complement number: $A_r = D^+ - D^-$
  o Slow: requires one n-bit addition
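The split into $D^+$ and $D^-$ followed by one subtraction can be sketched as follows. The function name is a hypothetical choice, the digits are passed most significant first, and the result is returned as an n-bit 2's complement word.

```python
def sd_to_twos_complement(digits):
    """Convert SD digits {-1, 0, 1} (MSD first) to an n-bit 2's complement word."""
    n = len(digits)
    d_plus = 0
    d_minus = 0
    for digit in digits:
        d_plus = (d_plus << 1) | (1 if digit == 1 else 0)     # positions holding +1
        d_minus = (d_minus << 1) | (1 if digit == -1 else 0)  # positions holding -1
    # One n-bit subtraction (an adder in hardware); the mask models the fixed word length.
    return (d_plus - d_minus) & ((1 << n) - 1)
```

With the slide example 0 -1 0 1 -1 1 the two parts are $D^+$ = 000101 (5) and $D^-$ = 010010 (18), and the 6-bit result 110011 is the 2's complement pattern of -13.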

● Conversion from 2's complement to SD numbers ($A_r \to A_{SD}$)?
  o For positive numbers: $A_{SD} = A_r$
  o For negative numbers: not this easy!
  o A general method: Booth recoding, $A_{SD} = f_{Booth}(A_r)$
  o Example: $5_{10}$ = 0101(0) in 2's complement → 11̄11̄ in signed digits (1̄ denotes the digit -1)

● SD numbers in binary hardware
  o Three possible values per digit $a_i \in \{-1, 0, 1\}$ require two bits for each digit → hardware costs (wires, registers, ALUs) double
  o Two bits per digit allow four different code words; two encodings are typically used: Sign Value (SV) and Negative Positive (NP)

| $a_i$ | S | V | N | P |
|---|---|---|---|---|
| -1 | 1 | 1 | 1 | 0 |
| 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 | 1 |

  o SV: intuitive because of its sign magnitude representation
  o NP: easier $A_{SD} \to A_r$ conversion, as $D^+ = P_{n-1} \dots P_0$ and $D^- = N_{n-1} \dots N_0$

● RR #2: Carry-Save Representation (CS)
  o Carry-save numbers originate from hardware structures of full adders (FAs) and half adders (HAs)
  o Digit $a_i$ represents a tuple $a_i = (s_i, c_i)$ of a sum and a carry bit, where an adder cell encodes the sum of its input bits as $2 \cdot c + s$
  o CS numbers are stored as a combination of a carry vector and an intermediate sum vector

● Additions with CS numbers only have a critical path of one half adder, but require 2 bits per digit for storage and communication (wires)
● $V(A) = (c_n c_{n-1} c_{n-2} \dots c_1 c_0) + (s_{n-1} s_{n-2} \dots s_1 s_0)$
● In general, there is no difference between CS and SD numbers
  o CS numbers result from the outputs of a half adder (HA)
  o SD numbers have their origin in the theory of number representations
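A bit-parallel model shows why carry-save addition has no carry chain: every bit position is handled by its own adder cell, and the carries are only shifted, not propagated. The sketch below works on non-negative Python integers; the function names are assumptions, and the final `carry + sum` stands in for the vector merging addition.

```python
def carry_save_add(x, y, z):
    """Compress three operands into a (carry, sum) pair without carry propagation."""
    sum_vector = x ^ y ^ z                                   # per-bit sum output
    carry_vector = ((x & y) | (x & z) | (y & z)) << 1        # per-bit carry, weighted 2^(i+1)
    return carry_vector, sum_vector


def accumulate(operands):
    """Add many operands in carry-save form; only the last step needs a real adder."""
    carry_vector, sum_vector = 0, 0
    for operand in operands:
        carry_vector, sum_vector = carry_save_add(carry_vector, sum_vector, operand)
    return carry_vector + sum_vector                         # vector merging addition
```

For instance, `carry_save_add(5, 3, 6)` gives the pair (14, 0), whose merged value 14 equals 5 + 3 + 6.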

● Why should RRs be applied, or when is it worth using them?

| Pros | Cons |
|---|---|
| Carry-free and thus faster addition/subtraction (see adder section) | More resources |
| Arithmetic algorithms based on adders (nearly all) benefit from this | Comparison operations (≥, ≤, <, =, >) are slow due to $A_{SD} \to A_r$ conversion |
| | $A_{SD} \to A_r$ conversion is slow due to the adder |

● Typical processing flow:
  o $A_r \to A_{SD}$: $T \sim O(1)$
  o Operation 1 … Operation k: $T_{op,i} \ne f(n)$, carry-free operations!
  o $A_{SD} \to A_r$: $T_{add} \sim O(\log_2(n))$

1.3 Rounding

● Rounding trims numbers into formats with fewer digits
  o Examples:
    - Two n-bit numbers are multiplied and the result is a number with 2n bits, but the hardware only captures m < 2n bits
    - Rounding after a right shift of an integer by one digit
● Rounding methods can be classified by:
  o Accuracy of the final results (or information loss by rounding)
  o Numerical error characteristics of the rounding method
  o Cost/effort/delay to perform the rounding
● Assume
  o Given: $A = a_{n-1} a_{n-2} \dots a_1 a_0 . a_{-1} \dots a_{-d}$ → cut d bits
  o Rounded: $B = b_{n-1} b_{n-2} \dots b_1 b_0 = A + \varepsilon \Rightarrow \varepsilon = B - A$
  o Goal: minimize the rounding error $|\varepsilon|$

● Rounding Method #1: Truncation
  o Step 1: the d least significant bits are cut off from A
  o Rounding result: $B = a_{n-1} a_{n-2} \dots a_1 a_0$
  o Minimum error: $|\varepsilon| = 0$ (0.00000₂)
  o Maximum error: $|\varepsilon| = 1 - 2^{-d}$ (0.11111₂ for d = 5)
  o Average error: $|\bar{\varepsilon}| \approx 2^{-1}$ (0.10000₂)
  o Asymmetrical bias

[Figure: rounded result B plotted over input A for truncation]

● Rounding Method #2: Round-to-Nearest
  o Step 1: addition of $0.5_{10}$ to A ⇒ $A' = A + 0.5_{10} = A + 0.1_2$
  o Step 2: the d least significant bits are cut off from $A'$ to fit B
  o The resulting effect is an alternate rounding to higher and lower numbers
  o Note: can often be incorporated effortlessly into the previous operation
  o Rounding result: $B = a'_{n-1} a'_{n-2} \dots a'_1 a'_0$
  o Minimum error: $|\varepsilon| = 0$ (for A = 0.0 → B = 0.0)
  o Maximum error: $|\varepsilon| = 2^{-1} = 0.1_2$ (for A = 0.1₂ → B = 1.0)
  o Average error: $|\bar{\varepsilon}| = 2^{-2} = 0.01_2$
  o Smaller asymmetrical bias (due to always rounding up A = 0.1₂)

[Figure: rounded result B plotted over input A for round-to-nearest]

● Rounding Method #3: Round-to-Nearest-Even
  o Idea: alternate rounding up and down to the nearest even number
  o Step 1: addition of $0.5_{10}$ to A ⇒ $A' = A + 0.5_{10} = A + 0.1_2$
  o Step 2: if the d least significant bits of $A'$ are all zero, cut them off from $A'$ to fit B and set $a'_0$ to zero; otherwise proceed with Round-to-Nearest
  o $B_{RNE} = B_{RN}$ if $a'_{-1} \dots a'_{-d} \ne 0.000\dots$, else $B_{RNE} = a'_{n-1} a'_{n-2} \dots a'_1 0$
  o Yields an average bias of zero: $bias = 0$
  o Symmetrical error and bias-free; mandatory in IEEE floating point

[Figure: rounded result B plotted over input A for round-to-nearest-even]
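The three rounding methods can be compared on fixed point words modelled as Python integers. In this sketch a word `a` carries d fractional bits (its value is `a / 2**d`), d is at least 1, and the function names are assumptions made for illustration.

```python
def truncate(a, d):
    """Cut off the d least significant bits."""
    return a >> d


def round_to_nearest(a, d):
    """Add 0.5 (binary 0.1) at the cut position, then cut off d bits."""
    return (a + (1 << (d - 1))) >> d


def round_to_nearest_even(a, d):
    """Like round-to-nearest, but ties (cut bits of A' all zero) go to the even neighbour."""
    a_prime = a + (1 << (d - 1))
    b = a_prime >> d
    if a_prime & ((1 << d) - 1) == 0:   # the input was exactly halfway
        b &= ~1                         # force the least significant result bit to zero
    return b
```

With d = 2 the input 10.10₂ (2.5) is truncated to 2, rounded to nearest as 3 and rounded to nearest even as 2, while 11.10₂ (3.5) becomes 4 under both nearest variants.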

1.4 Overflow

● Overflow occurs if numbers exceed the available word length in datapaths

[Figure: representable ranges on the number line from $-2^{n-1}$ over 0 and $2^{n-1}$ to $2^n$ for unsigned, 2's complement, 1's complement and sign magnitude numbers]

● Overflow in 2's complement numbers
  o Range: $-2^{n-1} \le V(A) \le 2^{n-1} - 1$
  o Overflow in the addition of two numbers
    - Reason: the carry out from the sign digit is discarded
    - Case 1: two positive summands A and B → negative sum S: $\bar{a}_{n-1} \wedge \bar{b}_{n-1} \wedge s_{n-1} \Rightarrow c_{in} = 1,\ c_{out} = 0$
    - Case 2: two negative summands A and B → positive sum S: $a_{n-1} \wedge b_{n-1} \wedge \bar{s}_{n-1} \Rightarrow c_{in} = 0,\ c_{out} = 1$
    - In general, overflow occurs for $c_{in} \ne c_{out}$ at the sign digit (for add and sub)
● In non-redundant number systems overflows are definitely detectable
● Possible actions after overflow detection
  o Emergency stop
  o Error handling
  o Saturation to the maximum (01111) or minimum (10000) number

[Figure: full adder (FA) at the sign position with inputs $a_{n-1}$, $b_{n-1}$, $c_{in}$ and outputs $s_{n-1}$, $c_{out}$, followed by saturation logic producing the overflow flag and $s^*_{n-1}$]
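The detection rule $c_{in} \ne c_{out}$ at the sign digit and the saturation action can be modelled in a few lines. This is a sketch under the assumption that both operands already fit into n bits; the function name is hypothetical.

```python
def add_with_saturation(a, b, n):
    """Add two n-bit 2's complement values; saturate when c_in != c_out at the sign digit."""
    mask = (1 << n) - 1
    lower = (1 << (n - 1)) - 1                   # all bits below the sign digit
    ua, ub = a & mask, b & mask                  # raw n-bit words
    c_in = ((ua & lower) + (ub & lower)) >> (n - 1)   # carry into the sign digit
    total = ua + ub
    c_out = total >> n                           # carry out of the sign digit
    if c_in != c_out:                            # overflow detected
        # c_out = 0: two positive summands, c_out = 1: two negative summands
        return (1 << (n - 1)) - 1 if c_out == 0 else -(1 << (n - 1))
    word = total & mask
    return word - (1 << n) if word >> (n - 1) else word
```

For n = 5, `add_with_saturation(15, 1, 5)` returns 15 (01111) and `add_with_saturation(-16, -1, 5)` returns -16 (10000), matching the saturation values on the slide.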

● Overflow in Carry-Save representations
  o In redundant number systems two types of overflow exist: true and pseudo overflow
  o Example: $0.5_{10} + (-0.5_{10}) + 0 = 0$ !!!

| $-2^0$ | $2^{-1}$ | $2^{-2}$ | Value |
|---|---|---|---|
| 0 | 1 | 0 | $0.5_{10}$ |
| 1 | 1 | 0 | $-0.5_{10}$ |
| 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | carry vector = $-1_{10}$ |
| 1 | 0 | 0 | sum vector = $-1_{10}$ |

  o The wrong intermediate result $-2_{10}$ in CS representation would still yield the correct value 0 if converted to 2's complement via vector merging addition (VMA) of the carry and sum vector
  o Test: 1.00 + 1.00 = 10.00 (leading 1 dropped) → result = 0.00 = $0_{10}$

  o Wrong results are possible if other operations are executed on the intermediate carry and sum vectors
  o Example: $(0.5_{10} + (-0.5_{10}) + 0) \cdot 0.5_{10} = 0$
    - Multiplication with 0.5 equals a right shift with sign extension of the carry and sum vector
    - Carry: 1.00 : 2 → 1.10 = $-0.5_{10}$
    - Sum: 1.00 : 2 → 1.10 = $-0.5_{10}$
    - VMA: 11.00 = $-1_{10} \ne 0_{10}$
  o The error becomes obvious after conversion to a non-redundant number. However, the correct result 0 would fit into the given word length
  o Those pseudo overflows are detectable and correctable as follows

  o Pseudo overflow correction for CS numbers:
    - Given: $c_1 c_0 . c_{-1} c_{-2} \dots$ and $s_0 . s_{-1} s_{-2} \dots$
    - Modify to: $c_0^* . c_{-1} c_{-2} \dots$ and $s_0^* . s_{-1} s_{-2} \dots$
    - using $c_0^* = c_1$ and $s_0^* = s_0$ if $c_1 = c_0$, else $\bar{s}_0$, that is $s_0^* = s_0 \oplus c_0 \oplus c_1$
    - The XOR gates can be easily integrated as part of the MSD/MSB adder circuit at low hardware overhead and without speed penalty!
  o The method works as long as the converted 2's complement result fits into the given word length
  o Example with pseudo overflow correction:
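The example $(0.5 + (-0.5) + 0) \cdot 0.5$ can be replayed in Python to see the effect of the correction $c_0^* = c_1$, $s_0^* = s_0 \oplus c_0 \oplus c_1$. This is a sketch under stated assumptions: words have N = 3 bits (one sign bit of weight $-2^0$ and two fractional bits), values are counted in units of $2^{-2}$, and all function names are hypothetical.

```python
N = 3                      # word length: sign bit plus two fractional bits
MASK = (1 << N) - 1


def to_signed(word):
    """Interpret the lowest N bits as a 2's complement value in units of 2**-2."""
    word &= MASK
    return word - (1 << N) if word >> (N - 1) else word


def carry_save_add(x, y, z):
    """3:2 compression; bit N of the carry vector is c_1, the carry out of the sign digit."""
    return ((x & y) | (x & z) | (y & z)) << 1, (x ^ y ^ z) & MASK


def correct_pseudo_overflow(c, s):
    """Apply c_0* = c_1 and s_0* = s_0 xor c_0 xor c_1 at the sign position."""
    c1, c0, s0 = (c >> N) & 1, (c >> (N - 1)) & 1, (s >> (N - 1)) & 1
    lower = (1 << (N - 1)) - 1
    return (c & lower) | (c1 << (N - 1)), (s & lower) | ((s0 ^ c0 ^ c1) << (N - 1))


def halve(word):
    """Right shift by one position with sign extension (multiplication by 0.5)."""
    word &= MASK
    return (word >> 1) | (word & (1 << (N - 1)))


def vma(c, s):
    """Vector merging addition with the carry out of the sign digit dropped."""
    return to_signed(c + s)
```

Starting from `c, s = carry_save_add(0b010, 0b110, 0b000)`, the uncorrected path `vma(halve(c), halve(s))` gives -4 (that is $-1_{10}$), whereas halving the corrected vectors gives 0.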

● In general, a reduction of leading digits of CS numbers can be achieved
  o provided that the CS number fits into the corresponding 2's complement number according to the condition $-1 \le C + S \le 1 - 2^{-(n-1)}$
● as follows:
  o Given: $c_n \dots c_1 c_0 . c_{-1} c_{-2} \dots$ and $s_n \dots s_1 s_0 . s_{-1} s_{-2} \dots$
  o Modify to: $c_0^* . c_{-1} c_{-2} \dots$ and $s_0^* . s_{-1} s_{-2} \dots$
  o using $s_0^* = s_0$ if $s_1 = c_1$, else $\bar{s}_0$, and $c_0^* = c_0$ if $s_1 = c_1$, else $\bar{c}_0$
● Pseudo overflow correction needs fewer digits and less chip area than uncorrected formats

● Overflow in SD numbers
  o Similar to the CS case → pseudo and real overflows
  o The overflow behavior depends on the MSD sum bit $s_{n-1}$ and the intermediate carry bit $d_n$
  o Analysis for a possible correction of $s_{n-1}$ as follows (N denotes the digit -1, X is a don't care):

| $d_n$ | $s_{n-1}$ | Overflow type | $s_{n-1}^*$ |
|---|---|---|---|
| 1 | N | pseudo | 1 |
| 1 | 0 | potential | |
| 1 | 1 | real | |
| N | N | real | |
| N | 0 | potential | |
| N | 1 | pseudo | N |
| 0 | X | none | $s_{n-1}$ |

  o Pseudo overflow is correctable at the MSD without performance impact
  o Real overflow must be avoided through modification at the system or algorithm level
  o Potential overflow would require an inspection of all lower digits → hardware costs increase

o Potential overflow avoidable via range limitation to 2

● General options/mechanisms for handling real overflow
  o Analytical analysis to identify minimum/maximum intermediate and final values
  o Corner case simulation of the system to check for sufficient word lengths for any occurring values
  o Thus estimate a lower bound on the word length
  o For insufficient word lengths or if too expensive:
    - Reduce accuracy → fewer bits after the decimal point
    - Test whether the application allows saturation
    - Detect real overflow and handle it

1.5 Basic Operations

● Wrap-up of some basic operations on data and numbers

| Operation | Number type | Direction | Result |
|---|---|---|---|
| Shift | unsigned | left | $a_{n-2} \dots a_1 a_0 0$ |
| Shift | unsigned | right | $0\, a_{n-1} a_{n-2} \dots a_1$ |
| Shift | signed 2's complement | left | $a_{n-1} a_{n-3} \dots a_0 0$ |
| Shift | signed 2's complement | right | $a_{n-1} a_{n-1} \dots a_1$ |
| Rotate | | left | $a_{n-2} \dots a_1 a_0 a_{n-1}$ |
| Rotate | | right | $a_0 a_{n-1} a_{n-2} \dots a_1$ |
| Extend | unsigned | left | $0\, a_{n-1} a_{n-2} \dots a_1 a_0$ |
| Extend | unsigned | right | $a_{n-1} a_{n-2} \dots a_1 a_0 0$ |
| Extend | signed 2's complement | left | $a_{n-1} a_{n-1} a_{n-2} \dots a_1 a_0$ |
| Extend | signed 2's complement | right | $a_{n-1} a_{n-2} \dots a_1 a_0 0$ |
| Saturate | unsigned | | $a_{n-1} \dots a_{n-1} a_{n-1}$ |
| Saturate | signed 2's complement | | $a_{n-1} \bar{a}_{n-1} \dots \bar{a}_{n-1}$ |

1.6 Cost/Performance Estimation Basics

● Some "rule of thumb" estimations for delay and area of typical functions and algorithm structures in arithmetic circuits
  o Naming conventions:
    - A: area
    - T: cycle time/delay
    - L: latency
    - #: number of cycles
  o Basic assumptions for gates:
    - Inverter, buffer: A = 0, T = 0 (negligible)
    - Simple 2-input gate: A = 1, T = 1 (AND, NAND, OR, NOR)
    - Special 2-input gate: A = 2, T = 2 (XOR, XNOR)
    - Complex m-input gate: $A = m - 1$, $T = \lceil \log_2 m \rceil$ (gate tree)
    - Wiring costs as well as wiring area are not considered (high abstraction)
  o Basic assumptions for circuit functions:
    - Up to n inputs $a = (a_{n-1}, a_{n-2}, \dots, a_1, a_0)$
    - Up to n outputs $z = (z_{n-1}, z_{n-2}, \dots, z_1, z_0)$
    - Blue dots represent functions that generate outputs $z_i = f(a_{n-1}, a_{n-2}, \dots, a_1, a_0)$

  o Non-recursive functions
    - $z_i = f(a_i, x)$ with $i = 0, 1, \dots, n-1$ and $x$ constant
    - Output $z_i$ only depends on input $a_i$
    - Can be implemented as a fully parallel hardware structure
    - $A = O(n)$ and $T = O(1)$

  o Recursive functions with a single output
    - The output depends on all inputs: $z_{n-1} = f(a_{n-1}, a_{n-2}, \dots, a_1, a_0)$
    - Case 1: f non-associative → $A = O(n)$ and $T = O(n)$ (serial structure)
    - Case 2: f associative → $A = O(n)$ and $T = O(\log n)$ (tree structure)

[Figure: Case 1 (non-associative), serial chain from $a_0$ to $z_{n-1}$; Case 2 (associative), tree combining $a_3 \dots a_0$ into $z_3$]

  o Recursive functions with multiple outputs
    - Prefix problem: $z_i = f(a_i, z_{i-1})$
    - Case 1: f non-associative → $A = O(n)$ and $T = O(n)$ (serial)
    - Case 2: f associative → $A = O(n)$ and $T = O(\log n)$ (multi tree / serial)
    - Case 3: f associative → $A = O(n \cdot \log n)$ and $T = O(\log n)$ (shared)

[Figure: prefix structures with inputs $a_{n-1} \dots a_0$ and outputs $z_{n-1} \dots z_0$ for Case 1 (non-associative), Case 2 (associative) and Case 3 (associative, in parallel)]


Part 2: Adders


Outline

● 2.1 Fundamentals
  o Half Adder, Full Adder, (m,k)-Counter
● 2.2 Carry Propagate Adders
  o Ripple Carry, Carry Skip, Carry Select, Conditional Sum, Carry Lookahead, Asynchronous
● 2.3 Non-Carry Adders
  o Carry Save, Redundant Binary
● 2.4 Multi-Operand Adders
  o Matrix Adder, (m:2)-compressor, Adder Trees
● 2.5 Sequential Adders
  o LSB-first, MSB-first, Accumulator
● 2.6 Add-based Operations

6/3/2019 Selected Topics of VLSI Design 2

2.1 Fundamentals of Adders

● 1-bit adder or $(m,k)$-counter
  o Counts $m$ 1-bit numbers of the same magnitude
  o Result: $k$-bit sum, $k = \lceil \log_2(m+1) \rceil$
  o Example: $1 + 1 + 1 = 11_2$, $1 + 1 + 1 + 1 = 100_2$

● Half Adder or (2,2)-counter
  o $a + b = 2 \cdot c_{out} + s$
  o Sum: $s = a \oplus b$
  o Carry: $c_{out} = a \wedge b$
  o Metric: $A = 3$, $T = 1$ (carry), $T = 2$ (sum)

[Figure: half adder symbol and gate-level circuit with inputs $a$, $b$ and outputs $s$, $c_{out}$]

2.1 Fundamentals of Adders

● Full Adder or (3,2)-counter
  o $a + b + c_{in} = 2 \cdot c_{out} + s$
  o Composed of half adders

Popular variables:
  o $g = a \wedge b$ ; generate $c_{out}$
  o $p = a \oplus b$ ; propagate $c_{in}$
  o $C_0 = a \wedge b$ (carry-out for $c_{in} = 0$)
  o $C_1 = a \vee b$ (carry-out for $c_{in} = 1$)
  o $s = p \oplus c_{in}$

2.1 Fundamentals of Adders

● Full Adder
  o $c_{out} = (a \wedge b) \vee (a \wedge c_{in}) \vee (b \wedge c_{in})$
  o $c_{out} = g \vee (p \wedge c_{in})$
  o $s = p \oplus c_{in}$
  o Different ways to calculate $s$ and $c_{out}$
  o Optimal structure depends on technology

[Figure: full adder symbol (inputs $a$, $b$, $c_{in}$; outputs $s$, $c_{out}$) and gate-level circuit using $g$ and $p$]

2.1 Fundamentals of Adders

● Full Adder
  o $s = p \oplus c_{in} = (C_1 \wedge \overline{C_0}) \oplus c_{in}$
  o $c_{out} = (\overline{c_{in}} \wedge C_0) \vee (c_{in} \wedge C_1)$
  o $c_{out} = (c_{in} \wedge p) \vee (a \wedge \overline{p})$
  o Mux: 2 transmission gates
  o Metric: $A = 7$, $T = 2$ (from $c_{in}$), $T = 4$ (from $a$, $b$)

[Figure: two multiplexer-based full adder circuits (mux selected by $p$, mux selected by $c_{in}$ between $C_0$ and $C_1$) and a transmission gate]
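As a quick plausibility check, the different carry-out formulations of the full adder can be compared exhaustively. The following is a minimal sketch; the function names are made up for illustration.

```python
from itertools import product


def full_adder(a, b, cin):
    """Full adder via generate/propagate: returns (s, cout)."""
    g = a & b            # generate
    p = a ^ b            # propagate
    return p ^ cin, g | (p & cin)


def cout_majority(a, b, cin):
    """Carry-out as majority function of the three inputs."""
    return (a & b) | (a & cin) | (b & cin)


def cout_mux_cin(a, b, cin):
    """Mux controlled by cin, selecting between C0 = a AND b and C1 = a OR b."""
    c0, c1 = a & b, a | b
    return c1 if cin else c0


def cout_mux_p(a, b, cin):
    """Mux controlled by p: pass cin if p = 1, otherwise a (= b)."""
    return cin if (a ^ b) else a


# all eight input combinations agree and satisfy a + b + cin = 2*cout + s
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert a + b + cin == 2 * cout + s
    assert cout == cout_majority(a, b, cin) == cout_mux_cin(a, b, cin) == cout_mux_p(a, b, cin)
```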

2.1 Fundamentals of Adders

● $(m,k)$-counter
  o Addition of $m$ bits
  o Composed of full adders
  o Addition is associative: linear structure → tree structure
  o Reduced critical path

[Figure: $(m,k)$ block with inputs $a_0, a_1, \dots, a_{m-1}$ and outputs $s_{k-1}, \dots, s_0$]

2.1 Fundamentals of Adders

● Example: (7,3)-counter
  o Linear structure: Metric: $A = 28$, $T = 14$
  o Tree structure: Metric: $A = 28$, $T = 10$

[Figure: (7,3)-counter built from four full adders in linear and in tree arrangement]
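A (7,3)-counter can be sketched with four full adders arranged as a tree. The wiring below is one possible arrangement chosen for illustration and is not necessarily the one drawn on the slide.

```python
from itertools import product


def full_adder(a, b, cin):
    """Returns (sum, carry) of three input bits."""
    p = a ^ b
    return p ^ cin, (a & b) | (p & cin)


def counter_7_3(bits):
    """Count the ones among 7 input bits; returns (s0, s1, s2), LSB first."""
    a0, a1, a2, a3, a4, a5, a6 = bits
    s_a, c_a = full_adder(a0, a1, a2)     # first level, weight-1 inputs
    s_b, c_b = full_adder(a3, a4, a5)     # first level, in parallel
    s0, c_c = full_adder(s_a, s_b, a6)    # combines the weight-1 sums
    s1, s2 = full_adder(c_a, c_b, c_c)    # adds the three weight-2 carries
    return s0, s1, s2
```

Four full adders match the area figure of 28 if each full adder is counted with $A = 7$.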

2.2 Carry Propagate Adders (CPA)

● Addition of 2 $n$-bit operands $A$, $B$ and optional $c_{in}$ using carry propagation
  o $A + B + c_{in} = S + 2^n \cdot c_{out}$
  o $a_i + b_i + c_i = s_i + 2 \cdot c_{i+1}$ for $i = 0, 1, \dots, n-1$
  o $c_0 = c_{in}$, $c_n = c_{out}$
● Sum: non-redundant $(n+1)$-bit number
● Different methods of carry propagation:
  o Ripple Carry Adder (RCA)
  o Carry Skip Adder (CSkA)
  o Carry Select Adder (CSel)
  o Conditional Sum Adder (CSum)
  o Carry Lookahead Adder (CLA)

[Figure: CPA block with inputs $A$, $B$, $c_{in}$ and outputs $S$, $c_{out}$]

2.2.1 Ripple Carry Adder (RCA)

● Serial arrangement of full adders (FA)
● Simplest, smallest and slowest CPA
● Metric: $A = 7n$, $T = 2n$
● Carry speed-up strategy for CPAs:
  o Type A: Partitioning into groups of shorter CPAs (with fast $c_{in} \rightarrow c_{out}$)
  o Type B: Parallelization using tree structure

[Figure: chain of $n$ full adders for bits $0, 1, \dots, n-1$ with carry input "in" and carry output "out"]
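A bit-level software model of the ripple carry adder makes the serial carry dependency explicit. This is a minimal sketch with illustrative helper names, operating on LSB-first bit lists.

```python
def to_bits(x, n):
    """Integer -> list of n bits, LSB first."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """List of bits (LSB first) -> integer."""
    return sum(b << i for i, b in enumerate(bits))


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Chain of full adders; each stage has to wait for the carry of the previous one."""
    carry, s_bits = cin, []
    for a, b in zip(a_bits, b_bits):
        p = a ^ b
        s_bits.append(p ^ carry)
        carry = (a & b) | (p & carry)
    return s_bits, carry
```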

2.2.2 Carry Skip Adder (CSkA)

● Type A
● Idea: determine for each group in parallel whether
  a) $c_{in}$ generates $c_{out}$ -or-
  b) $c_{in}$ can skip this group ("skip + propagate")
● $c_i = (\overline{P_{i-1:k}} \wedge c'_i) \vee (P_{i-1:k} \wedge c_k)$

[Figure: three CPA groups (bits $n-1{:}j$, $i-1{:}k$ and $k-1{:}0$); the $k$-bit group produces $c'_i$ and $P_{i-1:k}$, and a mux selects $c'_i$ (input 0) or $c_k$ (input 1) as $c_i$]

Selected Topics of VLSI Design

2.2.2 Carry Skip Adder (CSkA)

● Group propagate: $P_{i-1:k} = p_{i-1} \wedge p_{i-2} \wedge \dots \wedge p_k$
● Bit propagate: $p_i = a_i \oplus b_i$
● Requires a $k$-input AND gate for each group
● Critical path in a given group = $k$ bits
● Function of $P_{i-1:k}$:
  o $P_{i-1:k} = 0 \Rightarrow c_k$ doesn't affect $c_i$: propagate $c'_i$ to mux output
  o $P_{i-1:k} = 1 \Rightarrow c_k$ determines $c_i$: $c_k$ skips group and is propagated to mux input

2.2.2 Carry Skip Adder (CSkA)

● Question: Which group size is optimal w.r.t. delay?
● Assumptions:
  o fixed $k$ for all groups → $n/k$ groups of same size
  o $T_{Mux} = 1$, $T_{Carry} = 2$

$T_{CSkA} = k \cdot T_{Carry} + (n/k - 2) \cdot T_{Mux} + k \cdot T_{Carry}$

$T_{CSkA} = 2 \cdot k \cdot T_{Carry} + (n \cdot k^{-1} - 2) \cdot T_{Mux}$

$T_{CSkA} = 4 \cdot k + n \cdot k^{-1} - 2$

2.2.2 Carry Skip Adder (CSkA)

$\frac{dT_{CSkA}}{dk} = 4 - n \cdot k^{-2} = 0 \Rightarrow k_{opt} = \frac{1}{2}\sqrt{n}$

$T_{CSkA,opt} = 2\sqrt{n} + \frac{n}{\frac{1}{2}\sqrt{n}} - 2 = 2\sqrt{n} + 2\sqrt{n} - 2$

$T_{CSkA,opt} = 4\sqrt{n} - 2 = O(\sqrt{n})$

● Further improvements
  o Faster CPAs, e.g. multi-staged CSkA
  o Variable group size: overlap $T_{CPA} + T_{Mux}$ with $T_{CPA}$ of next group → larger middle groups. Note that the sum time in the last group depends on its group size and can only start after all preceding operations have finished → for $n = 32$ bit choose 1,2,3,4,5,6,5,3,2,1
  o Cost compared to RCA: 1 XOR/bit, (1 AND + 1 Mux)/group
● Metric: $A = 8n$, $T = 4\sqrt{n}$
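The group-size optimization can be checked numerically. The sketch below assumes the delay model $T = 2k \cdot T_{Carry} + (n/k - 2) \cdot T_{Mux}$ with $T_{Carry} = 2$ and $T_{Mux} = 1$; the function names are illustrative, and only group sizes that divide $n$ are tried.

```python
def cska_delay(n, k, t_carry=2, t_mux=1):
    """Delay model of a carry skip adder with n/k groups of fixed size k."""
    return 2 * k * t_carry + (n / k - 2) * t_mux


def best_group_size(n):
    """Search the divisors of n for the group size with minimum delay."""
    divisors = [k for k in range(1, n + 1) if n % k == 0]
    return min(divisors, key=lambda k: cska_delay(n, k))
```

For $n = 64$ the search returns $k = 4$, which agrees with $k_{opt} = \frac{1}{2}\sqrt{n}$.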

2.2.3 Carry Select Adder (CSel)

● Type A
● Idea:
  a) Compute $c_{out}$ and $s_{out}$ for both possible $c_{in}$ in each $k$-bit group
  b) Actual $c_{in}$ selects corresponding output and propagates result
● $s_{i-1:k} = (\overline{c_k} \wedge s^0_{i-1:k}) \vee (c_k \wedge s^1_{i-1:k})$
● $c_i = (\overline{c_k} \wedge c^0_i) \vee (c_k \wedge c^1_i)$

[Figure: lowest group adder ($a_{k-1:0}$, $b_{k-1:0}$, $c_{in}$) producing $S_{k-1:0}$ and $c_k$; next group with two adders (carry-in 0 and 1) whose sum and carry outputs are selected by $c_k$ through multiplexers, producing $S_{i-1:k}$ and $c_i$]
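The selection principle can be modelled at word level. The sketch below is illustrative only: the function name is invented, operands are plain integers, and it assumes that $n$ is a multiple of the group size $k$.

```python
def carry_select_add(a, b, n, k, cin=0):
    """n-bit carry select addition with fixed group size k (n % k == 0 assumed)."""
    mask = (1 << k) - 1
    result, carry = 0, cin
    for lo in range(0, n, k):
        ga = (a >> lo) & mask
        gb = (b >> lo) & mask
        t0 = ga + gb          # group result precomputed for carry-in 0
        t1 = ga + gb + 1      # group result precomputed for carry-in 1
        t = t1 if carry else t0       # multiplexer driven by the actual carry
        result |= (t & mask) << lo    # selected sum bits
        carry = t >> k                # selected group carry-out
    return result, carry
```

In hardware both group results are available at the same time, so only the multiplexer delay per group lies on the carry path.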

2.2.3 Carry Select Adder (CSel)

● Optimal group size (like CSkA): $k = \frac{1}{2}\sqrt{n}$, $T_{CSel} = O(\sqrt{n})$
● Further improvements:
  o Faster CPA, e.g. multi-staged CSel
  o Variable group size $k$
    - Overlapping $T_{CPA}$ and $T_{Mux}$ with $T_{CPA}$ of next group
    - Increase group size by one bit per group from LSB to MSB, e.g., for 28 bit: 7,6,5,4,3,2,1
● Cost compared to RCA
  o 1 "sum-Mux"/bit + (CPA + "carry-Mux")/group
  o Note: no duplication of whole CPA
    - $A \oplus B$ can be reused -or-
    - Use Binary-to-Excess-1 Code (BEC) converter for simpler block with $c_{in} = 1$
● Metric: $A = 14n$, $T = 3\sqrt{n}$

2.2.3 Carry Select Adder with BEC (extra stuff)

● Binary-to-Excess-1 Code (BEC), e.g., for 4 bit: realizes increment by one
● Simple generation
● Replaces the block with $c_{in} = 1$, requires less area than the standard structure

[Figure: BEC circuit fed by the sum output of the block with $c_{in} = 0$]
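A 4-bit BEC can be sketched with the usual incrementer logic (one inverter, XOR gates and an AND chain). This gate arrangement is the textbook form and is assumed here; the slide's exact circuit is not recoverable from the text.

```python
def bec4(b3, b2, b1, b0):
    """4-bit binary-to-excess-1 converter: returns the bits of (B + 1) mod 16."""
    x0 = b0 ^ 1                 # LSB is always inverted
    x1 = b1 ^ b0                # flips if all lower bits are 1
    x2 = b2 ^ (b1 & b0)
    x3 = b3 ^ (b2 & b1 & b0)
    return x3, x2, x1, x0
```

Applied to the sum of the block computed with $c_{in} = 0$, this yields the sum for $c_{in} = 1$ without a second adder.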

2.2.4 Conditional Sum Adder (CSum)

● Hybrid of Type A (similar to CSel but 1-bit groups) and Type B (tree)
● Parallel propagation of 1-bit groups using tree structure (instead of sequential propagation of carries of $k$-bit groups)
● Fastest and most costly CPA, exploiting maximum parallelism
  o $n$ summand bits are propagated by mux tree → depth: $\log n$
  o Cost: $2 \cdot RCA + 2 \cdot \log n$ Mux/bit
● Metric: $A = 3n \cdot \log n = O(n \cdot \log n)$, $T = 2 \cdot \log n = O(\log n)$

2.2.5 Carry Lookahead Adder (CLA)

● Type B:
  o parallel tree structure
  o all carries are pre-computed
  o if too expensive for large $n$ → partitioning into $k$-bit groups
  o hierarchical arrangement in $\frac{1}{2} \log n$ stages
● Implementations
  o Kogge-Stone (1973): fast, long wires, irregular layout
  o Brent-Kung (1982): much more regular, a bit slower
  o Han-Carlson (1987): compromise between KS and BK
  o Ling/Sklansky (1981): large fanout to compute higher bits
  o Ladner-Fischer (1980): compromise between Ling and BK
● Metric: $A = O(n \cdot \log n)$, $T = O(\log n)$

2.2.5 Carry Lookahead Adder (CLA)

● Classification

[Figure: classification diagram]

2.2.5 Carry Lookahead Adder (CLA)

$c_0 = c'_0$
$c_1 = g_0 \vee (p_0 \wedge c'_0)$
$c_2 = g_1 \vee (p_1 \wedge c_1) = g_1 \vee (p_1 \wedge g_0) \vee (p_1 \wedge p_0 \wedge c'_0)$
$c_3 = g_2 \vee (p_2 \wedge g_1) \vee (p_2 \wedge p_1 \wedge g_0) \vee (p_2 \wedge p_1 \wedge p_0 \wedge c'_0)$
$g'_3 = g_3 \vee (p_3 \wedge g_2) \vee (p_3 \wedge p_2 \wedge g_1) \vee (p_3 \wedge p_2 \wedge p_1 \wedge g_0)$
$p'_3 = p_3 \wedge p_2 \wedge p_1 \wedge p_0$
…

[Figure: Carry Lookahead Block (CLB) with inputs $(g_0, p_0) \dots (g_{n-1}, p_{n-1})$ and $c'_0$, outputs $c_0 \dots c_{n-1}$ and block generate & propagate $(g'_{n-1}, p'_{n-1})$]
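The lookahead equations are instances of a prefix computation over (generate, propagate) pairs. The sketch below evaluates them with a Kogge-Stone-like doubling scheme; function and variable names are illustrative, and the carry-in is folded into the generate signal of bit 0 as a simplification.

```python
def cla_add(a, b, n, cin=0):
    """n-bit addition with all carries computed by a parallel prefix over (g, p)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    G, P = list(g), list(p)
    G[0] = g[0] | (p[0] & cin)          # fold carry-in into bit 0
    dist = 1
    while dist < n:                      # ceil(log2 n) rounds
        new_G, new_P = G[:], P[:]
        for i in range(dist, n):         # all positions of a round are independent
            new_G[i] = G[i] | (P[i] & G[i - dist])
            new_P[i] = P[i] & P[i - dist]
        G, P = new_G, new_P
        dist *= 2
    carries = [cin] + G                  # carries[i] = carry into bit i
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s, carries[n]
```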


2.2.5 Carry Lookahead Adder (CLA)

● Example for 16b addition
  o Kogge-Stone
  o Han-Carlson
  o Brent-Kung

[Figure: 16-bit structures of the Kogge-Stone, Han-Carlson and Brent-Kung adders]

2.2.6 Asynchronous Adder

● a.k.a. Carry Completion Adders
● detects end of carry propagation and generates carry-completion signal (indicates validity of sum bits)
● $T_{carry,mean} \approx \log n$ stages for $n$-bit adder
● Pros:
  o simple RCA with $O(\log n)$
  o well suited for resource limited architectures/cascaded additions, e.g. crypto hardware on smartcards
● Cons:
  o only for asynchronous (self-timed) systems
  o extra hardware for carry completion logic
● Metric: $A = 8n$, $T = 2 \log n$ (mean), $T = 2n$ (worst case)

2.3 Non-Carry-Propagate Adders

2.3.1 Carry Save Adder (CSA)

● delay is independent of width
● adds 3 $n$-bit numbers without carry propagation
● carry is saved in Carry-Save representation
  o $a_{0,i} + a_{1,i} + a_{2,i} = 2c_{i+1} + s_i$ ; $i = 0 \dots n-1$
  o $A_0 + A_1 + A_2 = C + S = (C, S)$
● Operands can be
  o Three 2's complement (TC) numbers or
  o One 2's complement number + one CS-number

[Figure: CSA block with three $n$-bit inputs 0, 1, 2 and two $n$-bit outputs]

2.3.1 Carry Save Adder (CSA)

● Built out of $n$ full adders
● 3 input vectors are merged into 2 output vectors
● also called: (3,2)-compressor
● Metric: $A = 7n$, $T = 4$ (constant!)

[Figure: row of $n$ independent full adders with inputs $a_{0,i}$, $a_{1,i}$, $a_{2,i}$ and outputs $s_i$, $c_{i+1}$ for $i = 0 \dots n-1$]

2.3.2 Redundant Binary Adder (RBA)

● Summation of numbers in Signed Digit Representation (RBA: base $r = 2$)
  o $S = A + B$
  o $a_i, b_i, s_i, d_i, z_i \in \{-1, 0, 1\}$
  o $a_i + b_i = 2d_{i+1} + z_i$
  o $z_i + d_i = s_i$
  o $d_i$ … intermediate carry
  o $z_i$ … intermediate sum

2.3.2 Redundant Binary Adder (RBA)

● $a_i + b_i = 2d_{i+1} + z_i$ ($d_i$: intermediate carry, $z_i$: intermediate sum), similar to carry save
● $z_i + d_i = s_i$
● $s_i = z_i + d_i$ is carry-free iff [Takagi, 1987]:

a_i b_i         | a_{i-1} b_{i-1} | d_{i+1} | z_i
1 1             | X X             | 1       | 0
1 0 or 0 1      | both ≥ 0        | 1       | -1
1 0 or 0 1      | else            | 0       | 1
a_i + b_i = 0   | X X             | 0       | 0
0 -1 or -1 0    | both ≥ 0        | 0       | -1
0 -1 or -1 0    | else            | -1      | 1
-1 -1           | X X             | -1      | 0

X = don't care; both ≥ 0: $a_{i-1} \geq 0$ and $b_{i-1} \geq 0$

● Digit $i$ only affects digits $i+1$ and $i+2$ → no carry propagation
● Metric: $A = 7n$, $T \cong 2\,FA$
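The selection rule for $d_{i+1}$ and $z_i$ can be turned into a small digit-level model. The sketch below is an illustration with invented names: digits are stored LSB first as integers from $\{-1, 0, 1\}$, and position 0 is treated as having non-negative lower digits, which is an assumption.

```python
def rba_add(a, b):
    """Add two signed-digit numbers (lists of digits in {-1, 0, 1}, LSB first)."""
    n = len(a)
    d = [0] * (n + 1)    # intermediate carries d_0 .. d_n, with d_0 = 0
    z = [0] * n          # intermediate sums
    for i in range(n):
        t = a[i] + b[i]
        lower_nonneg = i == 0 or (a[i - 1] >= 0 and b[i - 1] >= 0)
        if t == 2:
            d[i + 1], z[i] = 1, 0
        elif t == 1:
            d[i + 1], z[i] = (1, -1) if lower_nonneg else (0, 1)
        elif t == 0:
            d[i + 1], z[i] = 0, 0
        elif t == -1:
            d[i + 1], z[i] = (0, -1) if lower_nonneg else (-1, 1)
        else:
            d[i + 1], z[i] = -1, 0
    # s_i = z_i + d_i never leaves {-1, 0, 1}; d_n becomes the top digit
    return [z[i] + d[i] for i in range(n)] + [d[n]]


def sd_value(digits):
    """Value of a signed-digit number given LSB first."""
    return sum(x * 2 ** i for i, x in enumerate(digits))
```

Each result digit depends only on the operand digits at positions $i$, $i-1$ and $i-2$, so the delay does not grow with the word length.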

2.3.2 Redundant Binary Adder (RBA)

● Examples:

    i:  5  4  3  2  1  0
    a:     0  1  1  1  1
    b:     0  0  0  0  1
    d:  0  1  1  1  1
    z:     0 -1 -1 -1  0
    s:     1  0  0  0  0

01111 𝑎00111 𝑏

000011 𝑑01000 𝑧01010 𝑠


00111 𝑎01111 𝑏

010011 𝑑01000 𝑧

11010 𝑠

2.3.2 Redundant Binary Adder (RBA)

● Comparison: CSA vs. RBA

                | Carry Save Adder | Redundant Binary Adder
From 2's C      | direct conversion, no HW needed | direct conversion, no HW needed
To 2's C        | add carry and sum vectors (C+S) | split SD-number into positive (1, 0) and negative (-1, 0) parts → subtraction
Functionality   | CS-cell = FA = (3,2)-compressor; adds CS+2C or 2C+2C+2C; cascaded cell → (4:2)-compressor for CS+CS | RB-cell = (4,2)-cell; adds RB+RB
Complexity (~equal at same functionality) | 22 transistors (3:2), available in libraries | 42 transistors (4:2), availability depends on library

2.3.2 Redundant Binary Adder (RBA)

2.3.2.1 Overflow in Redundant Binary Adders

● There are real overflow situations, but
● also pseudo overflow
● Overflow depends on MSD of sum $s_{n-1}$ and intermediate carry $d_n$

Example:

111 𝑎 1111 𝑏 1

1111 𝑑 20000 𝑧

110 𝑠 6

𝑑 𝑑 𝑑 … 𝑑𝑠 𝑠 … 𝑠 𝑠

2.3.2.1 Overflow in Redundant Binary Adders

● Pseudo overflow prevented by correction rule for $s_{n-1}$:

d_n | s_{n-1} | Overflow  | s'_{n-1}
1   | -1      | Pseudo    | 1
1   | 0       | Potential |
1   | 1       | Real      | avoid or use saturation
-1  | -1      | Real      | avoid or use saturation
-1  | 0       | Potential |
-1  | 1       | Pseudo    | -1
0   | X       | no        | s_{n-1}

● Can be implemented without speed loss in MSD of RBA
● Real overflow needs to be handled on system or algorithmic level
● Potential overflow
  o needs detection on lower bits, or
  o limit magnitude of all numbers to $< 2^{n-2}$, or
  o increase word length by one

2.4 Multi-Operand Adder

● Summation of $m \geq 3$ $n$-bit operands
● Result requires $n + \lceil \log_2 m \rceil$ non-redundant bits

2.4.1 Multi-operand addition using adder array

a) Linear array of CPAs (example: 4-operand RCA)

[Figure: $(m-1)$ CPAs (CPA 1, CPA 2, CPA 3) built from rows of HA/FA cells; inputs $a_{0,i}, a_{1,i}, a_{2,i}, a_{3,i}$ for $i = 0 \dots n-1$, outputs $s_0, s_1, s_2, \dots, s_{n-1}, s_n$]

2.4.1 Multi-operand addition using adder array

b) Linear array of CSAs and final CPA (example: RCA)

[Figure: linear array of CSAs followed by a final RCA]

2.4.1 Multi-operand addition using adder array

● Evaluation:
  o same delay for a) and b)
  o but
    - Type a): fast final CPA (e.g. CSum) has to wait for operand arrival, delay is $O(n + m)$
    - Type b): delay $O(m + \log n)$
    → for high performance always use b), type a) is expensive/useless
● Generic scheme for b):
  o $A = (m-2) \cdot A_{CSA} + A_{CPA}$
  o $T = (m-2) \cdot T_{CSA} + T_{CPA}$
  o For logarithmic CPA: $A = O(m \cdot n + n \cdot \log n)$, $T = O(m + \log n)$

[Figure: operands A0 … A4 (2's C) feeding CSA1, CSA2, CSA3, followed by a CPA]
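The generic scheme for b) can be sketched at word level: each CSA reduces three vectors to two without carry propagation, and only the final step performs a carry-propagating addition. Function names are illustrative; operands are assumed to be non-negative Python integers of unbounded width.

```python
def carry_save_add(x, y, z):
    """(3,2)-compression of three vectors into a sum and a carry vector."""
    s = x ^ y ^ z                                # bitwise sum
    c = ((x & y) | (x & z) | (y & z)) << 1       # saved carries, weight doubled
    return s, c


def multi_operand_add(operands):
    """Linear CSA array followed by a single CPA; returns (sum, number of CSA stages)."""
    s, c = operands[0], operands[1]
    stages = 0
    for op in operands[2:]:
        s, c = carry_save_add(s, c, op)          # no carry propagation here
        stages += 1
    return s + c, stages                         # final carry-propagate addition
```

For $m$ operands the loop runs $m - 2$ times, matching the factor in the area and delay formulas.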

2.4.2 (m:2)-compressors

● Idea:
  o one column (2.4.1 b) without terminating CPA
    - compresses $m$ input bits to 2 output bits
    - propagates $(m-3)$ carries to left-hand column
● No horizontal carry propagation
● Uses FAs = (3:2)-compressors or (4:2)-cells in linear array or tree structure
  o Example: 4-operand adder with (4:2)-adders
● Metric:
  o $A = 7 \cdot (m-2)$
  o $T = 4 \cdot (m-2)$ (linear array)
  o $T = 6 \cdot (\log m - 1)$ (tree)

2.4.2 (m:2)-compressors

● Implementation of (4:2)-adders:
  o 2 full adders: $A = 14$, $T = 8$
  o Optimized structure using tree of XOR gates: $A = 16$, $T = 6$

[Figure: (4:2)-adder from two cascaded full adders (inputs $a_0 \dots a_3$, $c_{in}$; outputs $s$, $c$, $c_{out}$) and optimized version with XOR tree and multiplexers (outputs $C$, $S$, $c_{out}$)]

2.4.2 (m:2)-compressors

● Advantages of (4:2) versus (3:2) in (m:2) compressors
  o 4:2 instead of 3:2 (sic!)
  o reduced depth
  o regular layout
● Example: (8:2)-compressor

[Figure: (8:2)-compressor]

2.4.3 Multi-operand addition using adder trees

● Using $n$-bit $m$-operand redundant adders
● Tree structure
● Each adder consists of $n$-bit (m:2)-compressors
● Fastest multi-operand adders:
  o adder tree + $\log_2(n)$-CPA
  o $A = A_{(m,2)} \cdot n + A_{CPA} = O(m \cdot n + n \cdot \log n)$
  o $T = T_{(m,2)} + T_{CPA} = O(\log m + \log n)$

2.4.3 Multi-operand addition using adder trees

● Wallace Tree (1964)
  o Redundant adder = CSA (3:2)
● Trees are faster than arrays with same number of gates
● But: trees require irregular wiring → increased area

[Figure: Wallace tree built from (3:2) blocks and (4:2)-tree built from (4:2) blocks]

2.5 Sequential Adders

2.5.1 LSB-first serial adder

● Bitwise adding of 2 $n$-bit numbers, starting from LSB
● Pros:
  o Small
  o Serial communication
  o Cascadable (LSB-In → LSB-Out)
● Cons:
  o Needs temporary storage → flipflop
  o Latency: $n$ cycles
● Metric:
  o $A = A_{FA} + A_{FF}$
  o $T = T_{FA} + T_{FF}$
  o $L = n \cdot T$

2.5.2 MSD-first serial adder (digit online arithmetic)

● Bitwise adding of 2 $n$-bit numbers, starting from MSD
● Seems impossible, but can be derived from parallel (4:2)-adders in CS (SD as well)
● $a_i$, $b_i$, $a_{i-1}$, $b_{i-1}$, $a_{i-2}$, $b_{i-2}$ must be known to compute $s_i$
● Thus, this "Digit Online Addition" has an online delay of $\delta = 2$

2.5.2 MSD-first serial adder (digit online arithmetic)

● Comparison to LSB-first adder:
  o Needs conversion to 2's complement
  o More wiring (2 wires/digit)
  o Slower: $\delta = 2$
● Why digit online technique?
  o Add, Sub, Mult are "natural" LSB-first-In/Out operations
  o But: Division and more complex functions are MSD-first-In/Out
    - all $n$ input digits have to be known
    - LSB-first-In → wait for $n$ cycles → MSB-first-Out
  o Digit online is better suited for mixed and concatenated operations of Add, Div, Log, Sub, …
    - all operations can be performed MSD-first, but not using LSB-first
    - MSD-first-In → wait $\sum \delta$ cycles → MSD-first-Out
    - Lower overall latency (even than for parallel operations!) is possible due to overlapping input and output digits
    - But: throughput typically lower than with parallel operations

2.5.2 MSD-first serial adder (digit online arithmetic)

● Online delays $\delta$ of basic operations

[Figure: table of online delays]

2.5.3 Accumulator

● adds $m$ $n$-bit operands in parallel
● a) CPA with register
  o $A = A_{CPA} + A_{Reg}$
  o $T = T_{CPA} + T_{Reg}$
  o $L = m \cdot T$
● b) CSA with register and CPA
  o $A = A_{CSA} + A_{CPA} + A_{Reg}$
  o $T = T_{CSA} + T_{Reg}$
  o $L = m \cdot T$
● b) is much faster when using a pipelined CPA

2.6 Adder-based Operations

● Increment / Decrement
● Counter (feedback increment)
● Comparators ($=, <, >, \dots$)
  o TC, SD, CS: $T \sim \log n$
● Detect leading zeroes
  o TC, SD, CS: $T \sim \log n$
● Determine flag bits in processors


Part 3: Multiplication


Outline

● 3.1 Fundamentals
  o Unsigned Multiplication, 2's Complement Multiplication
● 3.2 Unsigned Braun-Array Multiplier
● 3.3 Signed Pezaris-Array Multiplier
● 3.4 Booth Multiplier
● 3.5 Booth-Wallace Multiplier
● 3.6 Evaluation

3.1 Fundamentals

3.1.1 Unsigned Multiplication

● Like paper-and-pencil multiplication
● Multiplication of 2 $n$-bit operands $A$ and $B$ yields $2n$-bit product

$P = A \cdot B = \sum_{i=0}^{n-1} a_i 2^i \cdot \sum_{j=0}^{n-1} b_j 2^j = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} a_i b_j \cdot 2^{i+j}$

● Multiply algorithm (Shift-and-Add):
  1) Generate $n$ partial products $P_i$: $P_i = a_i \cdot B$
  2) Sum up all partial products $P_i$: $P = \sum_{i=0}^{n-1} P_i 2^i$
  (see 1.6.2.2 Recursive, associative function)

Note: $P$ = product, $P_i$ = partial product, $p_i$ = bit $i$ of product

3.1.1 Unsigned Multiplication (cont'd)

a) Recursive (shift-and-add) using one accumulator
  o Metric: $A = O(n \cdot \log n)$, $T = O(\log n)$, $L = n$

[Figure: $a_i$ and $B$ feed a partial product generator (*), result shifted left by $i$ bits for $i = 0, \dots, n-1$, accumulated by CPA and clocked register (CLK) into the $2n$-bit product $P$]

b) Serial (shift-and-add) using linear array of CSAs
  o All $P_i$ are generated in parallel
  o Metric: $A = O(n^2)$, $T = O(n + \log n)$

[Figure: partial product generators (*) for $a_0 \dots a_3$ and $B$ feed a linear array of four CSAs; carry and sum vectors ($4n$ input of CPA) enter the final CPA, which delivers the $2n$-bit product]
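The shift-and-add scheme of variant a) can be sketched as a loop with one accumulator. The function name is illustrative; operands are unsigned $n$-bit integers.

```python
def shift_add_multiply(a, b, n):
    """Unsigned n-bit multiplication: accumulate P_i = a_i * B shifted left by i bits."""
    acc = 0
    for i in range(n):                   # one iteration per clock cycle (L = n)
        a_i = (a >> i) & 1
        partial = b if a_i else 0        # P_i = a_i * B (row of AND gates)
        acc += partial << i              # shift by i bits and accumulate
    return acc
```

Variants b) and c) compute the same sum but generate all $P_i$ at once and add them with a CSA array or tree.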

3.1.1 Unsigned Multiplication (cont'd)

c) Parallel using multi-operand adder (tree structure)
  o Metric: $A = O(n^2)$, $T = O(\log n)$

[Figure: partial product generators (*) for $a_0 \dots a_3$ and $B$ feed a CSA tree; its $2n$-bit outputs enter the final CPA, which delivers the $2n$-bit product]

3.1.2 Two's Complement Multiplication

● Option 1
  o Complement operands before and result after multiplication
  o Unsigned multiplication algorithm applicable
● Option 2
  o Use dedicated two's complement multipliers
  o e.g., Braun, Pezaris, Baugh-Wooley

3.2 Unsigned Braun-Array Multiplier

● E.g., for 4-bit operands

                   a0b3 a0b2 a0b1 a0b0
              a1b3 a1b2 a1b1 a1b0
         a2b3 a2b2 a2b1 a2b0
    a3b3 a3b2 a3b1 a3b0
 p7   p6   p5   p4   p3   p2   p1   p0

● Metric: $A = 8n^2 - 11n = O(n^2)$, $T = 6n - 9 = O(n)$

[Figure: Braun multiplier block with inputs $a_i$, $b_i$ and outputs $p_i$ (MSBs) and $p_i$ (LSBs)]

3.2 Unsigned Braun-Array Multiplier (cont'd)

● 4-bit Braun-Array multiplier

[Figure: array with inputs $a_0 \dots a_3$ and $b_0 \dots b_3$; one row of HAs and rows of FAs form the CSA part (regions 1, 2, 3), followed by a CPA; outputs $p_0 \dots p_7$]

3.3 Signed Pezaris-Array Multiplier

● Modified Braun-Array multiplier, here shown for 4-bit operands
● MSB = sign bit → value = -1

                      -a0b3  a0b2  a0b1  a0b0
                -a1b3  a1b2  a1b1  a1b0
          -a2b3  a2b2  a2b1  a2b0
     a3b3 -a3b2 -a3b1 -a3b0
  p7   p6    p5    p4    p3    p2    p1    p0

3.3 Signed Pezaris-Array Multiplier (cont'd)

● Four cases for partial product $P_i$
  a) 3 pos. operands → regular FA
    o $a + b + c_{in} = 2c_{out} + s$
  b) 2 pos., 1 neg. operands
    o $-a + b + c_{in} = 2c_{out} - s$
    o $-1 \leq sum \leq 2$
    o Weight of sum-bit: -1
    o Weight of $c_{out}$: +2
  c) 1 pos., 2 neg. operands
    o $a - b - c_{in} = -2c_{out} + s$
    o $-2 \leq sum \leq 1$
    o Weight of sum-bit: +1
    o Weight of $c_{out}$: -2
  d) 3 neg. operands
    o logically identical to a) → identical implementation: regular FA

3.3 Signed Pezaris-Array Multiplier (cont'd)

● b) and c) have same implementation
  o $s = a \oplus b \oplus c$ (regular FA)
  o $c_{out} = (\overline{a} \wedge b) \vee (\overline{a} \wedge c) \vee (b \wedge c)$ (modified FA)
● Approach: replace FA in regions 1, 2, and 3 with modified FA (input a = •)
● Same structure as Braun multiplier (except modified FA)

[Figure: 4-bit array with inputs $a_0 \dots a_3$ and $b_0 \dots b_3$, rows of HAs and FAs (CSA part, regions 1, 2, 3), final CPA, outputs $p_0 \dots p_7$]
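The behaviour of the modified full adder can be verified exhaustively. The sketch assumes that input $a$ is the operand with the opposite weight and that the sum bit is the ordinary XOR; the function name is illustrative.

```python
from itertools import product


def modified_fa(a, b, c):
    """Full adder with inverted weight on input a: returns (s, cout)."""
    s = a ^ b ^ c
    not_a = a ^ 1
    cout = (not_a & b) | (not_a & c) | (b & c)
    return s, cout


for a, b, c in product((0, 1), repeat=3):
    s, cout = modified_fa(a, b, c)
    assert -a + b + c == 2 * cout - s      # case b): weights -1 for s, +2 for cout
    assert a - b - c == -2 * cout + s      # case c): same logic, all weights negated
```

Both cases hold for the same logic because case c) is case b) with every weight negated.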

3.4 Booth Multiplier

● Observation: multiplication delay $= f(\#\,\text{partial products } P_i) = f(n)$
  o For every 0 in $a_i$ one row can be omitted in array!
  o Recoding of $a_i$ to maximize number of 0's ($a_i \in \{0, 1\} \rightarrow a_i' \in \{-1, 0, 1\}$)
● Two possibilities:
  a) $a_i$ always constant: CSD-Recoding (1/3 of area on average)
  b) $a_i$ variable: modified Booth-Encoding (1/2 of area) → Booth Multiplier
● Note: "horizontal" data compression can be achieved with Dadda-multiplier (Booth = "vertical" compression)
● Metric: $A = O(n^2)$, $T = O(n + \log n)$

[Figure: $a_i$ passes through Mod. Booth-Recoding to $a_i'$; together with $b_i$ ($n$ bits), $n/2$ partial products $P_i$ are calculated in parallel (*) and summed by a CSA array and a CPA]
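The halving of the number of partial products can be illustrated with the common radix-4 form of modified Booth recoding. This form, with digits from $\{-2, -1, 0, 1, 2\}$, is assumed here to be what "modified Booth-Encoding" refers to; function names are illustrative, and $n$ is assumed to be even.

```python
def booth_radix4_digits(a, n):
    """Recode an n-bit two's complement pattern into n/2 digits (LSB first)."""
    def bit(i):
        return (a >> i) & 1 if i >= 0 else 0     # a_{-1} = 0
    # digit j looks at the overlapping bit triple (a_{2j+1}, a_{2j}, a_{2j-1})
    return [bit(2 * j) + bit(2 * j - 1) - 2 * bit(2 * j + 1) for j in range(n // 2)]


def signed_value(a, n):
    """Interpret an n-bit pattern as two's complement number."""
    return a - (1 << n) if (a >> (n - 1)) & 1 else a


def booth_multiply(a, b, n):
    """Signed product from n/2 partial products d_j * B * 4^j."""
    b_val = signed_value(b, n)
    return sum(d * b_val * 4 ** j for j, d in enumerate(booth_radix4_digits(a, n)))
```

A digit of $\pm 2$ corresponds to a shifted multiplicand, so every partial product is still a simple shift and optional negation of $B$.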

3.5 Booth-Wallace Multiplier

● take Booth multiplier and replace CSA-array with Wallace-tree (see 2.4.3)
● Metric: $A = 5 \dots 6\,n^2$, $T = O(\log n)$; → $T \approx 2 \cdot \log n$ (CSA tree, CPA)

3.6 Evaluation of multiplier architectures

              | Throughput | Latency | Area | Regularity         | Pipelining
Recursive     | --         | o       | ++   | - (control needed) | --
Braun         | +          | o       | o    | ++                 | ++
Booth         | +          | +       | o    | +                  | +
Booth-Wallace | +          | ++      | -    | --                 | +


Part 4: Division


Outline

● 4.1 Definitions
● 4.2 Fundamentals
● 4.3 Restoring Division
● 4.4 Non-Restoring Division
● 4.5 SRT Division
● 4.6 Multiplicative Division
● 4.7 Evaluation

4.1 Definitions

$\frac{A}{B} = Q + \frac{R}{B} \rightarrow A = Q \cdot B + R$

$R < B$ ; $R = A \bmod B$

$A \in [0, 2^{2n} - 1]$

$B, Q, R \in [0, 2^n - 1]$, $B \neq 0$

$Q < 2^n \rightarrow B \in [2^{n-1}, 2^n - 1]$ (avoid overflow: pre-normalize $B$ and $A$)

$\rightarrow A < 2^n \cdot B$

4.2 Fundamentals

● Like paper-and-pencil division: dividend : divisor = quotient
● Steps:
   a) Compare the left-shifted divisor with the dividend
   b) Subtract conditionally to get the partial remainder
   c) Go to a) with the partial remainder as the dividend

→ Subtract-and-shift algorithm
→ Sequential, not associative, no parallelism

● Decimal example (A = dividend, B = divisor, q_i = quotient digits, R_i = partial remainders):

   0.75 : 0.875  →  750 : 875 = 0.857
     7500
   - 7000        (q = 8: 8 · 875)
      5000
    - 4375       (q = 5: 5 · 875)
       6250
     - 6125      (q = 7: 7 · 875)
        1250

4.2 Fundamentals (cont'd)

● Basic algorithm for all subtract-and-shift division algorithms:

   Initialization: $R_n = A$
   For $i = n-1, \ldots, 0$:
      a) Select $q_i$ by comparing $R_{i+1}$ with $2^i B$
      b) $R_i = R_{i+1} - q_i \cdot 2^i \cdot B$
      c) Continue with the next $i$
   Remainder after iteration: $R = R_0$

● Division methods differ in how $q_i$ is selected and in whether redundant adders are used!

4.3 Restoring-Division

$q_i = \begin{cases} 1 & \text{iff } R_{i+1} - B \cdot 2^i \ge 0 \\ 0 & \text{iff } R_{i+1} - B \cdot 2^i < 0 \end{cases}$, $q_i \in \{0, 1\}$

● E.g.:
   o Index i: $R_{i+1} - B \cdot 2^i < 0 \;\rightarrow\; q_i = 0$; $R_i = R_{i+1}$
   o Index i-1: $R_i - B \cdot 2^{i-1} \ge 0 \;\rightarrow\; q_{i-1} = 1$; $R_{i-1} = R_i - B \cdot 2^{i-1}$ (cf. 4.4)

If the remainder is too small for the divisor, the result of the current iteration ($R_{i+1} - B \cdot 2^i$) is discarded. Instead, a '0' is appended to the quotient (identical to shifting the divisor one position further away from the MSB).

4.3 Restoring-Division (cont'd)

● Two implementation options in case $q_i = 0$:

   1. $R_i = R_{i+1} - B \cdot 2^i$, then $R_i' = R_i + B \cdot 2^i$
      (requires an addition to restore $R_{i+1}$)

   2. $R_i = R_{i+1} - B \cdot 2^i$, then $R_i' = R_{i+1}$
      (save $R_{i+1}$ before the subtraction; "restoring" with an additional mux and register)

→ Option 2 is preferable: subtract in each case and restore from the register if necessary
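The restoring scheme with option 2 can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's reference implementation: the function name `restoring_divide` and the index convention (start with R_n = A, iterate i = n-1 down to 0) are assumptions made for this sketch.

```python
def restoring_divide(a: int, b: int, n: int) -> tuple[int, int]:
    """Restoring subtract-and-shift division producing n quotient bits.

    Hypothetical helper for illustration; requires 0 <= a < 2**n * b.
    """
    if b <= 0 or not (0 <= a < (b << n)):
        raise ValueError("need b > 0 and 0 <= a < 2**n * b (no overflow)")
    r = a                          # initialization: R_n = A
    q = 0
    for i in range(n - 1, -1, -1):
        saved = r                  # option 2: keep the old remainder in a register
        r -= b << i                # always subtract B * 2**i
        if r < 0:
            r = saved              # restore through the mux, q_i = 0
            q_i = 0
        else:
            q_i = 1                # subtraction was valid, q_i = 1
        q = (q << 1) | q_i
    return q, r                    # a == q * b + r with 0 <= r < b


print(restoring_divide(100, 7, 4))
```

The saved copy stands in for the extra register and mux: no addition is ever needed to undo a subtraction.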

4.4 Non-Restoring Division

$q_i' = \begin{cases} 1 & \text{iff } R_{i+1} \ge 0 \\ -1 & \text{iff } R_{i+1} < 0 \end{cases}$, $q_i' \in \{-1, 1\}$

● Index i: $R_{i+1} \ge 0 \;\rightarrow\; q_i' = 1$; $R_i = R_{i+1} - B \cdot 2^i$
● Index i-1: $R_{i+1} - B \cdot 2^i < 0 \;\rightarrow\; q_{i-1}' = -1$; $R_{i-1} = R_{i+1} - B \cdot 2^i + B \cdot 2^{i-1} = R_{i+1} - B \cdot 2^{i-1}$ (cf. 4.3)
● Note: $(q_i\; q_{i-1}) = (0\;\, 1)$ corresponds to $(q_i'\; q_{i-1}') = (1\;\, {-1})$

Evaluate the sign, then subtract or add; a negative partial remainder is corrected by additions in the following steps until it is positive again. The identical terms (highlighted in red and green on the slides) and the algebraically equivalent q in 4.3 and 4.4 show that both methods are identical.

4.4 Non-Restoring Division (cont'd)

● Conversion of Q' = (q_{n-1}', …, q_0') to two's complement representation:

   $q_i' \in \{-1, 1\} \;\rightarrow\; q_i \in \{0, 1\}$ with $q_i = 0$ if $q_i' = -1$ and $q_i = 1$ if $q_i' = 1$

   $Q = (\bar{q}_{n-1}, q_{n-2}, \ldots, q_0, 1)$

   → Q' is not redundant, no CPA required

● Implementation:
   o For sign detection a non-redundant adder is mandatory
   o The last remainder needs to be corrected

[Figure: non-restoring array divider; inputs A and B, rows of CPAs with sign detection (≥0) producing Q' and R_i, followed by the correction of R]

Metric (CPA = RCA … CLA):
$A = (n+1) \cdot A_{CPA} = O(n^2) \ldots O(n^2 \log_2 n)$
$T = (n+1) \cdot T_{CPA} = O(n^2) \ldots O(n \log_2 n)$
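As a rough illustration of the non-restoring iteration, the digit conversion and the final remainder correction, consider the following Python sketch. The name `nonrestoring_divide` and the list-based digit handling are assumptions for this example; a hardware divider would form the converted bits by wiring alone.

```python
def nonrestoring_divide(a: int, b: int, n: int) -> tuple[int, int, list[int]]:
    """Non-restoring division with digits q' in {-1, +1} (illustrative sketch).

    Requires 0 <= a < 2**n * b. Returns (Q, R, digits), digits MSD first.
    """
    if b <= 0 or not (0 <= a < (b << n)):
        raise ValueError("need b > 0 and 0 <= a < 2**n * b (no overflow)")
    r = a
    digits = []
    for i in range(n - 1, -1, -1):
        if r >= 0:                 # sign evaluation
            digits.append(1)
            r -= b << i            # subtract B * 2**i
        else:
            digits.append(-1)
            r += b << i            # add B * 2**i, no restoring step
    # Conversion: map -1 -> 0 and +1 -> 1, invert the MSB, append a 1.
    bits = [(d + 1) // 2 for d in digits]
    bits[0] ^= 1
    bits.append(1)
    # Read the n+1 bits as a two's complement number (MSB weight -2**n).
    q = -(bits[0] << n) + sum(bit << (n - k) for k, bit in enumerate(bits) if k > 0)
    if r < 0:                      # correction of the last remainder
        r += b
        q -= 1
    return q, r, digits


print(nonrestoring_divide(100, 7, 4))
```

For 100 : 7 with n = 4 all four digits are +1, giving Q' = 15 and a final remainder of -5; the correction step turns this into Q = 14, R = 2.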

4.4 Non-Restoring Division (cont'd)

● Extension to signed 2's complement division:

   $q_i' = \begin{cases} 1 & \text{iff } R_{i+1}, B \text{ have the same sign} \\ -1 & \text{iff } R_{i+1}, B \text{ have different signs} \end{cases}$, $q_i' \in \{-1, 1\}$

   $R_i = R_{i+1} \cdot 2 - q_i' \cdot 2^n \cdot B$

   o Without the shift, the partial remainder R_i would tend to 0.
   o Note: R_i is kept in about the same range during the iteration. Thus, rounding errors are reduced.

● Example: 2's complement array divider (B > 0, no correction of R)
   o XOR gates for sign evaluation:
      0 → different signs → $R_i = R_{i+1} + \cdots$
      1 → identical signs → $R_i = R_{i+1} - \cdots$
   o a2, a1, a0 are fetched consecutively

[Figure: array divider with divisor inputs b_i and sign evaluation from a6 ⊕ b3]

4.4 Non-Restoring Division (cont'd)

● Shifted array of CAS cells (Controlled Adder/Subtractor)
   o XOR gates included in the CAS cells

4.5 SRT-Division

● Sweeney, Robertson, Tocher (~1958)
● Use redundant adders
● Problem: fast detection of the sign of a redundant number, without:
   o Evaluation of all digits, or
   o Conversion to 2's complement
● Example:
   o 00011XX: no sign detection from the MSD; same problem for CS numbers
● Solution: evaluate a few leading digits of the partial remainder
   o If 0: the number is small enough to assume $q_i = 0$ (without diverging iteration)
   o Else: similar to non-restoring division

4.5 SRT-Division (cont'd)

$q_i' = \begin{cases} 1 & \text{iff } B \cdot 2^i \le R_{i+1} \\ 0 & \text{iff } -B \cdot 2^i \le R_{i+1} < B \cdot 2^i \\ -1 & \text{iff } R_{i+1} < -B \cdot 2^i \end{cases}$

● Appropriate scaling of B yields

   $2^{n-1} \le B < 2^n$

   $-B \cdot 2^i \le -2^{n+i-1} \le R_{i+1} < 2^{n+i-1} \le B \cdot 2^i$

   $q_i' = \begin{cases} 1 & \text{if } 2^{n+i-1} \le R_{i+1} \\ 0 & \text{if } -2^{n+i-1} \le R_{i+1} < 2^{n+i-1} \\ -1 & \text{if } R_{i+1} < -2^{n+i-1} \end{cases}$

● 3 MSDs of R_i are sufficient for the determination of q_i'
● Nevertheless, convergence is assured (→ use a redundant adder instead of a CPA)
● Q' needs conversion: SD to 2's complement by using a CPA for q_i'
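The idea of selecting quotient digits from {-1, 0, 1} against constant thresholds can be illustrated with the sketch below. It is a simplified model under stated assumptions: the remainder is an exact Python integer instead of a redundant (carry-save or signed-digit) number, the comparison uses the full value instead of only a few leading digits, and the threshold 2^(n+i-1) follows from assuming a divisor normalized to 2^(n-1) <= B < 2^n. The function name `srt_divide` is hypothetical.

```python
def srt_divide(a: int, b: int, n: int) -> tuple[int, int]:
    """Radix-2 SRT-style division with digits in {-1, 0, 1} (simplified model)."""
    if not (1 << (n - 1)) <= b < (1 << n):
        raise ValueError("b must be normalized: 2**(n-1) <= b < 2**n")
    if not 0 <= a < (b << n):
        raise ValueError("need 0 <= a < 2**n * b (no overflow)")
    r, q = a, 0
    for i in range(n - 1, -1, -1):
        threshold = 1 << (n + i - 1)   # constant, independent of the actual b
        if r >= threshold:
            digit = 1
        elif r < -threshold:
            digit = -1
        else:
            digit = 0                  # remainder is small: no add/sub needed
        r -= digit * (b << i)
        q += digit * (1 << i)          # accumulate the signed-digit quotient
    if r < 0:                          # final correction, as in non-restoring division
        r += b
        q -= 1
    return q, r


print(srt_divide(100, 15, 4))
```

Because the thresholds depend only on n and i, the digit selection never needs a full-width comparison with B, which is what allows a redundant adder in the real design.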

4.5 SRT-Division (cont'd)

● Implementation:

[Figure: SRT array divider; inputs A and B, rows of CSAs (add/subtract) with sign estimation (≥0) producing the redundant digits q_i', CPAs with conversion to 2's complement for the outputs Q and R]

Metric:
$A = n \cdot A_{CSA} + 2 A_{CPA} = O(n^2)$
$T = n \cdot T_{CSA} + T_{CPA} = O(n + \log n)$

→ Just a little slower than array multiplication
→ State-of-the-art division method

4.6 Multiplicative Division

● So far
   o Add/sub as basic functions
   o Execute n times → $T = O(n)$
   o Linear convergence → +1 valid bit per iteration
● Now
   o Mult as basic function (Goldschmidt 1964, used in IBM 360)
   o Execute $\log n$ times → $T = O(\log n)$
   o Quadratic convergence → doubles the valid bits per iteration
● Algorithm

   $Q = \frac{A}{B} = \frac{A \cdot R_0 \cdot R_1 \cdots R_{k-1}}{B \cdot R_0 \cdot R_1 \cdots R_{k-1}}$

   Choose $R_i$ so that $B \cdot R_0 \cdots R_{k-1}$ converges to 1 → $Q \approx A \cdot \prod_i R_i$

Metric: $A = O(n^2)$ (1 Mult only), $T = O(\log n)$
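A small floating-point sketch of the multiplicative scheme is given below. The slides do not state how the factors are chosen; the sketch assumes the common Goldschmidt choice R_i = 2 - B_i, where B_i is the current denominator, and a divisor pre-normalized to [0.5, 1). The function name `goldschmidt_divide` is hypothetical.

```python
def goldschmidt_divide(a: float, b: float, iterations: int = 6) -> float:
    """Approximate a / b by repeated multiplication (illustrative sketch).

    Assumes 0.5 <= b < 1 and the factor choice R_i = 2 - B_i.
    """
    if not 0.5 <= b < 1.0:
        raise ValueError("b must be pre-normalized to [0.5, 1)")
    num, den = a, b
    for _ in range(iterations):
        factor = 2.0 - den     # assumed choice of R_i
        num *= factor          # numerator converges to the quotient
        den *= factor          # denominator converges to 1
    return num


print(goldschmidt_divide(0.75, 0.875))
```

With den = 1 - e, one step gives den = 1 - e^2, so the error is squared in every iteration. This is the quadratic convergence that doubles the number of valid bits per step.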

4.7 Evaluation

● Sequential dividers
   o One add/sub unit as hardware
   o Low area
   o Low throughput
● Array dividers
   o n add/sub units as hardware
   o High area, but regular design
   o High throughput
● Multiplicative dividers
   o Reuse of an available multiplier
   o Very fast for large n


Part 5: Elementary Functions


Outline

● 5.1 Examples and Classification of Algorithms
● 5.2 CORDIC
   o Vector rotation, generalization, architectures, redundant numbers

5.1 Examples and Classification of Algorithms

● Some elementary functions
   o log
   o $e^x$
   o $x^y$
   o sin
   o cos
   o atan
   o cosh
   o …

5.1 Examples and Classification of Algorithms

● ROM
   o $A = O(n \cdot 2^n)$
   o $T = ?$ → slow
   o Restricted to functions with one operand and n ≤ 20..24 bit
   o Hard to pipeline
● Polynomial
   o Taylor series
   o Chebyshev series: better convergence → fewer terms
   o $A = O(n^2)$, $T = ?$
   o Hard to pipeline
   o Excellent for software and big n

5.1 Examples and Classification of Algorithms

● Alternative number systems
   o Logarithmic systems
   o Residue systems
   o Problem: conversion between systems

   [Figure: 2's complement → Conversion → Calculation in the alternative number system → Conversion → 2's complement]

● Iteration
   o Newton-Raphson (cf. multiplicative division)
   o Digit-by-digit method (cf. paper-and-pencil division, SRT division) → CORDIC

5.2 CORDIC

● COordinate Rotation DIgital Computer
   o [Volder 1959, Walther 1971]

5.2.1 Vector Rotation

● Given: $x_0, y_0, \theta$
● Wanted: $x, y$
● Use the matrix for rotation in Euclidean space:

   $x = x_0 \cdot \cos\theta - y_0 \cdot \sin\theta$
   $y = x_0 \cdot \sin\theta + y_0 \cdot \cos\theta$

[Figure: vector $(x_0, y_0)$ rotated by the angle $\theta$ into $(x, y)$]

5.2.1 Vector Rotation (cont'd)

● Transformation

   $x = \cos\theta \cdot (x_0 - y_0 \cdot \tan\theta) = K \cdot (x_0 - y_0 \cdot \tan\theta)$
   $y = \cos\theta \cdot (x_0 \cdot \tan\theta + y_0) = K \cdot (x_0 \cdot \tan\theta + y_0)$

   $K = \frac{1}{\sqrt{1 + \tan^2\theta}}$

● Elementary rotation angle

   $\theta_i = \arctan 2^{-i} \;\rightarrow\; \tan\theta_i = 2^{-i}$

● Iteration

   $x_{i+1} = K_i \cdot (x_i - 2^{-i} \cdot y_i)$
   $y_{i+1} = K_i \cdot (y_i + 2^{-i} \cdot x_i)$

5.2.2 Decomposition of Rotation in Elementary Rotation Angles

● $\theta = \sum_i \sigma_i \cdot \theta_i$, depending on the direction of rotation $\sigma_i \in \{-1, 1\}$
● Compute $\sigma_i$ using successive sub/add (pseudo division)
● Example:

   i    θ_i = arctan 2^(-i)
   0    45.0°
   1    26.5°
   2    14.03°
   3    7.1°
   4    3.5°

   Iteration i   Remaining angle                 Sign       σ_i
   0             θ = 77°                         Positive   σ_0 = 1
   1             77° - 45° = 32°                 Positive   σ_1 = 1
   2             32° - 26.5° = 5.5°              Positive   σ_2 = 1
   3             5.5° - 14.03° = -8.53°          Negative   σ_3 = -1

   θ = 77° ≈ 45° + 26.5° + 14.03° - 7.1° …
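The pseudo division of the angle can be reproduced with a short Python sketch. The function name `decompose_angle` is an assumption for this example, and the elementary angles are computed exactly instead of being read from the rounded table, so intermediate values differ slightly from the 77° example.

```python
import math


def decompose_angle(theta_deg: float, n: int) -> tuple[list[int], float]:
    """Split an angle into +/- arctan(2**-i) steps (illustrative sketch).

    Returns the rotation directions sigma_i and the residual angle in degrees.
    """
    z = theta_deg
    sigmas = []
    for i in range(n):
        sigma = 1 if z >= 0 else -1          # sign of the remaining angle
        sigmas.append(sigma)
        z -= sigma * math.degrees(math.atan(2.0 ** -i))
    return sigmas, z


sigmas, residual = decompose_angle(77.0, 4)
print(sigmas, round(residual, 3))
```

Each iteration only needs a sign test and one add or subtract with a stored constant, which is why the angles can come from a small ROM.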

5.2.2 Decomposition of Rotation in Elementary Rotation Angles (cont'd)

● Iteration

   $z_0 = \theta$, $z_{i+1} = z_i - \sigma_i \cdot \arctan 2^{-i}$

   $\sigma_i = \begin{cases} 1 & \text{for } z_i \ge 0 \\ -1 & \text{for } z_i < 0 \end{cases}$

   Goal of iteration: $z_i \rightarrow 0$

● What about $K_i$?

   $K_i = \frac{1}{\sqrt{1 + \tan^2\theta_i}} = \frac{1}{\sqrt{1 + 2^{-2i}}}$ → $K_i$ is a constant!

● After n iterations (without considering $K_i$) the vector magnitude is "stretched" by $1/K$, with $K = \prod_i K_i$:

   $x_n = \frac{1}{K} \cdot x$, $y_n = \frac{1}{K} \cdot y$

   → Multiply $x_n$ and $y_n$ by the known scaling factor to correct the magnitude after the final iteration
   → CSD encoding of the scaling factor is possible

5.2.3 Modes of Operation: Rotation and Vectoring

   $x_{i+1} = x_i - \sigma_i \cdot 2^{-i} \cdot y_i$
   $y_{i+1} = y_i + \sigma_i \cdot 2^{-i} \cdot x_i$
   $z_{i+1} = z_i - \sigma_i \cdot \arctan 2^{-i}$

● "Rotation" mode
   o $z_0 = \theta$, iteration goal $z_i \rightarrow 0$
   o $\sigma_i = 1$ for $z_i \ge 0$, $\sigma_i = -1$ for $z_i < 0$
   o Result: $x = x_0 \cdot \cos\theta - y_0 \cdot \sin\theta$, $y = x_0 \cdot \sin\theta + y_0 \cdot \cos\theta$

● "Vectoring" mode
   o $z_0 = 0$, iteration goal $y_i \rightarrow 0$
   o $\sigma_i = 1$ for $y_i < 0$, $\sigma_i = -1$ for $y_i \ge 0$
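Both modes share the same shift-and-add datapath and differ only in the sign decision, as the following floating-point sketch illustrates. The function name `cordic_circular`, the default of 24 iterations and the final multiplication by the accumulated scaling factor are choices made for this example; the vectoring branch assumes x > 0.

```python
import math


def cordic_circular(x: float, y: float, z: float, n: int = 24,
                    vectoring: bool = False) -> tuple[float, float, float]:
    """Circular CORDIC in rotation or vectoring mode (illustrative sketch)."""
    for i in range(n):
        if vectoring:
            sigma = -1 if y >= 0 else 1      # drive y towards 0
        else:
            sigma = 1 if z >= 0 else -1      # drive z towards 0
        shift = 2.0 ** -i                    # a right shift by i bits in hardware
        x, y = x - sigma * shift * y, y + sigma * shift * x
        z -= sigma * math.atan(shift)
    # Constant scaling factor K = prod(K_i), applied once after the last step.
    k = math.prod(1.0 / math.sqrt(1.0 + 4.0 ** -i) for i in range(n))
    return k * x, k * y, z


print(cordic_circular(1.0, 0.0, math.radians(77.0)))        # about (cos 77°, sin 77°, 0)
print(cordic_circular(3.0, 4.0, 0.0, vectoring=True))       # about (5, 0, atan(4/3))
```

Rotation mode started with (1, 0, θ) yields cosine and sine at once; vectoring mode returns the magnitude and the angle of the input vector.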

5.2.4 Generalization for other Coordinate Systems [Walther]

   $x_{i+1} = x_i - m \cdot \sigma_i \cdot 2^{-i} \cdot y_i$
   $y_{i+1} = y_i + \sigma_i \cdot 2^{-i} \cdot x_i$
   $z_{i+1} = z_i - \sigma_i \cdot \alpha_{m,i}$

   with $m = 1$ → trigonometric (circular), $m = 0$ → linear, $m = -1$ → hyperbolic

● Vector magnitude: $R_{i+1} = R_i \cdot \sqrt{1 + m \cdot \sigma_i^2 \cdot 2^{-2i}}$

● $\alpha_{m,i} = \frac{1}{\sqrt{m}} \arctan\left(\sqrt{m} \cdot 2^{-i}\right)$ (limit for $m \rightarrow 0$)
   o $\alpha_{1,i} = \arctan 2^{-i}$
   o $\alpha_{0,i} = 2^{-i}$
   o $\alpha_{-1,i} = \operatorname{artanh} 2^{-i}$
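The unified iteration can be tried out with the sketch below, which runs the rotation mode for all three values of m. Several details are assumptions that go beyond the slide: the hyperbolic mode starts at i = 1 and repeats the steps i = 4, 13, 40 (a commonly used schedule to guarantee convergence), the gain is removed by a division at the end, and the names `elementary_angle`, `iteration_indices` and `cordic_rotate` are hypothetical.

```python
import math


def elementary_angle(m: int, i: int) -> float:
    """alpha_{m,i} for m = 1 (circular), 0 (linear), -1 (hyperbolic)."""
    if m == 1:
        return math.atan(2.0 ** -i)
    if m == 0:
        return 2.0 ** -i
    return math.atanh(2.0 ** -i)             # needs i >= 1


def iteration_indices(m: int, n: int) -> list[int]:
    """Shift amounts per step; hyperbolic mode repeats some (assumed schedule)."""
    if m != -1:
        return list(range(n))
    indices = []
    for i in range(1, n + 1):
        indices.append(i)
        if i in (4, 13, 40):
            indices.append(i)                # repeated step for convergence
    return indices


def cordic_rotate(m: int, x: float, y: float, z: float,
                  n: int = 32) -> tuple[float, float, float]:
    """Generalized CORDIC in rotation mode (z -> 0), illustrative sketch."""
    gain = 1.0
    for i in iteration_indices(m, n):
        sigma = 1 if z >= 0 else -1
        t = 2.0 ** -i
        x, y = x - m * sigma * t * y, y + sigma * t * x
        z -= sigma * elementary_angle(m, i)
        gain *= math.sqrt(1.0 + m * t * t)   # magnitude change of this step
    return x / gain, y / gain, z


print(cordic_rotate(1, 1.0, 0.0, 0.5))       # about (cos 0.5, sin 0.5, 0)
print(cordic_rotate(0, 3.0, 0.0, 1.5))       # about (3, 4.5, 0): multiplication
print(cordic_rotate(-1, 1.0, 0.0, 0.5))      # about (cosh 0.5, sinh 0.5, 0)
```

The same three update lines serve all coordinate systems; only the stored angle constants and the sign of the x update change with m.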

5.2.5 Overview of CORDIC Functions

Each entry maps the inputs $(x, y, z)$ to the outputs after the iteration.

● circular, $m = 1$
   o Rotation ($z_i \rightarrow 0$): $(x, y, z) \rightarrow (x \cos z - y \sin z,\; y \cos z + x \sin z,\; 0)$
   o Vectoring ($y_i \rightarrow 0$): $(x, y, z) \rightarrow (\sqrt{x^2 + y^2},\; 0,\; z + \arctan\frac{y}{x})$
● linear, $m = 0$
   o Rotation ($z_i \rightarrow 0$): $(x, y, z) \rightarrow (x,\; y + x \cdot z,\; 0)$
   o Vectoring ($y_i \rightarrow 0$): $(x, y, z) \rightarrow (x,\; 0,\; z + \frac{y}{x})$
● hyperbolic, $m = -1$
   o Rotation ($z_i \rightarrow 0$): $(x, y, z) \rightarrow (x \cosh z + y \sinh z,\; y \cosh z + x \sinh z,\; 0)$
   o Vectoring ($y_i \rightarrow 0$): $(x, y, z) \rightarrow (\sqrt{x^2 - y^2},\; 0,\; z + \operatorname{artanh}\frac{y}{x})$

● $e^x$, $\log x$, $\ln x$ computable using angle sum identities
● CORDIC provides a nearly universal method for the evaluation of elementary functions, yielding one bit of accuracy per iteration

5.2.6 Architectures

5.2.6.1 Recursive

● Small area, needs control logic
● Low throughput, no pipelining
● Variable shift needed → large barrel shifter

Metric: $A \cong 3n$, $T = n \cdot \log n$

5.2.6.2 Pipeline

● Barrel shifter replaced by hard wiring
● ROM → hard wiring
   o (3:1)-MUX for the 3 cases of m
● High throughput (times n)
● High area (times n)

Metric: $A \cong 3n^2$, $T = n \cdot \log n$

5.2.6.3 Array

● Low latency
● Low throughput

5.2.7 CORDIC and Redundant Number Systems

● Motivation: avoid carry propagation during addition

● Issue 1: sign detection of redundant numbers
   o Approximation of $\sigma_i$ by looking at the $p$ most significant digits of $z_i$ or $y_i$ ($p \ll n$)
   o Similar to SRT division
   o But note: $\sigma_i \in \{-1, 0, 1\}$

● Issue 2: variable scaling for $\sigma_i = 0$

   $K_i = \frac{1}{\sqrt{1 + m \cdot \sigma_i^2 \cdot 2^{-2i}}}$

   Until now: $\sigma_i \in \{-1, 1\}$, $m \in \{-1, 0, 1\}$

5.2.7 CORDIC and Redundant Number Systems (cont'd)

● Solutions to avoid variable scaling when $\sigma_i = 0$
   o Constant scaling
      - Defined direction of rotation when $\sigma_i = 0$; e.g., $\sigma_i = 1$ → small error
      - After a defined number of iterations, repeat an iteration; e.g., $p = 3$ → repeat each 5th iteration
      - Convergence guaranteed
      - Standard method
   o Double rotation
      - Instead of one rotation by $\alpha_i$, do two rotations by ~$\alpha_i / 2$ (→ $\arctan(2^{-(i+1)})$):
        $\sigma_i = \pm 1$ → $\pm 2 \cdot \arctan 2^{-(i+1)}$
        $\sigma_i = 0$ → $\arctan 2^{-(i+1)} - \arctan 2^{-(i+1)}$
      - Scaling factor is constant; double rotations → twice the area/time
   o True variable scaling factor → multiplier needed