Institute of Applied Microelectronics & Computer Engineering
Selected Topics of VLSI Design
Prof. Dr.-Ing. Dirk Timmermann, dirk.timmermann@uni-rostock.de
Please note this name change

Module        | Advanced VLSI Design | Selected Topics of VLSI Design
Until 2016
Short name    | "Chip project"       | "HW-Alg."
Semester      | summer               | winter
SWS           | 1                    | 1/1/1
Content       | hardware algorithms  | VLSI chip project
Starting 2017
Short name    | "Chip project"       | "HW-Alg."
Semester      | winter               | summer
SWS           | 1                    | 1/1/1
ECTS          | 6                    | 6
Content       | VLSI chip project    | hardware algorithms
Organization
● Lecture: Hardware-oriented arithmetic algorithms and cryptography
● Exercise: Algorithms, building blocks, VHDL coding
● Lab: during project week
● Schedule:
o lecture: Monday 15:xx – 16:yy
o exercise: replaces lectures beginning with xx.y.
o mandatory lab with attendance list: 11.6.-12.6., 9:00
● Location: Warnemuende, building 1, R 1226
Literature
Textbooks
● Parhami, B.: Computer Arithmetic, Algorithms and Hardware Designs, 2nd edition, Oxford University Press, New York, 2010.
● Koren, I.: Computer Arithmetic Algorithms, 2002
● Muller, J.M.: Elementary Functions, Algorithms and Implementation, 2nd ed., 2006
● Klar, H., Noll, T.: Integrierte Digitale Schaltungen, Springer 2015, free access from URO network
● Pirsch, P.: Architekturen der digitalen Signalverarbeitung, B.G. Teubner, Stuttgart, 1996
Courses and Websites
● Koren, I.: Computer arithmetic simulator
● Ercegovac, M.: Course Digital Arithmetic
● Guyot, A.: Educational Applets
● Strey, A.: Course Computer-Arithmetik
Part 1: Number Systems
Outline
● 1.1 Positional / Place-Value Notation of Numbers
o Representation of Integer Numbers, Real Numbers and Radix Selection
● 1.2 Signed Number Representations
o Sign Magnitude, (r-1)-Complement, r-Complement and Redundant Binary
● 1.3 Rounding
o via Truncation, Round-to-Nearest and Round-to-Nearest-Even
● 1.4 Overflows
o in (r-1)-Complement, Carry-Save and Signed Redundant Binary Numbers
o Overflow Detection and Handling
● 1.5 Basic Operations
● 1.6 Cost/Performance Estimation Basics
1.1 Positional / Place-Value Notation of Numbers
● The number A is represented by n digits $a_i$ and a defined base/radix r: $A = (a_{n-1} a_{n-2} \dots a_1 a_0)_r$
o Binary r = 2
o Ternary r = 3
o Octal r = 8
o Decimal r = 10
o Hexadecimal r = 16
● The value V(A) of the number A is given by the sum of the n partial products $p_i$ for each of its positions
● The partial product $p_i = a_i \cdot r^i$ results from the multiplication of the digit $a_i$ with its weight $r^i$, which is a power of the radix r and determined by the position index i
● Integer number A with n digits: $A = (a_{n-1} \dots a_1 a_0)_r$ with value $V(A) = \sum_{i=0}^{n-1} a_i \cdot r^i$
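The place-value rule can be checked with a few lines of Python. This is only a minimal sketch; the helper name `value_of` is hypothetical and not part of the lecture material.

```python
def value_of(digits, radix):
    """Return V(A) for a digit list given MSD first, each digit in 0..radix-1."""
    if any(not 0 <= d < radix for d in digits):
        raise ValueError("digit outside 0..r-1")
    n = len(digits)
    # digits[k] sits at position i = n-1-k and carries the weight radix**i
    return sum(d * radix ** (n - 1 - k) for k, d in enumerate(digits))


print(value_of([1, 0, 1, 1], 2))   # binary 1011
print(value_of([7, 7], 8))         # octal 77
```

The same function works for any radix, which makes the influence of r on the number of digits easy to explore.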
● A positive integer number A has a range of: $0 \le V(A) < r^n$
● Real numbers contain n digits for the integer part and m digits for the fractional part
● A positive real number A has a range of: $0 \le V(A) < r^n$
● Real number A with n+m digits: $A = (a_{n-1} \dots a_0 \,.\, a_{-1} \dots a_{-m})_r$ with value $V(A) = \sum_{i=-m}^{n-1} a_i \cdot r^i$
● In computation, two formats for the approximation of real numbers exist
● Fixed point numbers: the number of significant digits before and after the decimal point is fixed (as seen above)
o the decimal point is fixed and never explicitly represented in hardware
o its position is defined during design and must be known to interpret the number
● Floating point numbers: the number of significant digits before and after the decimal point depends on the exponent, $V(A) = (-1)^{S} \cdot M \cdot r^{\mathit{exponent}}$ with sign S and mantissa M
● Floating point numbers in modern computers: IEEE 754 standard (bit counts given as sign + mantissa + exponent)
o Binary half precision: 16 bit data words (= 1 + 10 + 5 bit)
o Binary single precision: 32 bit data words (= 1 + 23 + 8 bit)
o Binary double precision: 64 bit data words (= 1 + 52 + 11 bit)
o …
● Radix Selection (cont'd from fixed point numbers)
o Computations are performed in circuits
o Binary representation (r = 2) is the best representation for physical signal levels in most logic
o Voltage U: { 0 , 1 } → { VSS , VDD }
o Current I: { 0 , 1 } → { IMIN , IMAX }
● Efficiency: How many bits do we need in a bit-oriented (r = 2) memory to store a positive number V?
1.2 Signed Number Representations
● Positional notation as discussed above only covers positive numbers
● For negative numbers, different signed number representation (SNR) options exist
● SNR #1: Sign Magnitude (SM)
o Insert a sign digit at $a_{n-1}$ before the magnitude of the number
o Positive number: $a_{n-1} = 0$; negative number: $a_{n-1} = 1$
o $A = (0\; a_{n-2} \dots a_1 a_0)$ and $-A = ((r-1)\; a_{n-2} \dots a_1 a_0)$
● A signed integer number A has a range of: $-(r^{n-1} - 1) \le V(A) \le r^{n-1} - 1$
● + Symmetrical range
● - Double representation of zero; requires different treatment of positive and negative numbers in arithmetic circuits
● SNR #2: (r-1)-complement (for r = 2: 1's complement)
o A negative number results from complementing each digit $a_i$ according to: $\bar{a}_i = (r-1) - a_i$
o In a binary representation (r = 2) this procedure equals a bitwise inversion ("bit flipping"): 01010101 → 10101010
o $A = (0\; a_{n-2} \dots a_1 a_0)$ and $-A = \bar{A} = ((r-1)\; \bar{a}_{n-2} \dots \bar{a}_1 \bar{a}_0)$
● A signed integer number A has a range of: $-(r^{n-1} - 1) \le V(A) \le r^{n-1} - 1$
● Same pros and cons as SM
● SNR #3: r-complement (for r = 2: 2's complement)
o Start with the (r-1)-complement and add 1 to the Least Significant Digit (LSD)
o The binary format (r = 2) is most commonly used in digital circuits
o $A = (0\; a_{n-2} \dots a_1 a_0)$ and $-A = \bar{A} + 1 = ((r-1)\; \bar{a}_{n-2} \dots \bar{a}_1 \bar{a}_0) + 1$
● A signed integer number A has a range of: $-r^{n-1} \le V(A) \le r^{n-1} - 1$
● + Identical treatment of positive and negative numbers in arithmetic circuits, e.g., adders; unique representation of zero
● - Asymmetrical range
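The three binary encodings can be compared side by side. This is a minimal sketch for r = 2; the function name `encode` and the scheme labels "sm", "ones" and "twos" are assumptions made for this illustration only.

```python
def encode(value, n, scheme):
    """Return the n-bit pattern (as a non-negative int) of `value`.

    scheme: "sm" (sign magnitude), "ones" (1's complement), "twos" (2's complement).
    """
    mask = (1 << n) - 1
    limit = (1 << (n - 1)) - 1              # largest magnitude of the symmetric range
    if scheme in ("sm", "ones"):
        if abs(value) > limit:
            raise ValueError("value outside symmetric range")
        if value >= 0:
            return value
        if scheme == "sm":
            return (1 << (n - 1)) | -value  # set sign bit, keep magnitude
        return ~(-value) & mask             # flip every bit of the magnitude
    if scheme == "twos":
        if not -(limit + 1) <= value <= limit:
            raise ValueError("value outside asymmetric range")
        return value & mask                 # equals 1's complement + 1 for negatives
    raise ValueError("unknown scheme")


for scheme in ("sm", "ones", "twos"):
    print(scheme, format(encode(-85, 8, scheme), "08b"))
```

For -85 the 1's complement pattern is the bit-flipped 01010101 from the slide, and the 2's complement pattern is that value plus one.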
● SNR #4: Redundant Representations (RR)
o Allow multiple (redundant) representations for the same number value V(A)
o Also true for SM and (r-1)-complement due to the double zero representation, but typically RR means the following:
● RR #1: Signed Digit Representation (SD)
o In SD numbers each digit has its own sign → one extra bit per digit
o Digit set: $a_i \in \{-\alpha, \dots, -1, 0, 1, \dots, \beta\}$
o α and β must cover at least half of the interval defined by the radix
● The SD number system is symmetrical for $\alpha = \beta$, else asymmetrical
● Maximum or minimum redundancy for a symmetrical SD number system:
o Maximum redundancy: $\alpha = r - 1$
o Minimum redundancy: $\alpha = \lceil r/2 \rceil$
● Examples:

Radix r | Digit values $a_i$
2       | {-1, 0, 1}
3       | {-2, -1, 0, 1}
3       | {-1, 0, 1, 2}
3       | {-2, -1, 0, 1, 2}
4       | {-2, -1, 0, 1, 2} minimum redundancy
4       | {-3, -2, -1, 0, 1} not allowed! α, β bounds violated
4       | {-3, -2, -1, 0, 1, 2, 3} maximum redundancy
● Only SD numbers with r = 2 (redundant binary (RB) numbers) are considered in the following sections: $a_i \in \{\bar{1}, 0, 1\}$, where $\bar{1}$ denotes the digit -1
● An SD number with n digits $a_i \in \{\bar{1}, 0, 1\}$ has $3^n$ different representations, but only $2^{n+1} - 1$ different values can be represented
o Example: $3_{10} = 011 = 10\bar{1} = 1\bar{1}1$
● Question: Which RB representation contains the smallest amount of non-zeros ('1' or '-1')?
o Answer: use arithmetic conversions of non-zero bit-strings
o A run of ones from position i up to position j satisfies $2^{j} + 2^{j-1} + \dots + 2^{i+1} + 2^{i} = 2^{j+1} - 2^{i}$
o Example: the upper end of a run, … 001111 …, becomes … 010000 …, and the lower end, … 111000 …, becomes … $00\bar{1}000$ …
● Such RB representations are called Canonical Signed Digits (CSD) and the conversion strategy is CSD recoding
● Definition: A CSD recoded number is an n digit SD number that has a minimum amount of non-zeros ('1' and '-1') and no adjacent non-zero digits:
$\sum_{i=0}^{n-1} |a_i| = \min$ with $a_i \cdot a_{i-1} = 0$ for $1 \le i \le n-1$
● CSD recoding operates as an iterative and sequential algorithm. Step by step, the number is parsed from the least to the most significant digit/bit ("right to left") to detect strings of adjacent non-zeros, which are converted immediately. The algorithm terminates when the condition $a_i \cdot a_{i-1} = 0$ is met for all positions.
● Example: $366_{10}$
o $= 0001\;0110\;1110$
o $= 0001\;0111\;00\bar{1}0$
o $= 0001\;100\bar{1}\;00\bar{1}0$
o CSD $= 0010\;\bar{1}00\bar{1}\;00\bar{1}0$
● Example: $-213_{10}$ (12 bit 2's complement)
o $= 1111\;0010\;1011$
o $= 1111\;0010\;110\bar{1}$
o $= 1111\;0011\;0\bar{1}0\bar{1}$
o $= 1111\;010\bar{1}\;0\bar{1}0\bar{1}$
o CSD $= 000\bar{1}\;010\bar{1}\;0\bar{1}0\bar{1}$
● Lookup table for CSD recoding
o possible carries from lower positions must be considered ($c_{i+1}$ becomes $c_i$ in the next step)

Binary number and carry | CSD recoded SD     | Comment
$a_{i+1}$  $a_i$  $c_i$ | $a_i^*$  $c_{i+1}$ |
0  0  0                 |  0  0              | String of zeros
0  1  0                 |  1  0              | Singular non-zero
1  0  0                 |  0  0              | String of zeros
1  1  0                 | -1  1              | Begin of non-zero string
0  0  1                 |  1  0              | End of non-zero string
0  1  1                 |  0  1              | String of non-zeros
1  0  1                 | -1  1              | Singular zero
1  1  1                 |  0  1              | String of non-zeros

o CSD recoding yields a minimum | average | maximum number of non-zeros of 0 (trivial!) | $\sim n/3$ | $(n+1)/2$
● The number-dependent, variable timing and the sequential nature of CSD recoding prohibit its efficient application at run-time. But it is excellent for the recoding of constant values or coefficients at design-time.
o Each eliminated non-zero saves hardware and speeds up specific arithmetic circuits (e.g. multipliers)
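The lookup table translates almost directly into code. The following is a minimal sketch, assuming an n-bit 2's complement input that is sign-extended by one bit to supply $a_n$; the names `CSD_TABLE`, `csd_recode` and `sd_value` are hypothetical.

```python
# (a_{i+1}, a_i, c_i) -> (a*_i, c_{i+1}), taken row by row from the lookup table
CSD_TABLE = {
    (0, 0, 0): (0, 0), (0, 1, 0): (1, 0), (1, 0, 0): (0, 0), (1, 1, 0): (-1, 1),
    (0, 0, 1): (1, 0), (0, 1, 1): (0, 1), (1, 0, 1): (-1, 1), (1, 1, 1): (0, 1),
}


def csd_recode(value, n):
    """Recode an n-bit 2's complement value into n signed digits, MSD first."""
    bits = [(value >> i) & 1 for i in range(n)]   # LSB first
    bits.append(bits[-1])                         # sign extension provides a_n
    carry, digits = 0, []
    for i in range(n):                            # sequential scan, right to left
        digit, carry = CSD_TABLE[(bits[i + 1], bits[i], carry)]
        digits.append(digit)
    return digits[::-1]                           # final carry is dropped


def sd_value(digits):
    """Value of a radix-2 signed digit list given MSD first."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value


print(csd_recode(366, 12))
print(csd_recode(-213, 12))
```

Both results can be compared digit by digit with the two worked examples; the loop also makes the sequential dependency on the carry visible, which is the reason CSD recoding is better suited to design-time constants.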
● Alternatively, the parallel algorithms Booth and modified Booth work faster for non-zero recoding at run-time.
o However, the Booth algorithm does not find the minimal form in each case (see example)
o Isolated non-zeros "010" are not considered by this version

Binary number ($a_{-1} = 0$) | Booth recoded SD | Comment
$a_i$  $a_{i-1}$             | $a_i^*$          |
0  0                         |  0               | String of zeros
1  1                         |  0               | String of non-zeros
1  0                         | -1               | Begin of non-zero string
0  1                         |  1               | End of non-zero string
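Every row of the Booth table satisfies $a_i^* = a_{i-1} - a_i$, so all digits can be computed independently. A minimal sketch under that observation, with the hypothetical helper names `booth_recode` and `sd_value`:

```python
def booth_recode(value, n):
    """Booth-recode an n-bit 2's complement value into n signed digits, MSD first."""
    bits = [(value >> i) & 1 for i in range(n)]   # a_0 ... a_{n-1}
    lower = [0] + bits[:-1]                       # a_{-1} = 0, a_0 ... a_{n-2}
    # each digit depends only on two neighbouring input bits: no carry chain
    return [lo - a for a, lo in zip(bits, lower)][::-1]


def sd_value(digits):
    """Value of a radix-2 signed digit list given MSD first."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value


print(booth_recode(5, 4))    # isolated ones are expanded, not minimised
print(booth_recode(-5, 4))
```

The first call shows why plain Booth recoding is not minimal: 0101 has two non-zeros, while its recoded form has four.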
● Modified Booth improves on the standard Booth algorithm by overlapped bit scanning of 3 bit strings
o By considering isolated non-zeros "010", the maximum amount of non-zeros after conversion is n/2 (for even numbers of n)

Binary number                | Modified Booth recoded SD (i = 1, 3, 5, …) | Comment
$a_i$  $a_{i-1}$  $a_{i-2}$  | r = 2: $a_i^*$  $a_{i-1}^*$ | r = 4        |
0  0  0                      |  0   0                      |  0           | String of zeros
0  1  0                      |  0   1                      |  1           | Single non-zero
1  0  0                      | -1   0                      | -2           | Begin of non-zero string
1  1  0                      |  0  -1                      | -1           | Begin of non-zero string
0  0  1                      |  0   1                      |  1           | End of non-zero string
0  1  1                      |  1   0                      |  2           | End of non-zero string
1  0  1                      |  0  -1                      | -1           | Single zero
1  1  1                      |  0   0                      |  0           | String of non-zeros

● Example: Modified Booth recoding for a 12 digit number (n = 12)
o $366_{10} = 0001\;0110\;1110$ (with $a_{-1} = 0$)
o r = 2: $00\;01\;10\;0\bar{1}\;00\;\bar{1}0$
o r = 4: $0\;\;1\;\;2\;\;\bar{1}\;\;0\;\;\bar{2}$
o Result is no CSD, but acceptable!
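Each radix-4 entry of the modified Booth table equals $-2a_i + a_{i-1} + a_{i-2}$. A minimal sketch based on that relation follows; the function names are hypothetical and n is assumed to be even.

```python
def modified_booth_recode(value, n):
    """Return the n/2 radix-4 digits (each in -2..2, MSD first) of an n-bit
    2's complement value, scanning overlapping bit triples."""
    if n % 2:
        raise ValueError("n must be even")
    bits = [0] + [(value >> i) & 1 for i in range(n)]   # bits[0] is a_{-1} = 0
    digits = []
    for i in range(1, n, 2):                             # i = 1, 3, 5, ...
        a_i, a_im1, a_im2 = bits[i + 1], bits[i], bits[i - 1]
        digits.append(-2 * a_i + a_im1 + a_im2)
    return digits[::-1]


def radix4_value(digits):
    """Value of a radix-4 signed digit list given MSD first."""
    value = 0
    for d in digits:
        value = 4 * value + d
    return value


print(modified_booth_recode(366, 12))
```

All triples are independent, so in hardware the n/2 digits can be produced in parallel, which is the property that makes the scheme attractive for run-time multiplier recoding.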
● Comparison of recoding methods

Method         | Algorithm  | # of non-zeros: Min | Average    | Max          | Comment
CSD            | Sequential | 0                   | $\sim n/3$ | $(n+1)/2$    | Yields minimum # of non-zeros for constant values at design-time
Booth          | Parallel   | 0                   | ?          | $\sim n$     | 1's Complement to Signed Digit
Modified Booth | Parallel   | 0                   | ?          | $\sim (n+1)/2$ | for run-time recoding (multiplier), potential to save half of the chip area
● Conversion from 2's complement to SD numbers ($A_r \to A_{SD}$)
o $A_r = (a_{n-1}\, a_{n-2} \dots a_1 a_0) \to A_{SD} = ((-a_{n-1})\, a_{n-2} \dots a_1 a_0)$
o Fast: can be done in parallel within one gate delay
o $5_{10} = 0101_2 \to 0101_{SD2}$
o $-5_{10} = 1011_2 \to \bar{1}011_{SD2}$
● Conversion from SD numbers to 2's complement ($A_{SD} \to A_r$)
o Split the SD number into a positive and a negative fraction: $A_{SD} \to D^+$ and $D^-$
o $-13_{10} = 0\bar{1}01\bar{1}1_{SD}$ → $D^+ = 000101$ and $D^- = 010010$
o 2's complement number $A_r = D^+ - D^-$
o Slow: requires one n-bit addition
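The split into a positive and a negative fraction can be sketched as follows. The helper name `sd_to_twos_complement` is an assumption for this illustration; Python integers stand in for the n-bit subtractor.

```python
def sd_to_twos_complement(digits):
    """Split an SD number (digits MSD first, each in {-1, 0, 1}) into D+ and D-
    and return (d_plus, d_minus, value) with value = D+ - D-."""
    d_plus = d_minus = 0
    for d in digits:
        d_plus = (d_plus << 1) | (d == 1)     # marks the positions holding +1
        d_minus = (d_minus << 1) | (d == -1)  # marks the positions holding -1
    return d_plus, d_minus, d_plus - d_minus  # the subtraction needs a carry-propagate adder


d_plus, d_minus, value = sd_to_twos_complement([0, -1, 0, 1, -1, 1])
print(format(d_plus, "06b"), format(d_minus, "06b"), value)
```

The final subtraction is the only step with a carry chain, which is why this direction of the conversion costs a full adder delay.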
● Conversion from 2's complement to SD numbers ($A_r \to A_{SD}$)?
o For positive numbers: $A_{SD} = A_r$
o For negative numbers: not this easy!
o A general method: Booth recoding, $A_{SD} = f_{Booth}(A_r)$
o Example: $5_{10} = 0101_2$ in 2's complement (with $a_{-1} = 0$) → $1\bar{1}1\bar{1}$ in signed digits
● SD numbers in binary hardware
o Three possible values per digit $a_i \in \{\bar{1}, 0, 1\}$ require two bits for each digit → hardware costs (wires, registers, ALUs) double
o Two bits per digit allow four different encodings, but two of them are typically used: Sign Value (SV) and Negative Positive (NP)

$a_i$ | Sign Value (SV): S V | Negative Positive (NP): N P
-1    | 1 1                  | 1 0
 0    | 0 0                  | 0 0
 1    | 0 1                  | 0 1

o SV: intuitive because of its sign-magnitude representation
o NP: easier $A_{SD} \to A_r$ conversion, as $D^+ = P_{n-1} \dots P_0$ and $D^- = N_{n-1} \dots N_0$
● RR #2: Carry-Save Representation (CS)
o Carry-save numbers originate from hardware structures of full adders (FAs) and half adders (HAs)
o Digit $a_i$ represents a tuple of sum and carry bit, $a_i = (s_i, c_i)$, where a carry has twice the weight of a sum bit ($2 \cdot c + s$)
o CS numbers are stored as a combination of a carry vector and an intermediate sum vector
● Additions with CS numbers only have a critical path of one half adder, but require 2 bits per digit for storage and communication (wires)
● $V(A) = (c_n\, c_{n-1}\, c_{n-2} \dots c_1\, c_0) + (s_{n-1}\, s_{n-2} \dots s_1\, s_0)$
● In general, there is no difference between CS and SD numbers
o CS numbers result from the outputs of a half adder (HA)
o SD numbers have their origin in the theory of number representations
● Why should RRs be applied, and when is it worthwhile to use them?
Pros
o Carry-free and thus faster addition/subtraction (see adder section)
o Arithmetic algorithms based on adders (nearly all) benefit from this
Cons
o More resources
o Comparison operations (≥, ≤, <, =, >) are slow due to the $A_{SD} \to A_r$ conversion
o $A_{SD} \to A_r$ conversion is slow due to the adder
[Figure: processing chain $A_r \to A_{SD}$ with $T \sim O(1)$, followed by Operation 1 … Operation k with $T_{op,i} \ne f(n)$ (carry-free operations!), followed by $A_{SD} \to A_r$ with $T_{add} \sim O(\log_2 n)$]
1.3 Rounding
● Rounding trims numbers into formats with fewer digits
o Example: two n bit numbers are multiplied and the result is a number with 2n bits, but the hardware only captures m < 2n bits
o Example: rounding after a right shift of an integer by one digit
● Rounding methods can be classified as follows:
o Accuracy of the final results (or information loss by rounding)
o Numerical error characteristics of the rounding method
o Cost/effort/delay to perform the rounding
● Assume
o Given: $A = a_{n-1} a_{n-2} \dots a_1 a_0 \,.\, a_{-1} \dots a_{-d}$ → cut d bits
o Rounded: $B = b_{n-1} b_{n-2} \dots b_1 b_0 = A + \varepsilon \Rightarrow \varepsilon = B - A$
o Goal: minimize the rounding error $\varepsilon$
● Rounding Method #1: Truncation
o Step 1: the d least significant bits are cut off from A
o Rounding result: $B = a_{n-1} a_{n-2} \dots a_1 a_0$
o Minimum error: $|\varepsilon| = 0.00000_2$
o Maximum error: $|\varepsilon| = 1 - 2^{-d} = 0.11111_2$ (for d = 5)
o Average error: $|\varepsilon| \approx 0.10000_2$
o Asymmetrical bias
[Figure: rounded value B versus input A for truncation]
● Rounding Method #2: Round-to-Nearest
o Step 1: Addition of $0.5_{10}$ to A ⇒ $A' = A + 0.5_{10} = A + 0.1_2$ (can often be incorporated effortlessly into the previous operation)
o Step 2: the d least significant bits are cut off from $A'$ to fit B
o The resulting effect is an alternate rounding to higher and lower numbers
o Rounding result: $B = a'_{n-1} a'_{n-2} \dots a'_1 a'_0$
o Minimum error: $\varepsilon = 0.00000_2$ (for A = 0.0 → B = 0.0)
o Maximum error: $\varepsilon = 2^{-1} = 0.1_2$ (for A = 0.1 → B = 1.0)
o Average error: $\varepsilon = 2^{-2} = 0.01_2$
o Smaller asymmetrical bias (due to always rounding up A = 0.1)
[Figure: rounded value B versus input A for round-to-nearest]
● Rounding Method #3: Round-to-Nearest-Even
o Idea: alternate rounding up and down to the nearest even number
o Step 1: Addition of $0.5_{10}$ to A ⇒ $A' = A + 0.5_{10} = A + 0.1_2$
o Step 2: If the d least significant bits of $A'$ are zero, cut them off from $A'$ to fit B and set $a'_0$ to zero; otherwise proceed with Round-to-Nearest
o $B = a'_{n-1} \dots a'_1\, 0$ if $a'_{-1} \dots a'_{-d} = 00 \dots 0$, else $B = a'_{n-1} \dots a'_1\, a'_0$
o Yields an average bias of zero: $bias = 0$
o Symmetrical error and bias-free, mandatory in IEEE floating point
[Figure: rounded value B versus input A for round-to-nearest-even]
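The three rounding methods can be compared on fixed point values stored as integers. This is a minimal sketch, assuming a non-negative integer `a` that holds A scaled by $2^d$ (d ≥ 1 fractional bits); the function names are hypothetical.

```python
def truncate(a, d):
    """Cut off the d least significant (fractional) bits."""
    return a >> d


def round_nearest(a, d):
    """Add 0.1 (binary), i.e. half an LSB of the result, then truncate."""
    return (a + (1 << (d - 1))) >> d


def round_nearest_even(a, d):
    """Like round_nearest, but a tie (all cut bits of A' are zero) goes to the even neighbour."""
    a1 = a + (1 << (d - 1))
    b = a1 >> d
    if a1 & ((1 << d) - 1) == 0:   # the d cut bits of A' are all zero: exact tie
        b &= ~1                    # force the LSB of the result to zero
    return b


d = 2
for name, f in [("trunc", truncate), ("nearest", round_nearest), ("even", round_nearest_even)]:
    results = [f(a, d) for a in range(8)]                    # A = 0.00, 0.25, ..., 1.75
    bias = sum((b << d) - a for a, b in enumerate(results)) / 8 / (1 << d)
    print(name, results, bias)
```

Over this input range the printed average error is negative for truncation, positive for round-to-nearest and zero for round-to-nearest-even, which mirrors the bias statements on the slides.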
1.4 Overflow
● Overflow occurs if numbers exceed the available word length in datapaths
[Figure: number ranges of the unsigned, 2's complement, 1's complement and sign magnitude representations, marked at $-2^{n-1}$, 0, $2^{n-1}$ and $2^n$ with the bit patterns 000...0, 011...1, 100...0 and 111...1]
● Overflow in 2's complement numbers
o Range: $-2^{n-1} \le V(A) \le 2^{n-1} - 1$
o Overflow in the addition of two numbers
o Reason: the carry out from the sign digit is discarded
o Case 1: two positive summands A and B → negative sum S: $\bar{a}_{n-1} \wedge \bar{b}_{n-1} \wedge s_{n-1} \Rightarrow c_{in} = 1,\; c_{out} = 0$
o Case 2: two negative summands A and B → positive sum S: $a_{n-1} \wedge b_{n-1} \wedge \bar{s}_{n-1} \Rightarrow c_{in} = 0,\; c_{out} = 1$
o In general, overflow occurs for $c_{in} \ne c_{out}$ at the sign digit (for add & sub)
[Figure: full adder (FA) at the sign position with inputs $a_{n-1}$, $b_{n-1}$, $c_{in}$ and outputs $s_{n-1}$, $c_{out}$, followed by saturation logic that produces the overflow flag and $s^*_{n-1}$]
● In non-redundant number systems overflows are definitely detectable
● Possible actions after overflow detection
o Emergency stop
o Error handling
o Saturation to the maximum (01111) or minimum (10000) number
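The rule "overflow occurs for $c_{in} \ne c_{out}$ at the sign digit" and the saturation option can be sketched as follows. The function name `add_saturating` is hypothetical, and Python integers are used to model an n-bit datapath.

```python
def add_saturating(a, b, n):
    """Add two n-bit 2's complement values; return (result, overflow flag).

    Overflow is detected by comparing the carry into and out of the sign
    position; on overflow the result saturates to the range limit."""
    mask = (1 << n) - 1
    low = (1 << (n - 1)) - 1
    ua, ub = a & mask, b & mask
    c_in = ((ua & low) + (ub & low)) >> (n - 1)   # carry into the sign digit
    total = ua + ub
    c_out = total >> n                             # carry out of the sign digit
    overflow = c_in != c_out
    if overflow:
        # c_out = 0: two positive summands -> saturate to maximum, else to minimum
        return (low if c_out == 0 else -(low + 1)), True
    raw = total & mask
    return (raw - (1 << n) if raw >> (n - 1) else raw), False


print(add_saturating(5, 4, 4))     # positive overflow
print(add_saturating(-5, -4, 4))   # negative overflow
print(add_saturating(3, -2, 4))    # no overflow
```

In hardware the comparison of the two carries is a single XOR gate at the most significant full adder.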
● Overflow in Carry-Save representations
o In redundant number systems two types of overflow exist: true and pseudo overflow
o Example: $0.5_{10} + (-0.5_{10}) + 0 = 0$ !!!

              | $-2^0$ | $2^{-1}$ | $2^{-2}$ |
 $0.5_{10}$   | 0      | 1        | 0        |
 $-0.5_{10}$  | 1      | 1        | 0        |
 0            | 0      | 0        | 0        |
 carry vector | 1      | 0        | 0        | $= -1_{10}$
 sum vector   | 1      | 0        | 0        | $= -1_{10}$
o The wrong intermediate result $-2_{10}$ in CS representation would yield the correct value 0 if converted to 2's complement via vector merging addition (VMA) of carry and sum vector
o Test: 1.00 + 1.00 = 10.00 (leading 1 dropped) → result = 0.00 = $0_{10}$
o Wrong results are possible if other operations are executed on the intermediate carry and sum vector
o Example: $(0.5_{10} + (-0.5_{10}) + 0) \cdot 0.5_{10} = 0$
o Multiplication with 0.5 equals a right shift with sign extension of carry and sum vector
o Carry: 1.00 : 2 → 1.10 = $-0.5_{10}$
o Sum: 1.00 : 2 → 1.10 = $-0.5_{10}$
o VMA: 1.10 + 1.10 = 11.00 = $-1_{10} \ne 0_{10}$
o The error becomes obvious after conversion to a non-redundant number. However, the correct result 0 would fit into the given word length
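The pseudo overflow example can be replayed in a few lines. This is only a sketch of the effect, assuming a 3 bit word (1 sign bit, 2 fraction bits, values scaled by 4); all helper names are hypothetical.

```python
N = 3                                  # word length: sign bit + 2 fraction bits
MASK = (1 << N) - 1
SIGN = 1 << (N - 1)


def to_signed(x):
    """Interpret an N-bit pattern as a 2's complement integer."""
    return x - (1 << N) if x & SIGN else x


def carry_save(a, b, c):
    """Bitwise full adders: return (carry vector, sum vector), both N bits wide."""
    s = a ^ b ^ c
    carry = (((a & b) | (a & c) | (b & c)) << 1) & MASK
    return carry, s


def vma(carry, s):
    """Vector merging addition; the carry out of the word is dropped."""
    return to_signed((carry + s) & MASK)


def asr(x):
    """Arithmetic shift right by one position (sign extension)."""
    return (x >> 1) | (x & SIGN)


carry, s = carry_save(0b010, 0b110, 0b000)    # 0.5 + (-0.5) + 0
print(vma(carry, s) / 4)                       # merged directly
print(vma(asr(carry), asr(s)) / 4)             # each vector halved first, then merged
```

Merging first gives the expected zero, while shifting the two vectors separately reproduces the wrong value from the slide, because each vector on its own looks like -1.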
o Those pseudo overflows are detectable and correctable as follows
o Pseudo overflow correction for CS numbers: Given: 𝑐 𝑐 . 𝑐 𝑐 …
𝑠 . 𝑠 𝑠 … 𝑠 Modify to: 𝒄𝟎. 𝑐 𝑐 …
𝒔𝟎. 𝑠 𝑠 …
using 𝑐 𝑐 and 𝑠 𝑐 𝑖𝑓 𝑐 𝑐𝑠 𝑒𝑙𝑠𝑒 𝑠 𝑠 ⨂𝑐 ⨂𝑐
XOR gates can be easily integrated as part of the MSD/MSB adder circuit at low hardware overhead without speed penalty!
o Method works as long as the converted 2‘s complement result fits into the given word length
o Example with pseudo overflow correction:
● In general, a reduction of leading digits of CS numbers can be achieved, provided that the CS number fits into the corresponding 2's complement number according to the condition $-1 \le C + S \le 1 - 2^{-(n-1)}$, as follows:
Given: 𝑐 … . 𝑐1𝑐 . 𝑐 𝑐 … 𝑠𝑛 … 𝑠 𝑠 . 𝑠 𝑠 …
Modify to: 𝒄𝟎. 𝑐 𝑐 …𝒔𝟎. 𝑠 𝑠 …
using 𝑠 𝑠 𝑖𝑓 𝑠 𝑐𝑠 𝑒𝑙𝑠𝑒 𝑐 𝑐 𝑖𝑓 𝑠 𝑐
𝑐 𝑒𝑙𝑠𝑒
● Pseudo overflow correction needs fewer digits and less chip area than uncorrected formats
● Overflow in SD numbers
o Similar to the CS case → pseudo and real overflows
o The overflow behavior depends on the MSD sum bit $s_{n-1}$ and the intermediate carry bit $d_n$
o Analysis for a possible correction of $s_{n-1}$ as follows:

$d_n$ | $s_{n-1}$ | Overflow type | Corrected $s'_{n-1}$
 1    | -1        | pseudo        |  1
 1    |  0        | potential     |
 1    |  1        | real          |
-1    | -1        | real          |
-1    |  0        | potential     |
-1    |  1        | pseudo        | -1
 0    |  X        | none          | $s_{n-1}$

o Pseudo overflow is correctable at the MSD without performance impact
o Real overflow must be avoided through modification at the system or algorithm level
o Potential overflow would require an inspection of all lower digits → hardware costs increase
o Potential overflow is avoidable via range limitation to magnitudes $< 2^{n-2}$
● General options/mechanisms for handling of real overflow
o Analysis to identify minimum/maximum intermediate and final values
o Corner case simulation of the system to check for sufficient word lengths for any occurring values
o Thus estimate a lower bound on the word length
o For insufficient word lengths or if too expensive:
o Reduce accuracy → fewer bits after the decimal point
o Test whether the application allows saturation
o Detect real overflow and handle it
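1.5 Basic Operations
● Wrap up of some basic operations on data and numbers

Operation | Type                   | Direction | Result
Shift     | unsigned               | left      | $a_{n-2} \dots a_1 a_0 0$
Shift     | unsigned               | right     | $0\, a_{n-1} a_{n-2} \dots a_1$
Shift     | signed, 2's complement | left      | $a_{n-1} a_{n-3} \dots a_0 0$
Shift     | signed, 2's complement | right     | $a_{n-1} a_{n-1} \dots a_1$
Rotate    |                        | left      | $a_{n-2} \dots a_1 a_0 a_{n-1}$
Rotate    |                        | right     | $a_0 a_{n-1} a_{n-2} \dots a_1$
Extend    | unsigned               | left      | $0\, a_{n-1} a_{n-2} \dots a_1 a_0$
Extend    | unsigned               | right     | $a_{n-1} a_{n-2} \dots a_1 a_0 0$
Extend    | signed, 2's complement | left      | $a_{n-1} a_{n-1} a_{n-2} \dots a_1 a_0$
Extend    | signed, 2's complement | right     | $a_{n-1} a_{n-2} \dots a_1 a_0 0$
Saturate  | unsigned               |           | $a_{n-1} \dots a_{n-1} a_{n-1}$
Saturate  | signed, 2's complement |           | $a_{n-1} \bar{a}_{n-1} \dots \bar{a}_{n-1}$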
1.6 Cost/Performance Estimation Basics
● Some rule-of-thumb estimations for delay and area of typical functions and algorithm structures in arithmetic circuits
o Naming conventions:
o A: area
o T: cycle time/delay
o L: latency
o #: number of cycles
o Basic assumptions for gates:
o Inverter, buffer: $A = 0$, $T = 0$ (negligible)
o Simple 2-input gate: $A = 1$, $T = 1$ (AND, NAND, OR, NOR)
o Special 2-input gate: $A = 2$, $T = 2$ (XOR, XNOR)
o Complex m-input gate: $A = m - 1$, $T = \lceil \log_2 m \rceil$ (gate tree)
o Wiring costs as well as wiring area are not considered (high abstraction)
o Basic assumptions for circuit functions:
o Up to n inputs $a = (a_{n-1}, a_{n-2}, \dots, a_1, a_0)$
o Up to n outputs $z = (z_{n-1}, z_{n-2}, \dots, z_1, z_0)$
o Blue dots (in the figures) represent functions that generate outputs $z_i = f(a_{n-1}, a_{n-2}, \dots, a_1, a_0)$
o Non-recursive functions
o $z_i = f(a_i, x)$ with $i = 0, 1, \dots, n-1$ and x constant
o output $z_i$ only depends on input $a_i$
o can be implemented as a fully parallel hardware structure
o $A = O(n)$ and $T = O(1)$
o Recursive functions with single output
o The output depends on all inputs: $z = f(a_{n-1}, a_{n-2}, \dots, a_1, a_0)$
o Case 1: f non-associative → $A = O(n)$ and $T = O(n)$ (serial structure)
o Case 2: f associative → $A = O(n)$ and $T = O(\log n)$ (tree structure)
[Figure: Case 1 (non-associative), serial chain from $a_0$ to $a_{n-1}$ producing $z_{n-1}$; Case 2 (associative), tree over $a_0 \dots a_3$ producing $z_3$]
o Recursive functions with multiple outputs
o Prefix problem: $z_i = f(a_i, z_{i-1})$
o Case 1: f non-associative → $A = O(n)$ and $T = O(n)$ (serial)
o Case 2: f associative → $A = O(n^2)$ and $T = O(\log n)$ (multi tree / serial)
o Case 3: f associative → $A = O(n \cdot \log n)$ and $T = O(\log n)$ (shared)
[Figure: Case 1 (non-associative), serial chain with inputs $a_0 \dots a_{n-1}$ and outputs $z_0 \dots z_{n-1}$; Case 2 (associative), separate trees per output for $a_0 \dots a_3$ / $z_0 \dots z_3$; Case 3 (associative), shared tree evaluated in parallel for $a_0 \dots a_3$ / $z_0 \dots z_3$]
Part 2: Adders
Outline
● 2.1 Fundamentals
o Half Adder, Full Adder, (m,k)-Counter
● 2.2 Carry Propagate Adders
o Ripple Carry, Carry Skip, Carry Select, Conditional Sum, Carry Lookahead, Asynchronous
● 2.3 Non-Carry Adders
o Carry Save, Redundant Binary
● 2.4 Multi-Operand Adders
o Matrix Adder, (m:2)-compressor, Adder Trees
● 2.5 Sequential Adders
o LSB-first, MSB-first, Accumulator
● 2.6 Add-based Operations
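2.1 Fundamentals of Adders
● 1-bit adder or (m,k)-counter
o Counting m 1-bit numbers of the same magnitude
o Result: k-bit sum, $k = \lceil \log_2 (m+1) \rceil$
o Example: $1 + 1 + 1 = 11_2$ and $1 + 1 + 1 + 1 = 100_2$
● Half Adder or (2,2)-counter
o $a + b = 2 c_{out} + s$
o Sum: $s = a \oplus b$
o Carry: $c_{out} = a \wedge b$
o Metric: $A = 3$, $T_{carry} = 1$, $T_{sum} = 2$
[Figure: half adder symbol and gate-level schematic with inputs a, b and outputs s, $c_{out}$]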
● Full Adder or (3,2)-counter
o $a + b + c_{in} = 2 c_{out} + s$
o Composed of half adders
o Popular variables:
o $g = a \wedge b$ (generate $c_{out}$)
o $p = a \oplus b$ (propagate $c_{in}$)
o $C^0 = a \wedge b$ and $C^1 = a \vee b$ (carry-out for $c_{in} = 0$ and for $c_{in} = 1$)
o $s = p \oplus c_{in}$
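● Full Adder
o $c_{out} = (a \wedge b) \vee (a \wedge c_{in}) \vee (b \wedge c_{in})$
o $c_{out} = g \vee (p \wedge c_{in})$
o $s = p \oplus c_{in}$
o There are different ways to calculate s and $c_{out}$; the optimal structure depends on the technology
[Figure: full adder symbol and gate-level schematic with inputs a, b, $c_{in}$, internal signals g, p and outputs s, $c_{out}$]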
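● Full Adder
[Figure: multiplexer-based full adder schematics; the carry multiplexer is controlled by p or by $c_{in}$ and selects between $C^0$ and $C^1$]
o Metric: $A = 7$, $T_{carry} = 2$, $T_{sum} = 4$
o $s = p \oplus c_{in} = (C^1 \wedge \overline{C^0}) \oplus c_{in}$
o $c_{out} = (\bar{c}_{in} \wedge C^0) \vee (c_{in} \wedge C^1)$
o $c_{out} = (c_{in} \wedge p) \vee (a \wedge \bar{p})$
o Mux: 2 transmission gates
[Figure: transmission gate]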
● (m,k)-counter
o Addition of m bits
o Composed of full adders
o Addition is associative: linear structure → tree structure
o Reduced critical path
[Figure: (m,k)-counter block with inputs $a_0, a_1, \dots, a_{m-1}$ and outputs $s_0 \dots s_{k-1}$]
● Example: (7,3)-counter
o Linear structure: $A = 28$, $T = 14$
o Tree structure: $A = 28$, $T = 10$
[Figure: (7,3)-counter built from full adders as linear structure and as tree structure]
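A bit-level model makes the full adder equations and the (7,3)-counter easy to verify exhaustively. This is a minimal sketch; the wiring of the four full adders in `counter_7_3` is one possible tree arrangement chosen for this illustration and is not necessarily the one drawn on the slide.

```python
from itertools import product


def full_adder(a, b, c_in):
    """Return (s, c_out) using the generate/propagate formulation."""
    g = a & b                      # generate
    p = a ^ b                      # propagate
    return p ^ c_in, g | (p & c_in)


def counter_7_3(bits):
    """Count the ones among 7 input bits with 4 full adders; return (k2, k1, k0)."""
    a0, a1, a2, a3, a4, a5, a6 = bits
    s1, c1 = full_adder(a0, a1, a2)    # first level, weight 1
    s2, c2 = full_adder(a3, a4, a5)    # first level, weight 1
    k0, c3 = full_adder(s1, s2, a6)    # remaining weight-1 bits
    k1, k2 = full_adder(c1, c2, c3)    # all weight-2 carries
    return k2, k1, k0


for bits in product((0, 1), repeat=3):
    s, c = full_adder(*bits)
    assert 2 * c + s == sum(bits)      # a + b + c_in = 2*c_out + s

print(counter_7_3((1, 1, 1, 1, 1, 0, 0)))
```

Four full adders at $A = 7$ each give the $A = 28$ quoted for both structures; only the arrangement, and therefore the critical path, differs.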
2.2 Carry Propagate Adders (CPA)
● Addition of 2 n-bit operands A, B and an optional $c_{in}$ using carry propagation
● Sum: non-redundant (n+1)-bit number
o $A + B + c_{in} = S + 2^n \cdot c_{out}$
o $a_i + b_i + c_i = s_i + 2 \cdot c_{i+1}$ for $i = 0, 1, \dots, n-1$
o $c_0 = c_{in}$ and $c_n = c_{out}$
● Different methods of carry propagation:
o Ripple Carry Adder (RCA)
o Carry Skip Adder (CSkA)
o Carry Select Adder (CSel)
o Conditional Sum Adder (CSum)
o Carry Lookahead Adder (CLA)
[Figure: CPA block with inputs A, B, $c_{in}$ and outputs S, $c_{out}$]
2.2.1 Ripple Carry Adder (RCA)
● Serial arrangement of full adders (FA)
● Simplest, smallest and slowest CPA
● Metric: $A = 7n$, $T = 2n$
● Carry speed-up strategies for CPAs:
o Type A: partitioning into groups of shorter CPAs (with fast $c_{in} \to c_{out}$)
o Type B: parallelization using a tree structure
[Figure: ripple carry adder, chain of full adders from bit 0 (carry input $c_{in}$) to bit n-1 (carry output $c_{out}$)]
2.2.2 Carry Skip Adder (CSkA)
● Type A
● Idea: determine for each group in parallel whether
o a) $c_{in}$ generates $c_{out}$ -or-
o b) $c_{in}$ can skip this group ("skip + propagate")
[Figure: chain of CPA groups (one k-bit group highlighted) with inputs $a_{i-1:k}$, $b_{i-1:k}$, sum outputs $s_{i-1:k}$, group propagate $P_{i-1:k}$ and a 2:1 multiplexer per group that selects between the group carry $c'_i$ (input 0) and the skipped carry $c_k$ (input 1)]
o $c_i = (\overline{P_{i-1:k}} \wedge c'_i) \vee (P_{i-1:k} \wedge c_k)$
● Group propagate: $P_{i-1:k} = p_{i-1} \wedge p_{i-2} \wedge \dots \wedge p_k$
● Bit propagate: $p_i = a_i \oplus b_i$
● Requires a k-input AND gate for each group
● Critical path in a given group = k bits
● Function of $P_{i-1:k}$:
o $P_{i-1:k} = 0 \Rightarrow c_k$ doesn't affect $c_i$ → propagate $c'_i$ to the mux output
o $P_{i-1:k} = 1 \Rightarrow c_k$ determines $c_i$ → $c_k$ skips the group and is propagated to the mux input
● Question: Which group size is optimal w.r.t. delay?
● Assumptions:
o fixed k for all groups → n/k groups of the same size
o $T_{Mux} = 1$, $T_{Carry} = 2$
● Delay:
$T_{CSkA} = k \cdot T_{Carry} + \left(\tfrac{n}{k} - 2\right) \cdot T_{Mux} + k \cdot T_{Carry}$
$\quad = 2 \cdot k \cdot T_{Carry} + (n \cdot k^{-1} - 2) \cdot T_{Mux}$
$\quad = 4 \cdot k + n \cdot k^{-1} - 2$
● Optimum:
$\frac{dT_{CSkA}}{dk} = 4 - n \cdot k^{-2} = 0 \Rightarrow k_{opt} = \tfrac{1}{2}\sqrt{n}$
$T_{CSkA,opt} = 2\sqrt{n} + \frac{n}{\frac{1}{2}\sqrt{n}} - 2 = 2\sqrt{n} + 2\sqrt{n} - 2 = 4\sqrt{n} - 2 = O(\sqrt{n})$
● Further improvements
o Faster CPAs, e.g. multi-staged CSkA
o Variable group size: overlap $T_{CPA} + T_{Mux}$ with $T_{CPA}$ of the next group → larger middle groups. Note that the sum time in the last group depends on its group size and can only start after all preceding operations have finished → for n = 32 bit choose 1,2,3,4,5,6,5,3,2,1
o Cost compared to RCA: 1 XOR/bit, (1 AND + 1 Mux)/group
● Metric: $A = 8n$, $T = 4\sqrt{n}$
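The optimum group size can be cross-checked numerically. The sketch below assumes the delay model $T(k) = 2k \cdot T_{Carry} + (n/k - 2) \cdot T_{Mux}$ with $T_{Carry} = 2$ and $T_{Mux} = 1$, and restricts k to divisors of n so that all groups have the same size; the function names are hypothetical.

```python
import math

T_CARRY, T_MUX = 2, 1          # gate delays assumed on the slide


def t_cska(n, k):
    """Worst-case delay of an n-bit carry skip adder with fixed group size k."""
    return 2 * k * T_CARRY + (n / k - 2) * T_MUX


def best_group_size(n):
    """Brute-force search over all group sizes that divide n."""
    candidates = [k for k in range(1, n + 1) if n % k == 0]
    return min(candidates, key=lambda k: t_cska(n, k))


for n in (16, 64, 256):
    k = best_group_size(n)
    print(n, k, t_cska(n, k), 0.5 * math.sqrt(n))
```

For these word lengths the brute-force optimum coincides with $k = \tfrac{1}{2}\sqrt{n}$; for other n the analytic value generally has to be rounded to a feasible group size.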
2.2.3 Carry Select Adder (CSel)
● Type A
● Idea:
o a) Compute $c_{out}$ and $s_{out}$ for both possible $c_{in}$ in each k-bit group
o b) The actual $c_{in}$ selects the corresponding output and propagates the result
[Figure: carry select adder; each group $a_{i-1:k}$, $b_{i-1:k}$ is added twice (for carry-in 0 and 1), and multiplexers controlled by $c_k$ select $s_{i-1:k}$ and $c_i$; the lowest group $a_{k-1:0}$, $b_{k-1:0}$ uses $c_{in}$ directly]
o $s_{i-1:k} = (\bar{c}_k \wedge s^0_{i-1:k}) \vee (c_k \wedge s^1_{i-1:k})$
o $c_i = (\bar{c}_k \wedge c^0_i) \vee (c_k \wedge c^1_i)$
● Optimal group size (like CSkA): $k = \tfrac{1}{2}\sqrt{n}$, $T_{CSel} = O(\sqrt{n})$
● Further improvements:
o Faster CPA, e.g. multi-staged CSel
o Variable group size k: overlapping $T_{CPA}$ and $T_{Mux}$ with $T_{CPA}$ of the next group; increase the group size by one bit per group from LSB to MSB, e.g., for 28 bit: 7,6,5,4,3,2,1
● Cost compared to RCA
o 1 "sum-Mux"/bit + (CPA + "carry-Mux")/group
o Note: no duplication of the whole CPA, since A ⊕ B can be reused -or- use a Binary-to-Excess-1 Code (BEC) converter as a simpler block for $c_{in} = 1$
● Metric: $A = 14n$, $T = 3\sqrt{n}$
2.2.3 Carry Select Adder with BEC (extra stuff)
● Binary-to-Excess-1 Code (BEC) converter, e.g. for 4 bit: realizes an increment by one
● Simple generation
● Replaces the block with $c_{in} = 1$; requires less area than the standard structure
● Its input is the sum output of the block with $c_{in} = 0$
2.2.4 Conditional Sum Adder (CSum)
● Hybrid of Type A (similar to CSel, but with 1-bit groups) and Type B (tree)
● Parallel propagation of 1-bit groups using a tree structure (instead of sequential propagation of the carries of k-bit groups)
● Fastest and most costly CPA, exploiting maximum parallelism
o n summand bits are propagated by a mux tree → depth: $\log_2 n$
o Cost: $2 \cdot RCA + 2 \cdot \log_2 n$ Mux/bit
● Metric: $A = 3n \cdot \log_2 n = O(n \cdot \log n)$, $T = 2 \cdot \log_2 n = O(\log n)$
2.2.5 Carry Lookahead Adder (CLA)
● Type B:
o parallel tree structure
o all carries are pre-computed
o if too expensive for large n → partitioning into k-bit groups
o hierarchical arrangement in $\tfrac{1}{2} \log_2 n$ stages
● Implementations
o Kogge-Stone (1973): fast, long wires, irregular layout
o Brent-Kung (1982): much more regular, a bit slower
o Han-Carlson (1987): compromise between KS and BK
o Ling/Sklansky (1981): large fanout to compute higher bits
o Ladner-Fischer (1980): compromise between Ling and BK
● Metric: $A = O(n \cdot \log n)$, $T = O(\log n)$
● Classification
[Figure: classification of carry lookahead adder implementations]
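● Carry equations of a Carry Lookahead Block (CLB) with carry input $c'_0$:
$c_0 = c'_0$
$c_1 = g_0 \vee (p_0 \wedge c'_0)$
$c_2 = g_1 \vee (p_1 \wedge c_1) = g_1 \vee (p_1 \wedge g_0) \vee (p_1 \wedge p_0 \wedge c'_0)$
$c_3 = g_2 \vee (p_2 \wedge g_1) \vee (p_2 \wedge p_1 \wedge g_0) \vee (p_2 \wedge p_1 \wedge p_0 \wedge c'_0)$
$g'_3 = g_3 \vee (p_3 \wedge g_2) \vee (p_3 \wedge p_2 \wedge g_1) \vee (p_3 \wedge p_2 \wedge p_1 \wedge g_0)$
$p'_3 = p_3 \wedge p_2 \wedge p_1 \wedge p_0$
…
[Figure: Carry Lookahead Block (CLB) with inputs $c'_0$ and $(g_0, p_0) \dots (g_{n-1}, p_{n-1})$, carry outputs $c_0 \dots c_{n-1}$ and block generate & propagate $(g'_{n-1}, p'_{n-1})$]
● Example for 16 bit addition
[Figure: prefix graphs of the Kogge-Stone, Han-Carlson and Brent-Kung adders]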
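The lookahead equations are an instance of the prefix problem on (g, p) pairs. The following minimal sketch evaluates them with a Kogge-Stone style schedule; it models only the logic, not wiring or fanout, and the function name `kogge_stone_add` is an assumption.

```python
def kogge_stone_add(a, b, n, c_in=0):
    """Add two n-bit unsigned numbers with a parallel-prefix carry network.

    Returns (n-bit sum, carry out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]      # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]    # bit propagate
    G, P = g[:], p[:]
    dist = 1
    while dist < n:                                      # ceil(log2 n) prefix levels
        G_next, P_next = G[:], P[:]
        for i in range(dist, n):
            # combine the group ending at i with the group ending at i - dist
            G_next[i] = G[i] | (P[i] & G[i - dist])
            P_next[i] = P[i] & P[i - dist]
        G, P = G_next, P_next
        dist *= 2
    # G[i], P[i] now span bits i..0, so c_{i+1} = G[i] | (P[i] & c_in)
    carries = [c_in] + [G[i] | (P[i] & c_in) for i in range(n)]
    total = sum((p[i] ^ carries[i]) << i for i in range(n))
    return total, carries[n]


print(kogge_stone_add(0b1011, 0b0110, 4))
```

Every level only combines already available group signals, so the depth grows with $\log_2 n$ instead of n, at the price of more combine cells than a ripple chain.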
2.2.6 Asynchronous Adder
● a.k.a. Carry Completion Adder
● detects the end of carry propagation and generates a carry-completion signal (indicates validity of the sum bits)
● $T_{carry,mean} \approx \log_2 n$ stages for an n-bit adder
● Pros:
o simple RCA with an average delay of $O(\log n)$
o well suited for resource-limited architectures/cascaded additions, e.g. crypto hardware on smartcards
● Cons:
o only for asynchronous (self-timed) systems
o extra hardware for the carry completion logic
● Metric: $A = 8n$, $T_{mean} = 2 \log_2 n$, $T_{max} = 2n$
2.3 Non-Carry-Propagate Adders
● delay is independent of the word width
2.3.1 Carry Save Adder (CSA)
● adds 3 n-bit numbers without carry propagation
● the carry is saved in Carry-Save representation
● Operands can be
o three 2's complement (TC) numbers or
o one 2's complement number + one CS number
o $a_{0,i} + a_{1,i} + a_{2,i} = 2c_{i+1} + s_i$ for $i = 0 \dots n-1$
o $A_0 + A_1 + A_2 = C + S = (C, S)$
● Built out of n full adders
● 3 input vectors are merged into 2 output vectors
● also called: (3,2)-compressor
● Metric: $A = 7n$, $T = 4$ (constant!)
[Figure: CSA block with three n-bit inputs and two n-bit outputs; row of n independent full adders with inputs $a_{0,i}$, $a_{1,i}$, $a_{2,i}$ and outputs $s_i$, $c_{i+1}$]
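A carry save adder can be modelled with word-wide bitwise operations, because every bit position is handled by its own full adder. This is a minimal sketch for unsigned n-bit operands; the name `carry_save_add` is hypothetical.

```python
def carry_save_add(a0, a1, a2, n):
    """Reduce three n-bit unsigned operands to a (carry, sum) pair.

    No carry travels between bit positions; c_{i+1} is simply stored one
    position to the left, so carry + sum equals a0 + a1 + a2."""
    mask = (1 << n) - 1
    a0, a1, a2 = a0 & mask, a1 & mask, a2 & mask
    s = a0 ^ a1 ^ a2                               # sum bit of each full adder
    c = ((a0 & a1) | (a0 & a2) | (a1 & a2)) << 1   # majority = carry, weight 2
    return c, s


c, s = carry_save_add(0b0111, 0b0101, 0b0011, 4)
print(format(c, "05b"), format(s, "04b"), c + s)
```

Only the final `c + s`, the vector merging addition, needs a carry-propagate adder; chains of additions can stay in carry-save form until the end.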
2.3.2 Redundant Binary Adder (RBA)
● Summation of numbers in Signed Digit representation (RBA: base r = 2)
o $a_i, b_i, s_i, d_i, z_i \in \{-1, 0, 1\}$
o $a_i + b_i = 2d_{i+1} + z_i$
o $z_i + d_i = s_i$
o $S = A + B$
o $d_i$: intermediate carry, $z_i$: intermediate sum
o Similar to carry save
● $s_i = z_i + d_i$ is carry-free iff [Takagi, 1987]:

$a_i$  $b_i$        | $a_{i-1}$, $b_{i-1}$ | $d_{i+1}$ | $z_i$
 1   1              | X                    |  1        |  0
 1   0  or  0   1   | both ≥ 0             |  1        | -1
 1   0  or  0   1   | else                 |  0        |  1
 $a_i + b_i = 0$    | X                    |  0        |  0
 0  -1  or  -1  0   | both ≥ 0             |  0        | -1
 0  -1  or  -1  0   | else                 | -1        |  1
-1  -1              | X                    | -1        |  0

o X: don't care; "both ≥ 0" means $a_{i-1} \ge 0$ and $b_{i-1} \ge 0$
● Digit i only affects digits i+1 and i+2 → no carry propagation
● Metric: $A = 7n$, $T \cong 2\,FA$
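The selection rules of the table can be tried out in software. The sketch below follows the table row by row and treats the non-existent digits below position 0 as zero (an assumption of this illustration); the function names are hypothetical.

```python
def rb_add(a, b):
    """Add two redundant binary numbers (digit lists, MSD first, digits in {-1, 0, 1}).

    Returns n+1 sum digits, MSD first."""
    n = len(a)
    al, bl = a[::-1], b[::-1]                  # LSD first
    d = [0] * (n + 1)                          # intermediate carries d_0 ... d_n
    z = [0] * (n + 1)                          # intermediate sums z_0 ... z_n
    for i in range(n):
        t = al[i] + bl[i]
        lower_ok = i == 0 or (al[i - 1] >= 0 and bl[i - 1] >= 0)
        if t == 2:
            d[i + 1], z[i] = 1, 0
        elif t == 1:
            d[i + 1], z[i] = (1, -1) if lower_ok else (0, 1)
        elif t == -1:
            d[i + 1], z[i] = (0, -1) if lower_ok else (-1, 1)
        elif t == -2:
            d[i + 1], z[i] = -1, 0
        # t == 0 leaves d[i+1] = z[i] = 0
    return [z[i] + d[i] for i in range(n + 1)][::-1]   # s_i = z_i + d_i, no further carry


def sd_value(digits):
    """Value of a radix-2 signed digit list given MSD first."""
    value = 0
    for x in digits:
        value = 2 * value + x
    return value


print(rb_add([0, 1, 1, 1, 1], [0, 0, 0, 0, 1]))   # 15 + 1
```

Each sum digit depends on at most two lower positions, so in hardware all digits are computed in parallel with a delay that does not grow with the word length.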
● Examples:
01111 𝑎00001 𝑏
011111 𝑑01110 𝑧10000 𝑠
01111 𝑎00111 𝑏
000011 𝑑01000 𝑧01010 𝑠
ai bi ai-1 bi-1 di+1 zi
1 1 X X 1 01 0 Both 0
Else1 -1
0 1 0 1ai+bi=0 X X 0 0
0 -1 Both 0Else
0 -1-1 0 -1 1-1 -1 X X -1 0
00111 𝑎01111 𝑏
010011 𝑑01000 𝑧
11010 𝑠
● Comparison: CSA vs. RBA

                | Carry Save Adder                                      | Redundant Binary Adder
From 2's C      | direct conversion, no HW needed                       | direct conversion, no HW needed
To 2's C        | add carry and sum vectors (C+S)                       | split the SD number into a positive (1, 0) and a negative ($\bar{1}$, 0) part → subtraction
Functionality   | CS-cell = FA = (3,2)-compressor; adds CS+2C or 2C+2C+2C; cascaded cell → (4:2)-compressor for CS+CS | RB-cell = (4,2)-cell; adds RB+RB
Complexity (~equal at same functionality) | 22 transistors (3:2), available in libraries | 42 transistors (4:2), availability depends on library
2.3.2.1 Overflow in Redundant Binary Adders
● There are real overflow situations, but
● also pseudo overflow
● Example:

  a    111     1
  b    111     1
  d    1111    2
  z    0000
  s    110     6

● Overflow depends on the MSD of the sum $s_{n-1}$ and the intermediate carry $d_n$:
$d_n\; d_{n-1}\; d_{n-2} \dots d_1$
$s_{n-1}\; s_{n-2} \dots s_1\; s_0$
● Pseudo overflow is prevented by a correction rule for $s_{n-1}$:

  d_n | s_{n-1} | Overflow  | s'_{n-1}
   1  |  -1     | Pseudo    |  1
   1  |   0     | Potential |
   1  |   1     | Real      | avoid or use saturation
  -1  |  -1     | Real      | avoid or use saturation
  -1  |   0     | Potential |
  -1  |   1     | Pseudo    | -1
   0  |   X     | no        | s_{n-1}

● Can be implemented without speed loss in the MSD of the RBA
● Real overflow needs to be handled on system or algorithmic level
● Potential overflow
o needs detection on lower bits, or
o limit magnitude of all numbers to $< 2^{n-2}$, or
o increase word length by one
2.4 Multi-Operand Adder
● Summation of $m \ge 3$ $n$-bit operands
● Result requires $n + \log_2 m$ non-redundant bits
2.4.1 Multi-operand addition using adder array
a) Linear array of CPAs (example: 4-operand RCA)
[Figure: linear array of (m-1) CPAs (CPA 1 to CPA 3) adding four operands $a_0 \dots a_3$; FA cells per bit column, HA cells in the LSB column; outputs $s_0 \dots s_n$]
b) Linear array of CSAs and final CPA (example: RCA)
● Evaluation:
o same delay for a) and b), but
o Type a): fast final CPA (e.g. CSum) has to wait for operand arrival, delay is $O(n + m)$
o Type b): delay $O(m + \log n)$
→ for high performance always use b); type a) is expensive/useless
● Generic scheme for b):
[Figure: operands A0 … A4 enter a chain CSA1 → CSA2 → CSA3, followed by a CPA that delivers the 2's complement result]
$A = (m - 2) \cdot A_{CSA} + A_{CPA}$
$T = (m - 2) \cdot T_{CSA} + T_{CPA}$
For logarithmic CPA:
$A = O(m \cdot n + n \cdot \log n)$
$T = O(m + \log n)$
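Scheme b) can be sketched in a few lines. The example assumes non-negative Python integers and uses the built-in `+` as a stand-in for the final CPA; `csa` and `multi_operand_add` are illustrative names only.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """One carry-save adder row: three vectors in, (carry, sum) out."""
    return ((x & y) | (x & z) | (y & z)) << 1, x ^ y ^ z


def multi_operand_add(operands: list[int]) -> tuple[int, int]:
    """Add m >= 3 operands with a linear CSA array and one final CPA.

    Returns (result, number of CSA stages used).
    """
    carry, total = csa(*operands[:3])
    stages = 1
    for op in operands[3:]:
        carry, total = csa(carry, total, op)   # one more CSA row per operand
        stages += 1
    return carry + total, stages               # '+' models the final CPA


print(multi_operand_add([3, 5, 7, 9, 11]))
```

For five operands the sketch uses three CSA rows, matching the factor (m - 2) in the area and delay formulas.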
2.4.2 (m:2)-compressors
● Idea:
o one column of 2.4.1 b) without terminating CPA
o compresses m input bits to 2 output bits
o propagates (m-3) carries to the left-hand column
● No horizontal carry propagation
● Uses FAs = (3:2)-compressors or (4:2)-cells in a linear array or tree structure
o Example: 4-operand adder with (4:2)-adders
$A = 7 \cdot (m - 2)$
$T = 4 \cdot (m - 2)$ (linear array)
$T = 6 \cdot (\log_2 m - 1)$ (tree of (4:2)-cells)
● Implementation of (4:2)-adders:
o 2 full adders: $A = 14$, $T = 8$
o Optimized structure using tree of XOR gates: $A = 16$, $T = 6$
[Figure: (4:2)-cell with inputs a0 … a3 and cin, outputs C, S and cout; shown once as two cascaded FAs and once as XOR tree with 0/1 multiplexers]
● Advantages of (4:2) versus (3:2) in (m:2) compressors
o 4:2 instead of 3:2 (sic!)
o reduced depth
o regular layout
● Example: (8:2)-compressor:
2.4.3 Multi-operand addition using adder trees
● Using n-bit m-operand redundant adders
● Tree structure
● Each adder consists of n-bit (m:2)-compressors
● Fastest multi-operand adders:
o adder tree + log2(n)-CPA
$A = A_{(m:2)} \cdot n + A_{CPA} = O(m \cdot n + n \cdot \log n)$
$T = T_{(m:2)} + T_{CPA} = O(\log m + \log n)$
● Wallace Tree (1964)
o Redundant adder = CSA (3:2)
● Trees are faster than arrays with the same number of gates
● But: trees require irregular wiring → increased area
[Figure: Wallace tree built from (3:2)-cells next to a tree built from (4:2)-cells]
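The speed advantage of a tree over a linear array can be made concrete by counting CSA levels. The sketch below assumes the usual Wallace grouping, in which every level turns each full group of three vectors into two and passes leftover vectors on unchanged; the function names are illustrative.

```python
def wallace_levels(m: int) -> int:
    """Number of (3:2) CSA levels needed to reduce m vectors to 2."""
    levels = 0
    while m > 2:
        m = 2 * (m // 3) + m % 3   # each full group of 3 becomes 2
        levels += 1
    return levels


def linear_array_levels(m: int) -> int:
    """A linear CSA array needs one row per operand beyond the second."""
    return max(m - 2, 0)


for m in (3, 4, 8, 16, 32):
    print(m, wallace_levels(m), linear_array_levels(m))
```

The tree depth grows roughly logarithmically with m, while the array depth grows linearly, in line with the O(log m) versus O(m) terms given for the two structures.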
2.5 Sequential Adders
2.5.1 LSB-first serial adder
● Bitwise adding of 2 n-bit numbers, starting from LSB
● Pros:
o Small
o Serial communication
o Cascadable (LSB-In → LSB-Out)
● Cons:
o Needs temporary storage flip-flop
o Latency: n cycles
Metric: $A = A_{FA} + A_{FF}$, $T = T_{FA} + T_{FF}$, $L = n \cdot T$
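A bit-serial adder is easy to model cycle by cycle. The sketch assumes LSB-first bit lists of equal length; the variable `carry` plays the role of the storage flip-flop, and the name `serial_add` is illustrative.

```python
def serial_add(a_bits: list[int], b_bits: list[int]) -> tuple[list[int], int]:
    """LSB-first serial addition: one full adder plus one carry flip-flop.

    Returns the n sum bits (LSB-first) and the final carry.
    """
    carry = 0                               # flip-flop, cleared before the LSB
    out = []
    for a, b in zip(a_bits, b_bits):        # one clock cycle per bit
        out.append(a ^ b ^ carry)
        carry = (a & b) | (a & carry) | (b & carry)
    return out, carry


print(serial_add([0, 1, 1, 0], [1, 1, 1, 0]))   # 6 + 7, LSB first
```

The loop runs n times, which corresponds to the latency of n cycles listed under the cons.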
2.5.2 MSD-first serial adder (digit online arithmetic)
● Bitwise adding of 2 n-bit numbers, starting from MSD
● Seems impossible, but can be derived from parallel (4:2)-adders in CS (SD as well):
● $a_i, b_i, a_{i-1}, b_{i-1}, a_{i-2}, b_{i-2}$ must be known to compute $s_i$
● Thus, this "Digit Online Addition" has an online delay of $\delta = 2$
● Comparison to LSB-first adder:
o Needs conversion to 2's complement
o More wiring (2 wires/digit)
o Slower: $\delta = 2$
● Why digit online technique?
o Add, Sub, Mult are "natural" LSB-first-In/Out operations
o But: Division and more complex functions are MSD-first-In/Out
  → all n input digits have to be known
  → LSB-first-In → wait for n cycles → MSB-first-Out
o Digit online is better suited for mixed and concatenated operations of Add, Div, Log, Sub, …
  → all operations can be performed MSD-first, but not LSB-first
  → MSD-first-In → wait $\sum \delta_i$ cycles → MSD-first-Out
  → Lower overall latency (even than for parallel operations!) is possible due to overlapping input and output digits
  → But: throughput is typically lower than with parallel operations
● Online delays 𝛿 of basic operations
2.5.3 Accumulator
● adds m n-bit operands in parallel
● a) $A = A_{CPA} + A_{Reg}$, $T = T_{CPA} + T_{Reg}$, $L = m \cdot T$
● b) $A = A_{CSA} + A_{Reg} + A_{CPA}$, $T = T_{CSA} + T_{Reg}$, $L = m \cdot T$
→ b) much faster when using a pipelined CPA
2.6 Adder-based Operations
● Increment / Decrement
● Counter (feedback increment)
● Comparators ($=, <, >, \dots$)
o TC, SD, CS: $T \sim \log n$
● Detect leading zeroes
o TC, SD, CS: $T \sim \log n$
● Determine flag bits in processors
Part 3: Multiplication
Outline
● 3.1 Fundamentals
o Unsigned Multiplication, 2's Complement Multiplication
● 3.2 Unsigned Braun-Array Multiplier
● 3.3 Signed Pezaris-Array Multiplier
● 3.4 Booth Multiplier
● 3.5 Booth-Wallace Multiplier
● 3.6 Evaluation
3.1 Fundamentals
● Like paper-and-pencil multiplication
● Multiplication of 2 n-bit operands A and B yields a $2n$-bit product
3.1.1 Unsigned Multiplication
$P = A \cdot B = \sum_{i=0}^{n-1} a_i 2^i \cdot \sum_{j=0}^{n-1} b_j 2^j = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} a_i b_j \cdot 2^{i+j}$
● Multiply algorithm:
1) Generate n partial products $P_i$: $P_i = a_i \cdot B$
2) Sum up all partial products $P_i$: $P = \sum_{i=0}^{n-1} P_i\, 2^i$
→ Shift-and-Add (see 1.6.2.2 Recursive, associative function)
Note: $P$ = product, $P_i$ = partial product, $p_i$ = bit $i$ of product
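The two steps of the multiply algorithm map directly onto a loop. This is a minimal sketch for unsigned operands, assuming Python integers and an explicit operand width n; `shift_and_add` is an illustrative name.

```python
def shift_and_add(a: int, b: int, n: int) -> int:
    """Unsigned n-bit multiplication by summing shifted partial products."""
    p = 0
    for i in range(n):
        a_i = (a >> i) & 1      # bit i of the multiplier A
        partial = a_i * b       # step 1: P_i = a_i * B
        p += partial << i       # step 2: P = sum of P_i * 2^i
    return p


print(shift_and_add(13, 11, 4))
```

A hardware accumulator would perform one loop iteration per clock cycle, whereas the array and tree variants unroll the loop in space.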
3.1.1 Unsigned Multiplication (cont'd)
a) Recursive (shift-and-add) using one accumulator
[Figure: partial product stage (*) forms $a_i \cdot B$ for i = 0, ..., n-1, shifted left by i bits; a CPA and a clocked register accumulate the 2n-bit product P]
Metric: $A = O(n \cdot \log n)$, $T = O(\log n)$, $L = n$
b) Serial (shift-and-add) using linear array of CSAs
o All $P_i$ are generated in parallel
[Figure: partial product stages (*) for $a_0 \dots a_3$ feed a chain of four CSAs; the carry and sum vectors form the 4n input of the final CPA, which delivers the 2n-bit product]
Metric: $A = O(n^2)$, $T = O(n + \log n)$
3.1.1 Unsigned Multiplication (cont'd)
c) Parallel using multi-operand adder (tree structure)
[Figure: partial product stages (*) for $a_0 \dots a_3$ feed a CSA tree; its 2n-bit carry and sum vectors are added by a CPA to the 2n-bit product]
Metric: $A = O(n^2)$, $T = O(\log n)$
3.1.2 Two's Complement Multiplication
● Option 1
o Complement operands before and result after multiplication
→ Unsigned multiplication algorithm applicable
● Option 2
o Use dedicated two's complement multipliers, e.g., Braun, Pezaris, Baugh-Wooley
3.2 Unsigned Braun-Array Multiplier
● E.g., for 4-bit operands

                      a0b3 a0b2 a0b1 a0b0
                 a1b3 a1b2 a1b1 a1b0
            a2b3 a2b2 a2b1 a2b0
       a3b3 a3b2 a3b1 a3b0
  p7   p6   p5   p4   p3   p2   p1   p0

Metric: $A = 8n^2 - 11n = O(n^2)$, $T = 6n - 9 = O(n)$
[Figure: block symbol of the Braun multiplier with inputs $a_i$, $b_i$ and outputs $p_i$ (MSBs) and $p_i$ (LSBs)]
3.2 Unsigned Braun-Array Multiplier (cont'd)
● 4-bit Braun-Array multiplier
[Figure: 4-bit array with one HA row and three FA rows (CSA rows and final CPA row), inputs $a_0 \dots a_3$ and $b_0 \dots b_3$, outputs $p_0 \dots p_7$; regions 1, 2 and 3 are marked]
3.3 Signed Pezaris-Array Multiplier
● Modified Braun-Array multiplier, here shown for 4-bit operands
● MSB = sign bit → value = -1

                         -a0b3  a0b2  a0b1  a0b0
                   -a1b3  a1b2  a1b1  a1b0
             -a2b3  a2b2  a2b1  a2b0
        a3b3 -a3b2 -a3b1 -a3b0
  p7    p6    p5    p4    p3    p2    p1    p0
3.3 Signed Pezaris-Array Multiplier (cont'd)
● Four cases for partial product $P_i$ (adder inputs $a$, $b$, $c$)
a) 3 pos. operands → regular FA
b) 2 pos., 1 neg. operand
o $-a + b + c = 2c_{out} - s$, with $-1 \le \text{sum} \le 2$
o Weight of sum bit: -1
o Weight of cout: +2
c) 1 pos., 2 neg. operands
o $a - b - c = -2c_{out} + s$, with $-2 \le \text{sum} \le 1$
o Weight of sum bit: +1
o Weight of cout: -2
d) 3 neg. operands → logically identical to a) → identical implementation: regular FA
3.3 Signed Pezaris-Array Multiplier (cont'd)
● b) and c) have the same implementation
● Approach: replace FA in regions 1, 2, and 3 with modified FA (input a = •)
● Same structure as the Braun multiplier (except modified FA)
$s = a \oplus b \oplus c$
$c_{out} = (a \wedge b) \vee (a \wedge c) \vee (b \wedge c)$ (regular FA)
$c_{out} = (\bar{a} \wedge b) \vee (\bar{a} \wedge c) \vee (b \wedge c)$ (modified FA)
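Whether a candidate modified full adder realizes case b) can be verified exhaustively. The sketch below rests on an assumption that the slides do not spell out in recoverable form: input a is taken as the negatively weighted input and enters the carry logic inverted. With that assumption the outputs satisfy $-a + b + c = 2c_{out} - s$.

```python
from itertools import product


def modified_fa(a: int, b: int, c: int) -> tuple[int, int]:
    """Full adder with one negatively weighted input a (assumed convention).

    Returns (cout, s) with cout of weight +2 and s of weight -1.
    """
    s = a ^ b ^ c
    not_a = 1 - a
    cout = (not_a & b) | (not_a & c) | (b & c)
    return cout, s


for a, b, c in product((0, 1), repeat=3):
    cout, s = modified_fa(a, b, c)
    print(a, b, c, "->", cout, s, -a + b + c == 2 * cout - s)
```

Negating both sides of the identity gives case c), which is why one cell type is enough for both cases.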
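[Figure: same 4-bit array structure as the Braun multiplier (HA row, FA rows, CSA and CPA, inputs $a_0 \dots a_3$ and $b_0 \dots b_3$, outputs $p_0 \dots p_7$) with the marked regions 1, 2 and 3]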
3.4 Booth Multiplier
● Observation: multiplication delay $= f(\#\,\text{partial products } P_i) = f(n)$
o For every 0 in $a_i$ one row can be omitted in the array!
o Recoding of $a_i$ to maximize the number of 0's ($a_i \in \{0, 1\} \rightarrow a_i' \in \{-1, 0, 1\}$)
● Two possibilities:
a) $a_i$ always constant: CSD recoding (1/3 of area on average)
b) $a_i$ variable: modified Booth encoding (1/2 of area) → Booth multiplier
● Note: "horizontal" data compression can be achieved with a Dadda multiplier (Booth = "vertical" compression)
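To illustrate how recoding halves the number of partial products, here is a sketch of the common radix-4 form of modified Booth recoding. It is an assumption that this is the exact variant meant on the slide; the operand is treated as an n-bit two's complement number with n even, and both function names are illustrative.

```python
def booth_radix4_digits(a: int, n: int) -> list[int]:
    """Recode an n-bit two's complement integer (n even) into n/2 digits.

    Digits are in {-2, -1, 0, 1, 2} and returned LSB-first.
    """
    bits = [(a >> i) & 1 for i in range(n)]   # also valid for negative ints
    digits = []
    prev = 0                                  # implicit bit a_{-1} = 0
    for i in range(0, n, 2):
        digits.append(bits[i] + prev - 2 * bits[i + 1])
        prev = bits[i + 1]
    return digits


def booth_multiply(a: int, b: int, n: int) -> int:
    """Sum of only n/2 partial products, each a small multiple of b."""
    return sum(d * b * 4**k for k, d in enumerate(booth_radix4_digits(a, n)))


print(booth_radix4_digits(-7, 4), booth_multiply(-7, 5, 4))
```

Each digit selects 0, ±B or ±2B, which in hardware only needs a shift and a conditional negation.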
[Figure: modified Booth recoding turns $a_i$ into $a_i'$; the n/2 partial products $P_i$ (*) are calculated in parallel from $b_i$ and summed in a CSA array with final CPA]
Metric: $A = O(n^2)$, $T = O(n + \log n)$
3.5 Booth-Wallace Multiplier
● take the Booth multiplier and replace the CSA array with a Wallace tree (see 2.4.3)
[Figure: CSA tree followed by CPA]
Metric: $A = (5 \dots 6) \cdot n^2$, $T = O(\log n)$; → $T \approx 2 \cdot \log n$
3.6 Evaluation of multiplier architectures

                 | Throughput | Latency | Area | Regularity         | Pipelining
  Recursive      | - -        | o       | ++   | - (control needed) | - -
  Braun          | +          | o       | o    | ++                 | ++
  Booth          | +          | +       | o    | +                  | +
  Booth-Wallace  | +          | ++      | -    | - -                | +
Part 4: Division
Outline
● 4.1 Definitions
● 4.2 Fundamentals
● 4.3 Restoring Division
● 4.4 Non-Restoring Division
● 4.5 SRT Division
● 4.6 Multiplicative Division
● 4.7 Evaluation
4.1 Definitions
$\frac{A}{B} = Q + \frac{R}{B} \;\rightarrow\; A = Q \cdot B + R$
$R < B$; $R = A \bmod B$
$A \in [0, 2^{2n} - 1]$
$B, Q, R \in [0, 2^n - 1]$, $B \neq 0$
$Q < 2^n \;\rightarrow\; B \in [2^{n-1}, 2^n - 1] \;\rightarrow\; A < 2^n \cdot B$ (avoid overflow: pre-normalize B and A)
4.2 Fundamentals (cont'd)
● Like paper-and-pencil division, dividend : divisor = quotient
● Steps:
a) Compare left-shifted divisor with dividend
b) Subtract conditionally to get partial remainder
c) Go to a) with partial remainder as dividend
→ Subtract-and-shift algorithm
→ Sequential, not associative, no parallelism
● Decimal example (A : B, quotient digits $q_i$, partial remainders $R_i$):

  0.75 : 0.875 = 750 : 875 = 0.857
   7500
  -7000
    5000
   -4375
     6250
    -6125
      125
4.2 Fundamentals (cont'd)
● Basic algorithm for all subtract-and-shift division algorithms
Initialization: $R_n = A$
a) select $q_i$ by comparing $R_{i+1}$ with $2^i B$
b) $R_i = R_{i+1} - q_i\, 2^i B$
c) repeat for $i = n-1, \dots, 0$
Remainder after iteration: $R = R_0$
● Division methods differ in how $q_i$ is selected and in whether redundant adders are used!
4.3 Restoring-Division
$q_i = 1$ iff $R_{i+1} - B\,2^i \ge 0$; $q_i = 0$ iff $R_{i+1} - B\,2^i < 0$; $q_i \in \{0, 1\}$
● e.g.:
● index i: $R_{i+1} - B\,2^i < 0 \rightarrow q_i = 0$; $R_i = R_{i+1}$
● index i-1: $R_i - B\,2^{i-1} \ge 0 \rightarrow q_{i-1} = 1$; $R_{i-1} = R_i - B\,2^{i-1}$ (cf. 4.4)
● If the remainder is too small for the divisor, the current iteration result ($R_{i+1} - B\,2^i$) is discarded. Instead, a '0' is appended to the quotient (identical to shifting the divisor one position further away from the MSB).
4.3 Restoring-Division (cont'd)
● Two implementation options in case $q_i = 0$:
1. $R_i = R_{i+1} - B \cdot 2^i$, then $R_i' = R_i + B \cdot 2^i$ → requires addition to restore $R_{i+1}$
2. $R_i = R_{i+1} - B \cdot 2^i$, then $R_i' = R_{i+1}$ → save $R_{i+1}$ before subtraction; "restoring" with additional mux and register
→ Option 2 preferable: subtract in each case and restore from register if necessary
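Option 2 corresponds to the following loop. It is a sketch assuming unsigned Python integers, an n-digit quotient and the overflow condition $A < 2^n \cdot B$; keeping the old value of `r` when the trial subtraction fails models the restore from the register.

```python
def restoring_divide(a: int, b: int, n: int) -> tuple[int, int]:
    """Restoring division producing an n-bit quotient and the remainder."""
    assert b > 0 and 0 <= a < (b << n), "quotient must fit into n bits"
    r, q = a, 0
    for i in range(n - 1, -1, -1):
        trial = r - (b << i)     # always subtract B * 2^i
        if trial >= 0:
            r = trial            # keep the result, q_i = 1
            q |= 1 << i
        # else: discard the trial, r keeps the saved value, q_i = 0
    return q, r


print(restoring_divide(100, 7, 4))
```

The final `r` is the true remainder, so no correction step is needed in this variant.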
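4.4 Non-Restoring Division
$q_i' = 1$ iff $R_{i+1} \ge 0$; $q_i' = -1$ iff $R_{i+1} < 0$; $q_i' \in \{-1, 1\}$
● index i: $R_{i+1} \ge 0 \rightarrow q_i' = 1$; $R_i = R_{i+1} - B \cdot 2^i$
● index i-1: $R_{i+1} - B\,2^i < 0 \rightarrow q_{i-1}' = -1$; $R_{i-1} = R_{i+1} - B\,2^i + B\,2^{i-1} = R_{i+1} - B\,2^{i-1}$ (cf. 4.3)
● $q_i\, q_{i-1} = 0\,1$ (restoring) corresponds to $q_i'\, q_{i-1}' = 1\,\bar{1}$ (non-restoring)
● Note: Evaluate the sign, subtract or add, and correct by addition in the next steps until the partial remainder is positive again. The identical remainder terms and the algebraically equivalent q in 4.3 and 4.4 show the identity of both methods.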
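4.4 Non-Restoring Division (cont'd)
● Conversion of $Q' = (q_{n-1}', \dots, q_0')$ to two's complement representation:
$q_i' \in \{-1, 1\} \rightarrow q_i \in \{0, 1\}$, with $q_i = 0$ if $q_i' = -1$ and $q_i = 1$ if $q_i' = 1$
$Q = (\bar{q}_{n-1}, q_{n-2}, \dots, q_0, 1)$
→ Q' is not redundant, no CPA required
● Implementation:
o For sign detection a non-redundant adder is mandatory
o Last remainder needs to be corrected
[Figure: rows of CPAs with inputs A and B; a sign test (≥ 0) after each row yields the digits of Q' and the partial remainders $R_i$; the last stage is the correction of R]
Metric: $A = (n + 1) \cdot A_{CPA} = O(n^2) \dots O(n^2 \log_2 n)$, $T = (n + 1) \cdot T_{CPA} = O(n^2) \dots O(n \log_2 n)$ for CPA = RCA … CLA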
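The digit selection, the conversion of Q' and the final remainder correction can be combined in one sketch. It assumes unsigned Python integers with $A < 2^n \cdot B$; the list `bits` holds $q_i = 1$ for $q_i' = +1$ and $q_i = 0$ for $q_i' = -1$, and the function name is illustrative.

```python
def nonrestoring_divide(a: int, b: int, n: int) -> tuple[int, int]:
    """Non-restoring division with quotient digits in {-1, +1}."""
    assert b > 0 and 0 <= a < (b << n)
    r = a
    bits = []                              # MSD first
    for i in range(n - 1, -1, -1):
        if r >= 0:
            bits.append(1)                 # q'_i = +1: subtract
            r -= b << i
        else:
            bits.append(0)                 # q'_i = -1: add
            r += b << i
    # Q = (not q_{n-1}, q_{n-2}, ..., q_0, 1) as (n+1)-bit two's complement
    q = -(1 - bits[0]) * (1 << n)          # inverted MSB carries weight -2^n
    for j in range(1, n):
        q += bits[j] << (n - j)            # q_{n-1-j} moves up one position
    q += 1                                 # appended LSB '1'
    if r < 0:                              # correction of the last remainder
        q -= 1
        r += b
    return q, r


print(nonrestoring_divide(100, 7, 4))
```

The conversion is pure rewiring plus one inverter, which matches the remark that no CPA is required for Q'.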
4.4 Non-Restoring Division (cont'd)
● Extension to signed 2's complement division:
$q_i' = 1$ iff $R_{i+1}$, $B$ have the same sign; $q_i' = -1$ iff $R_{i+1}$, $B$ have different signs; $q_i' \in \{-1, 1\}$
● Example: 2's complement array divider (B > 0, no correction of R)
o XOR gates for sign evaluation: XOR = 0 → different signs → $R_i = R_{i+1} + \cdots$; XOR = 1 → identical signs → $R_i = R_{i+1} - \cdots$
o Partial remainder $R_i$ would tend to 0.
o Note: $R_i$ is kept in about the same range during iteration. Thus, rounding errors are reduced.
$R_i = R_{i+1} \cdot 2 - q_i' \cdot 2^n \cdot B$
[Figure: array divider; the first row is controlled by $a_6 \oplus b_3$]
o $a_2, a_1, a_0$ are fetched consecutively
● Shifted array of CAS cells (Controlled Adder/Subtractor)
o XOR gates included in CAS cells
4.5 SRT-Division
● Sweeney, Robertson, Tocher (~1958)
● Use redundant adders
● Problem: fast detection of the sign of a redundant number, without:
o evaluation of all digits, or
o conversion to 2's complement
● Example:
o 00011XX → no sign detection from MSD, same problem for CS numbers
● Solution: evaluate a few leading digits of the partial remainder
o If 0: number is small enough to assume $q_i = 0$ (without diverging iteration)
o Else: similar to non-restoring division
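4.5 SRT-Division (cont'd)
$q_i' = 1$ iff $B \cdot 2^i \le R_{i+1}$
$q_i' = 0$ iff $-B \cdot 2^i \le R_{i+1} < B \cdot 2^i$
$q_i' = -1$ iff $R_{i+1} < -B \cdot 2^i$
● appropriate scaling of B yields
$2^{n-1} \le B < 2^n \;\rightarrow\; -B \cdot 2^i \le -2^{n-1+i} \le R_{i+1} < 2^{n-1+i} \le B \cdot 2^i$
$q_i' = 1$ if $2^{n-1+i} \le R_{i+1}$
$q_i' = 0$ if $-2^{n-1+i} \le R_{i+1} < 2^{n-1+i}$
$q_i' = -1$ if $R_{i+1} < -2^{n-1+i}$
● 3 MSDs of $R_i$ are sufficient for determination of $q_i'$
● Nevertheless, convergence is assured (→ use redundant adder instead of CPA)
● $Q'$ needs conversion: SD to 2's complement by using a CPA for $q_i'$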
4.5 SRT-Division (cont'd)
● Implementation:
[Figure: rows of CSAs (add/subtract, +/-) with inputs A and B; a sign estimate (≥ 0) per row yields the redundant digits $q_i'$; CPAs convert Q and R to 2's complement]
Metric: $A = n \cdot A_{CSA} + 2A_{CPA} = O(n^2)$, $T = n \cdot T_{CSA} + T_{CPA} = O(n + \log n)$
→ Just a little slower than array multiplication
→ State-of-the-art division method
4.6 Multiplicative Division
● So far
o Add/sub as basic functions
o Execute n times → $T = O(n)$
o Linear convergence → +1 valid bit per iteration
● Now
o Mult as basic function (Goldschmidt 1964, used in IBM 360)
o Execute $\log n$ times → $T = O(\log n)$
o Quadratic convergence → doubles valid bits per iteration
● Algorithm
$Q = \frac{A}{B} = \frac{A \cdot R_0 \cdot R_1 \cdots R_k}{B \cdot R_0 \cdot R_1 \cdots R_k}$
Choose $R_i$ so that $B \cdot R_0 \cdots R_k$ converges to 1; then $Q = A \cdot \prod_i R_i$
Metric: $A = O(n^2)$ (1 Mult only), $T = O(\log n)$
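One common way to choose the factors is $R_i = 2 - D_i$, where $D_i$ is the current denominator; this choice is an assumption, since the slide only requires that the denominator converges to 1. The sketch uses floating-point numbers and assumes a pre-normalized divisor with $0.5 \le B < 1$.

```python
def goldschmidt_divide(a: float, b: float, iterations: int = 6) -> float:
    """Multiplicative division: scale numerator and denominator alike."""
    assert 0.5 <= b < 1.0, "divisor must be pre-normalized"
    num, den = a, b
    for _ in range(iterations):
        r = 2.0 - den      # if den = 1 - e, then den * r = 1 - e^2
        num *= r           # numerator converges to the quotient
        den *= r           # denominator converges to 1
    return num


print(goldschmidt_divide(0.6, 0.75))
```

Because the error of the denominator is squared in every step, the number of valid bits doubles per iteration.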
4.7 Evaluation
● Sequential dividers
o One add/sub unit as hardware
o Low area
o Low throughput
● Array dividers
o n add/sub units as hardware
o High area, but regular design
o High throughput
● Multiplicative dividers
o Reuse of available multiplier
o Very fast for large n
Part 5: Elementary Functions
Outline
● 5.1 Examples and Classification of Algorithms
● 5.2 CORDIC
o vector rotation, generalization, architectures, redundant numbers
5.1 Examples and Classification of Algorithms
● Some elementary functions
o Log
o $e^x$
o $x^y$
o Sin
o Cos
o Atan
o Cosh
o …
● ROM
o $A = O(n \cdot 2^n)$
o $T = ?$ → slow
o Restricted to functions with one operand and n ≤ 20..24 bit
o Hard to pipeline
● Polynomial
o Taylor series
o Chebyshev series
  → $A = O(n^2)$, $T = ?$
  → better convergence → fewer terms
  → Hard to pipeline
  → Excellent for software and big n
● Alternative number systems
o Logarithmic systems
o Residue number systems
o Problem: conversion between systems
[Figure: 2's complement → conversion → calculation in alternative number system → conversion → 2's complement]
● Iteration
o Newton-Raphson (cf. multiplicative division)
o Digit-by-digit method (cf. paper-and-pencil division, SRT division) → CORDIC
5.2 CORDIC
● COordinate Rotation DIgital Computer
o [Volder 1959, Walther 1971]
5.2.1 Vector Rotation
● Given: $x_0, y_0, \theta$
● Wanted: $x, y$
● Use matrix for rotation in Euclidean space:
$x = x_0 \cdot \cos\theta - y_0 \cdot \sin\theta$
$y = x_0 \cdot \sin\theta + y_0 \cdot \cos\theta$
[Figure: vector $(x_0, y_0)$ rotated by the angle $\theta$ into $(x, y)$]
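5.2.1 Vector Rotation (cont'd)
● Transformation
$x = \cos\theta \cdot (x_0 - y_0 \cdot \tan\theta) = K \cdot (x_0 - y_0 \cdot \tan\theta)$
$y = \cos\theta \cdot (x_0 \cdot \tan\theta + y_0) = K \cdot (x_0 \cdot \tan\theta + y_0)$
$K = \frac{1}{\sqrt{1 + \tan^2\theta}}$
● Elementary rotation angle
$\theta_i = \arctan 2^{-i} \;\rightarrow\; \tan\theta_i = 2^{-i}$
● Iteration
$x_{i+1} = K_i \cdot (x_i - 2^{-i} \cdot y_i)$
$y_{i+1} = K_i \cdot (y_i + 2^{-i} \cdot x_i)$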
5.2.2 Decomposition of Rotation in Elementary Rotation Angles
● $\theta = \sum_i \sigma_i \cdot \theta_i$, depending on direction of rotation $\sigma_i \in \{-1, 1\}$
● Compute $\sigma_i$ using successive Sub/Add (pseudo division)
● Example:

  i | $\theta_i = \arctan 2^{-i}$
  0 | 45.0°
  1 | 26.5°
  2 | 14.03°
  3 | 7.1°
  4 | 3.5°

  Iteration i | Angle                   | Sign     | $\sigma_i$
  0           | θ = 77°                 | Positive | $\sigma_0 = 1$
  1           | 77° - 45° = 32°         | Positive | $\sigma_1 = 1$
  2           | 32° - 26.5° = 5.5°      | Positive | $\sigma_2 = 1$
  3           | 5.5° - 14.03° = -8.53°  | Negative | $\sigma_3 = -1$

$\theta = 77° = 45° + 26.5° + 14.03° - 7.1° + \dots$
● Iteration
$z_0 = \theta$, $z_{i+1} = z_i - \sigma_i \cdot \arctan 2^{-i}$
$\sigma_i = 1$ for $z_i \ge 0$, $\sigma_i = -1$ for $z_i < 0$
Goal of iteration: $z \rightarrow 0$
● What about $K_i$?
$K_i = \frac{1}{\sqrt{1 + \tan^2\theta_i}} = \frac{1}{\sqrt{1 + 2^{-2i}}}$
→ $K = \prod_i K_i$ is a constant!
● After n iterations (without considering $K_i$) the vector magnitude is "stretched" by $1/K$:
$x_n = \frac{1}{K} \cdot x$, $y_n = \frac{1}{K} \cdot y$
→ multiply $x_n$ and $y_n$ by the known scaling factor $K$ to correct the magnitude after the final iteration
→ CSD encoding of $K$ possible
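5.2.3 Modes of Operation: Rotation and Vectoring
$x_{i+1} = x_i - \sigma_i \cdot 2^{-i} \cdot y_i$
$y_{i+1} = y_i + \sigma_i \cdot 2^{-i} \cdot x_i$
$z_{i+1} = z_i - \sigma_i \cdot \arctan 2^{-i}$
● "Rotation" mode
o $z_0 = \theta$, iteration goal $z \rightarrow 0$
o $\sigma_i = 1$ for $z_i \ge 0$, $\sigma_i = -1$ for $z_i < 0$
o Result (after scaling): $x = x_0 \cdot \cos\theta - y_0 \cdot \sin\theta$, $y = x_0 \cdot \sin\theta + y_0 \cdot \cos\theta$
● "Vectoring" mode
o $z_0 = 0$, iteration goal $y \rightarrow 0$
o $\sigma_i = 1$ for $y_i < 0$, $\sigma_i = -1$ for $y_i \ge 0$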
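The rotation mode can be tried out with a floating-point model. This is only a sketch: real hardware would use fixed-point shifts and a ROM of arctan constants, and the scaling factor would be a precomputed constant rather than accumulated in the loop. The name `cordic_rotate` and the default of 24 iterations are assumptions.

```python
import math


def cordic_rotate(x0: float, y0: float, theta: float, n: int = 24):
    """Rotate (x0, y0) by theta (radians, |theta| below about 99.8 degrees)."""
    x, y, z = x0, y0, theta
    k = 1.0
    for i in range(n):
        sigma = 1 if z >= 0 else -1                 # drive z towards 0
        x, y = (x - sigma * y * 2.0**-i,            # shift-and-add steps
                y + sigma * x * 2.0**-i)
        z -= sigma * math.atan(2.0**-i)
        k *= 1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i))  # K = product of all K_i
    return k * x, k * y                             # undo the stretching


print(cordic_rotate(1.0, 0.0, math.radians(77)))
print(math.cos(math.radians(77)), math.sin(math.radians(77)))
```

Starting from (1, 0) the result approximates (cos θ, sin θ), which is how sine and cosine are obtained in one pass.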
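5.2.4 Generalization for other Coordinate Systems [Walther]
$x_{i+1} = x_i - m \cdot \sigma_i \cdot 2^{-i} \cdot y_i$
$y_{i+1} = y_i + \sigma_i \cdot 2^{-i} \cdot x_i$
$z_{i+1} = z_i - \sigma_i \cdot \alpha_{m,i}$
with $m = 1$ → trigonometric (circular), $m = 0$ → linear, $m = -1$ → hyperbolic
● Vector magnitude $R_{i+1} = R_i \cdot \sqrt{1 + m \cdot \sigma_i^2 \cdot 2^{-2i}}$
● $\alpha_{m,i} = \frac{1}{\sqrt{m}} \arctan\left(\sqrt{m} \cdot 2^{-i}\right)$
o $\alpha_{1,i} = \arctan 2^{-i}$
o $\alpha_{0,i} = 2^{-i}$
o $\alpha_{-1,i} = \operatorname{artanh} 2^{-i}$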
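5.2.5 Overview of CORDIC Functions

  Mode m             | Rotation ($z_i \to 0$)              | Vectoring ($y_i \to 0$)
  circular, m = 1    | $x \to x \cos z - y \sin z$         | $x \to \sqrt{x^2 + y^2}$
                     | $y \to y \cos z + x \sin z$         | $y \to 0$
                     | $z \to 0$                           | $z \to z + \arctan(y/x)$
  linear, m = 0      | $x \to x$                           | $x \to x$
                     | $y \to y + x \cdot z$               | $y \to 0$
                     | $z \to 0$                           | $z \to z + y/x$
  hyperbolic, m = -1 | $x \to x \cosh z + y \sinh z$       | $x \to \sqrt{x^2 - y^2}$
                     | $y \to y \cosh z + x \sinh z$       | $y \to 0$
                     | $z \to 0$                           | $z \to z + \operatorname{artanh}(y/x)$

● $e^x$, $\log x$, $\ln x$ computable using angle sum identities
● CORDIC provides a nearly universal method for the evaluation of elementary functions, yielding one bit of accuracy per iteration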
5.2.6 Architectures
5.2.6.1 Recursive
● Small area, needs control logic
● Low throughput, no pipelining
● Variable shift needed → large barrel shifter
Metric: $A \cong 3n$, $T = n \cdot \log n$
5.2.6.2 Pipeline
● Barrel shifter replaced by hard wiring
● ROM → hard wiring
o (3:1)-MUX for 3 cases of m
● High throughput (times n)
● High area (times n)
Metric: $A \cong 3n^2$, $T = n \cdot \log n$
5.2.6.3 Array
● Low latency
● Low throughput
5.2.7 CORDIC and Redundant Number Systems
● Motivation: avoid carry propagation during addition
● Issue 1: sign detection of redundant numbers
o approximation of $\sigma_i$ by looking at the $p$ first significant digits of $z_i$ or $y_i$ ($p \ll n$)
o similar to SRT division → $\sigma_i \in \{-1, 0, 1\}$
o But note: until now $\sigma_i \in \{-1, 1\}$, $m \in \{-1, 0, 1\}$
● Issue 2: variable scaling for $\sigma_i = 0$
$K_i = \frac{1}{\sqrt{1 + m \cdot \sigma_i^2 \cdot 2^{-2i}}}$
5.2.7 CORDIC and Redundant Number Systems (cont'd)
● Solutions to avoid variable scaling when $\sigma_i = 0$
o Constant scaling
  → Defined direction of rotation when $\sigma_i = 0$; e.g., $\sigma_i = 1$ → small error
  → After a defined number of iterations repeat an iteration, e.g. $p = 3$ → repeat each 5th iteration
  → Convergence guaranteed
  → Standard method
o Double rotation
  → Instead of one rotation by $\alpha_i$ do two rotations by ~$\alpha_i / 2$ (→ $\arctan(2^{-i-1})$)
  → $\sigma_i = 1 \rightarrow 2 \cdot \arctan 2^{-i-1}$; $\sigma_i = 0 \rightarrow \arctan 2^{-i-1} - \arctan 2^{-i-1}$
  → Scaling factor is constant, double rotations → twice area/time
o True variable scaling factor → multiplier needed
Recommended