Distributed Arithmetic Dr Sumam David S. Dept. of E&C, NITK Surathkal Courtesy for slides – Xilinx Professor’s Workshop Resources




Objective

Distributed arithmetic: What? Where? How?


What is DA?

Multiplication using a look-up table (LUT)

Used to implement multipliers in LUT-rich FPGAs


Two's Complement Multiplication

One bit at a time:
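As a sketch of what "one bit at a time" means, assume an N-bit two's complement sample x with bits b_0 (LSB) to b_{N-1} (sign bit) and a constant coefficient c. These symbols are chosen here for illustration and are not taken from the slide.

```latex
x = -b_{N-1}\,2^{N-1} + \sum_{k=0}^{N-2} b_k\,2^{k}
\qquad\Longrightarrow\qquad
c \cdot x = -\,(c\,b_{N-1})\,2^{N-1} + \sum_{k=0}^{N-2} (c\,b_k)\,2^{k}
```

Each term c b_k is either 0 or c, so the product needs only a two-entry table, a shift per bit, and a subtraction for the sign bit.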


SDA 1-Tap FIR Filter

[Figure: N-bit-wide sample data X0 -> parallel-to-serial converter -> 1-bit address line A0 -> partial product ROM -> +/- scaling accumulator with Z^-1 feedback]

LUT contains two locations:

A0 = 0: 0000...0

A0 = 1: C0
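The 1-tap structure can be modelled in a few lines of Python. This is a minimal behavioural sketch, not the Xilinx implementation: the function names are hypothetical, and the partial product is shifted left by the bit weight, whereas a hardware scaling accumulator would typically shift the running sum instead.

```python
def bits_lsb_first(x, n_bits):
    """Return the n_bits two's complement bits of x, least significant first."""
    return [(x >> k) & 1 for k in range(n_bits)]


def sda_1tap(c0, x0, n_bits):
    """Multiply c0 by x0, consuming one bit of x0 per loop iteration ("clock")."""
    lut = [0, c0]                      # address 0 -> 0000...0, address 1 -> C0
    acc = 0
    for k, bit in enumerate(bits_lsb_first(x0, n_bits)):
        partial = lut[bit] << k        # scale the LUT output by the bit weight 2^k
        if k == n_bits - 1:
            acc -= partial             # the sign bit carries weight -2^(N-1)
        else:
            acc += partial
    return acc


print(sda_1tap(-7, 7, 4))
```

The subtraction on the last bit plays the role of the +/- control on the accumulator.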


Distributed Arithmetic for a 2-Tap Filter

(Serial-Data / Tap-Parallel Multiply)

Bit weights: -2^3 2^2 2^1 2^0. Partial products are sign-extended before they are added.

C0 = 1001 (-7), X0 = 0111 (7)

Partial products for X0 bits b0 to b3: 1001, 1001, 1001, 0000

C0 x X0 = 1100 1111 (-49)

C1 = 0110 (6), X1 = 0101 (5)

Partial products for X1 bits b0 to b3: 0110, 0000, 0110, 0000

C1 x X1 = 0001 1110 (30)

Sums of equal-weight partial products for b0 to b3 (weighted values in parentheses): 1111 (-1), 1001 (-14), 1111 (-4), 0000 (0)

Total: 1110 1101 (-19)

Partial products of equal weight are added together before being summed to next higher partial product weight

Create look-up table of summed partial products
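The worked example can be checked numerically. The sketch below uses the slide's values (C0 = -7, C1 = 6, X0 = 7, X1 = 5, 4-bit words); the variable and function names are illustrative only.

```python
N = 4
C = [-7, 6]          # coefficients C0, C1 from the slide
X = [7, 5]           # samples X0, X1 from the slide


def bits_lsb_first(x, n_bits):
    """Two's complement bits of x, least significant first."""
    return [(x >> k) & 1 for k in range(n_bits)]


def weight(k, n_bits):
    """Bit weight: 2^k, except -2^(N-1) for the sign bit."""
    return -(1 << k) if k == n_bits - 1 else (1 << k)


x_bits = [bits_lsb_first(x, N) for x in X]

# Conventional route: finish each product, then add the products.
conventional = sum(c * x for c, x in zip(C, X))

# Distributed arithmetic: add partial products of equal weight first,
# then apply the weight once per bit position.
column_sums = [sum(c * xb[k] for c, xb in zip(C, x_bits)) for k in range(N)]
weighted = [weight(k, N) * s for k, s in enumerate(column_sums)]
da_result = sum(weighted)

print(column_sums, weighted, da_result, conventional)
```

Both routes give the same total; only the order of the additions changes.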


SDA 2-Tap FIR Filter

[Figure: N-bit-wide sample data X0, X1 -> 1-bit address lines A0, A1 -> partial product ROM -> +/- scaling accumulator with Z^-1 feedback]

LUT contains all possible sums of the partial products:

A1 A0 = 00: 0000...0

A1 A0 = 01: C0

A1 A0 = 10: C1

A1 A0 = 11: C0 + C1
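A behavioural sketch of the 2-tap case follows, assuming address bit A0 comes from X0 and A1 from X1. The helper names `build_lut` and `sda_fir` are hypothetical, and the left shift stands in for the scaling that hardware performs in the accumulator.

```python
def build_lut(coeffs):
    """LUT[address] = sum of the coefficients whose address bit is 1."""
    n_taps = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << n_taps)]


def sda_fir(coeffs, samples, n_bits):
    """Serial DA: one bit of every sample is consumed per iteration."""
    lut = build_lut(coeffs)
    acc = 0
    for k in range(n_bits):
        # Bit k of each sample forms one address line of the ROM.
        addr = 0
        for i, x in enumerate(samples):
            addr |= ((x >> k) & 1) << i
        partial = lut[addr] << k
        # Subtract on the sign bit, add otherwise (the +/- input).
        acc += -partial if k == n_bits - 1 else partial
    return acc


print(build_lut([-7, 6]))
print(sda_fir([-7, 6], [7, 5], 4))
```

With the coefficients of the earlier worked example, the table holds 0, C0, C1 and C0 + C1, and the result matches the sum of the two products.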


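SDA 4-Tap FIR Filter

[Figure: N-bit-wide sample data X0 to X3 -> 1-bit address lines A0 to A3. Each address line selects 0000...0 or its coefficient (C0 to C3); the selected values are added to form the partial product ROM output, which feeds the +/- scaling accumulator with Z^-1 feedback]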
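Reading the 4-tap figure as "each address line selects zero or its coefficient, and the selections are added", the ROM contents are simply those sums precomputed for all 16 addresses. The sketch below illustrates this with made-up coefficients; the values and names are assumptions, not taken from the slides.

```python
from itertools import product

C = [3, -5, 2, 7]    # hypothetical coefficients C0..C3


def adder_tree(addr_bits, coeffs):
    """Select 0 or C_k per address line (A0 first) and add the selections."""
    return sum(c if a else 0 for a, c in zip(addr_bits, coeffs))


# Precompute the adder-tree output for every address (A0, A1, A2, A3).
rom = {bits: adder_tree(bits, C) for bits in product((0, 1), repeat=4)}

print(len(rom))
print(rom[(1, 0, 0, 1)])   # A0 and A3 set: C0 + C3
```

Storing the sums in a ROM replaces the adders with a single table read per clock.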


SDA 8-Tap FIR Filter

[Figure: N-bit-wide sample data X0 to X3 address one partial product ROM (A0 to A3) and X4 to X7 address a second partial product ROM (A0 to A3). A pre-adder sums the two ROM outputs ahead of the +/- scaling accumulator with Z^-1 feedback]

Each 4-input LUT contains all possible sums of the partial products
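Splitting eight taps across two 4-input LUTs keeps the tables small: two tables of 16 entries (32 in total) instead of one table of 2^8 = 256 entries, at the cost of a pre-adder. The following is a rough model of that arrangement; the function names, coefficients and samples are invented for illustration.

```python
def build_lut(coeffs):
    """LUT[address] = sum of the coefficients whose address bit is 1."""
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << len(coeffs))]


def sda_fir_split(coeffs, samples, n_bits, group=4):
    """Serial DA with one LUT per group of taps and a pre-adder."""
    starts = range(0, len(coeffs), group)
    luts = [build_lut(coeffs[s:s + group]) for s in starts]
    acc = 0
    for k in range(n_bits):
        pre_sum = 0
        for s, lut in zip(starts, luts):
            addr = 0
            for i, x in enumerate(samples[s:s + group]):
                addr |= ((x >> k) & 1) << i
            pre_sum += lut[addr]           # pre-adder combines the ROM outputs
        partial = pre_sum << k
        acc += -partial if k == n_bits - 1 else partial
    return acc, sum(len(lut) for lut in luts)


coeffs = [1, -2, 3, -4, 5, -6, 7, -8]          # hypothetical
samples = [10, -20, 30, -40, 50, -60, 70, -80]  # hypothetical 8-bit samples
result, lut_entries = sda_fir_split(coeffs, samples, n_bits=8)
print(result, lut_entries)
```

The result equals the direct sum of products, which is a quick way to sanity-check the grouping.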


Xilinx DA FIR Performance

fclk = 200 MHz for both processor and FPGA

B = data sample precision for FPGA

[Figure, panel 1: Performance (MMACs/s, 0 to 6000) versus Filter Length (Taps, 0 to 250). Curves: Serial FPGA FIR, Dual MAC, DA FIR B=8, DA FIR B=12, DA FIR B=16]

[Figure, panel 2: Sample Rate (MSPS, 10 to 60) versus Filter Length (Taps, 0 to 250). Curves: Serial FPGA FIR, Single MAC, DA FIR B=8, DA FIR B=12, DA FIR B=16]


Trade Clock Cycles for Logic Area

From Serial-DA (20 Ms/s) to Parallel-DA (160 Ms/s): multiple bits per clock cycle

Hardware over-sampling = 8: the 8-bit sample (b0 to b7) is serialized and processed 1 bit per clock cycle. 8 clock cycles are thus required to process the whole sample.

Hardware over-sampling = 4: the sample is serialized and processed 2 bits per clock cycle. 4 clock cycles are thus required to process the whole sample.

Hardware over-sampling = 2: the sample is serialized and processed 4 bits per clock cycle.

Hardware over-sampling = 1: the sample is processed in parallel, 8 bits per clock cycle.
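The trade-off can be tabulated with a small helper. The 160 MHz clock used below is an assumption inferred from the 20 Ms/s and 160 Ms/s figures on the slide, and `da_throughput` is a hypothetical name.

```python
def da_throughput(sample_bits, bits_per_clock, fclk_mhz):
    """Return (clock cycles per sample, sample rate in Ms/s)."""
    cycles = -(-sample_bits // bits_per_clock)   # ceiling division
    return cycles, fclk_mhz / cycles


FCLK_MHZ = 160   # assumed clock rate, not stated on the slide

for bits_per_clock in (1, 2, 4, 8):
    cycles, rate = da_throughput(8, bits_per_clock, FCLK_MHZ)
    print(f"{bits_per_clock} bit(s)/clock: over-sampling = {cycles}, {rate} Ms/s")
```

Each doubling of the bits processed per clock halves the cycle count and doubles the sample rate, while the number of LUTs working in parallel grows accordingly.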


Conclusion

Efficiency of computation

Slow, as it is bit-serial

Memory requirements


References

The Role of Distributed Arithmetic in FPGA-based Signal Processing, Xilinx, www.xilinx.com