Implementation of MAC Assisted CORDIC engine on FPGA EE382N-4 Abhik Bhattacharya Mrinal Deo Raghunandan K R Samir Dutt


Implementation of MAC Assisted CORDIC engine on FPGA

EE382N-4 Abhik Bhattacharya

Mrinal Deo

Raghunandan K R

Samir Dutt


Motivation

• The ARM9-based processor of the TLL 5000's Freescale i.MX21 System-on-Chip has no native support for floating point.
• Floating-point operations are emulated in software using libraries, e.g. libc.
• "Math-heavy" applications, e.g. MAC-based operations that require computing sine/cosine/arctan values, are therefore not well suited to this platform.


Hardware Acceleration for Trigonometric Math Operations


Outline

• Select a basic mathematical building block, e.g. CORDIC (from OpenCores).
• Implement the CORDIC engine in hardware (FPGA).
• Implement higher-level primitives, e.g. the Discrete Fourier Transform, using CORDIC.
• Use these blocks in a C program instead of the <math.h> routines.
• Offload the heavy number crunching to the hardware accelerator (FPGA), freeing up valuable CPU resources.


CORDIC engine


• CORDIC (COordinate Rotation DIgital Computer) is a simple and efficient algorithm for calculating hyperbolic and trigonometric functions.

• We use it to calculate the sine and cosine of an angle given in radians or degrees.

• To determine the sine and cosine of an angle β, we need to find the corresponding position (X, Y) on the unit circle.


CORDIC contd.

• CORDIC is an iterative algorithm and uses table lookup.
• First step: rotate the vector 45° counterclockwise.
• If ((β – α) != 0) iterate, else exit.
• Successive iterations rotate the vector in one or the other direction in steps of decreasing size.
• The rotation at step i is by the angle arctan(1/2^i), i.e. its tangent is 1/2^i.
  – Where "i" is the iteration step.
• Terminate after 16 steps (approximately 5 digits of precision).
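The iteration described on this slide can be sketched in software as follows. This is a minimal floating-point model, not the OpenCores Verilog: the names `ITERATIONS`, `ANGLES`, `K` and `cordic_sin_cos` are illustrative, and it runs a fixed 16 steps (as a hardware pipeline would) rather than testing for an exact zero residual. In hardware the multiplications by 2^-i would be plain bit shifts on fixed-point values.

```python
import math

ITERATIONS = 16  # the slides terminate after 16 steps

# Lookup table of the elementary rotation angles arctan(2^-i), in radians.
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]

# Each micro-rotation stretches the vector by sqrt(1 + 2^-2i).
# K is the reciprocal of the accumulated gain, applied once up front.
K = 1.0
for i in range(ITERATIONS):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic_sin_cos(beta):
    """Return (sin(beta), cos(beta)) for beta in [-pi/2, pi/2] radians."""
    x, y = K, 0.0  # start on the x-axis, pre-scaled by 1/gain
    z = beta       # residual angle still to be rotated through
    for i in range(ITERATIONS):
        d = 1.0 if z >= 0.0 else -1.0  # rotate towards the target angle
        shift = 2.0 ** -i
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * ANGLES[i]
    return y, x
```

For example, `cordic_sin_cos(math.pi / 6)` returns a pair close to (0.5, 0.866), with an error on the order of the last table entry, arctan(2^-15).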


Discrete Fourier Transform (DFT)


DFT can be implemented using CORDIC
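For reference, the standard N-point DFT definition can be split into the cosine and sine terms that a CORDIC supplies. The symbol θ_{kn} for the twiddle angle is introduced here for illustration and is not taken from the slides.

```latex
X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}
     = \sum_{n=0}^{N-1} x[n] \left( \cos\theta_{kn} - j \sin\theta_{kn} \right),
\qquad \theta_{kn} = \frac{2\pi k n}{N}, \quad k = 0, \dots, N-1
```

Each term is one multiply-accumulate of a sample with a sine or cosine value, which is why a MAC fed by a CORDIC fits this computation.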


Design of CORDIC

• The CORDIC Verilog from OpenCores can be operated in different modes:
  – Pipelined
  – Iterative
  – Combinatorial
• The pipelined mode is efficient from a performance perspective. We trade off area for performance (it needs the maximum number of LUTs).
  – Outputs a result every clock after an initial latency.
• Resolution is limited to about 5 digits of precision.
• The algorithm works in the 1st quadrant of the unit circle. Appropriate logic was added to take care of the polarity.
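One way the polarity logic could work is to fold any angle into the first quadrant and reapply the signs afterwards. The sketch below is an assumption about that logic, not the project's actual Verilog; `fold_to_first_quadrant` and `sin_cos_any_angle` are hypothetical names, and `math.sin`/`math.cos` stand in for the first-quadrant CORDIC core.

```python
import math

TWO_PI = 2.0 * math.pi
HALF_PI = math.pi / 2.0


def fold_to_first_quadrant(theta):
    """Map an angle in radians to (phi, sin_sign, cos_sign), phi in [0, pi/2]."""
    theta = math.fmod(theta, TWO_PI)
    if theta < 0.0:
        theta += TWO_PI
    if theta >= TWO_PI:  # guard against rounding up to exactly 2*pi
        theta = 0.0
    quadrant = min(int(theta // HALF_PI), 3)
    if quadrant == 0:
        return theta, 1.0, 1.0
    if quadrant == 1:
        return math.pi - theta, 1.0, -1.0
    if quadrant == 2:
        return theta - math.pi, -1.0, -1.0
    return TWO_PI - theta, -1.0, 1.0


def sin_cos_any_angle(theta, core=lambda p: (math.sin(p), math.cos(p))):
    """Evaluate sine and cosine using a core that only handles [0, pi/2]."""
    phi, sin_sign, cos_sign = fold_to_first_quadrant(theta)
    s, c = core(phi)
    return sin_sign * s, cos_sign * c
```

Passing a CORDIC routine as `core` would replace the stand-in; the folding itself needs only comparisons, subtractions and sign flips, which are cheap in hardware.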


MAC Implementation

• The pipelined CORDIC delivers sine/cosine values every cycle, provided we maintain a steady inflow of inputs.

• A MAC-based engine can be built on top of this CORDIC functionality.

• Useful in linear time-variant control systems, where the coefficients may be sine/cosine values that need to be computed and accumulated.

• Simple example: Discrete Fourier Transform


Design of DFT

• A 32-point DFT was implemented using the CORDIC-based MAC.
• Samples are sent to the board from the user application.
• One copy of the CORDIC-based MAC was instantiated.
• The design was pipelined to avoid any bubbles, providing a new input (angle) to the CORDIC every cycle.
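A software model of this design might look like the sketch below: a single accumulator pair is reused for every output bin, and one angle is issued per multiply-accumulate step. The function name `dft_mac`, the restriction to real-valued samples, and the use of `math.sin`/`math.cos` as a stand-in for the CORDIC are all assumptions made for illustration.

```python
import math


def dft_mac(samples, sin_cos=lambda t: (math.sin(t), math.cos(t))):
    """N-point DFT of real samples using one multiply-accumulate per term."""
    n_points = len(samples)
    result = []
    for k in range(n_points):
        acc_re = 0.0  # accumulator for the real part of X[k]
        acc_im = 0.0  # accumulator for the imaginary part of X[k]
        for n, x in enumerate(samples):
            # Reduce k*n modulo N so the angle stays within one turn.
            theta = 2.0 * math.pi * ((k * n) % n_points) / n_points
            s, c = sin_cos(theta)  # one CORDIC result per step
            acc_re += x * c
            acc_im -= x * s
        result.append(complex(acc_re, acc_im))
    return result
```

Feeding it 32 samples of a cosine with three cycles across the window gives peaks of about N/2 = 16 in bins 3 and 29 and values near zero elsewhere.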


Block Diagram of our System

[Figure: block diagram. Block labels: Top Level, DFT, CORDIC, MAC Engine. Signal labels: (θ), sin (θ), cos (θ), CORDIC Gain, Input Samples, DFT.]


Operation of the System

• The user application writes the 32 data samples to the RAM, followed by a "compute_dft" instruction.
• Data is read from the RAM by the DFT encoder in a pipeline.
• Handshaking takes place between the two pipelined stages.
  – The MAC operation begins after a delay of 16 clocks (the initial latency of the CORDIC pipeline).
  – The 1st MAC output is generated N clocks after the initial latency, where N = 32 is the length of the input sequence.
  – After the MAC generates N output samples, the result of the N-point DFT is written to the RAM module, followed by an interrupt.
  – The user application reads the results from the RAM through the device driver on detection of this interrupt.

[Figure: timeline, time increasing to the right. Events in order: user application writes i/p to RAM; initial CORDIC latency; MAC operation begins; 1st MAC output sample (after N); final o/p from MAC.]
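The timeline can be turned into a rough cycle count. The sketch below assumes one MAC operation per clock with no stalls, and that every later output follows N clocks after the previous one; the slides only state the 16-clock latency and the N-clock delay to the first output, so the spacing of later outputs is an assumption, and the name `mac_output_cycle` is hypothetical.

```python
CORDIC_LATENCY = 16  # initial latency of the CORDIC pipeline, in clocks


def mac_output_cycle(k, n_points=32, cordic_latency=CORDIC_LATENCY):
    """Clock at which the k-th MAC output (1-based) is ready.

    Clocks are counted from the first angle entering the CORDIC.
    Assumes one accumulate per clock and N accumulates per output.
    """
    if not 1 <= k <= n_points:
        raise ValueError("k must be between 1 and n_points")
    return cordic_latency + k * n_points


first_output = mac_output_cycle(1)   # 16 + 32 = 48
final_output = mac_output_cycle(32)  # 16 + 32 * 32 = 1040
```

Under these assumptions a 32-point DFT completes about 1040 clocks after the first angle is issued, before counting the RAM write and interrupt.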


Performance Measurements


Issues Faced

• Coding an aggressive pipeline (avoiding bubbles) is always a challenge.
• It is a time-consuming process that needs to be done in 2 steps:
  – Code and validate in ModelSim (signals are available for debug).
  – Change the design to run on the FPGA. Iterate for all modules.
• The design needs to be aware of memory timing issues (e.g. back-to-back writes from the FPGA to RAM are a problem).
• Calculating the correct polarity of the CORDIC output samples.


Future scope

• Extend to a 256-point DFT. The design cannot be extended beyond that because the resolution of the CORDIC is low; the CORDIC resolution would need to be increased.


Lessons Learnt

• Debug on FPGA is interesting!!


Thank You!!

• No Questions!!! Please!! :x :p
