vedic multiplier9

8.2.5 IEEE 32-Bit Floating point Multiplier

Figure 8.16 Black box view of IEEE 32 bit Floating point Multiplier

(i) Description

Port    Direction    Width
a       Input        32 bit
b       Input        32 bit
out     Output       32 bit
flow    Output       1 bit

(ii) Device utilization summary:

Selected Device: xc6vlx75tl-1Lff784

Table 8.5 Device utilization summary for IEEE 32-bit Floating point Multiplier

Logic utilization                   Used    Available    Utilization
Number of slice LUTs                786     46,560       1%
Number of occupied Slices           345     11,640       2%
Number of bonded IOBs               97      232          41%
Average Fanout of Non-Clock Nets    4.14

Figure 8.17 RTL diagram of IEEE 32-bit floating point Multiplier
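The port list above can be exercised with a small software reference model. The following is a minimal sketch, not the design synthesized in this work: it assumes normal (non-zero, non-subnormal, finite) operands, truncates the product instead of rounding, and assumes that the 1-bit flow output flags an exponent overflow or underflow. The function names are hypothetical. The 24 x 24 mantissa product, written here with Python's `*`, marks the place where an integer multiplier block sits in hardware.

```python
import struct


def float_to_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_to_float(b: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", b))[0]


def fp32_multiply(a: int, b: int):
    """Multiply two normal single-precision bit patterns.

    Returns (out, flow). Assumption: flow = 1 when the result exponent
    leaves the normal range. Zeros, subnormals, infinities and NaNs are
    not handled, and the mantissa is truncated rather than rounded.
    """
    sign = ((a >> 31) ^ (b >> 31)) & 1
    exp_a = (a >> 23) & 0xFF
    exp_b = (b >> 23) & 0xFF
    man_a = (a & 0x7FFFFF) | 0x800000  # restore the hidden leading 1
    man_b = (b & 0x7FFFFF) | 0x800000

    product = man_a * man_b            # 48-bit mantissa product
    exp = exp_a + exp_b - 127          # add exponents, remove one bias

    if product & (1 << 47):            # product in [2, 4): renormalize
        product >>= 1
        exp += 1

    mantissa = (product >> 23) & 0x7FFFFF  # drop hidden bit, truncate
    flow = 1 if (exp <= 0 or exp >= 255) else 0
    out = (sign << 31) | ((exp & 0xFF) << 23) | mantissa
    return out, flow


out, flow = fp32_multiply(float_to_bits(1.5), float_to_bits(2.0))
print(bits_to_float(out), flow)
```

When flow is 1 the value of out is not meaningful in this sketch, because the exponent field has simply been masked to 8 bits.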

8.2.6 16 Bit Squarer

Figure 8.18 Black box view of 16 bit squarer

(i) Description

Port    Direction    Width
a       Input        16 bit
b       Input        16 bit
r       Output       32 bit

(ii) Device Utilization Summary:

Selected Device: xc3s500e-5fg320

Table 8.6 Device utilization summary for 16-bit Squarer

Logic Utilization                                  Used    Available    Utilization
Number of 4 input LUTs                             402     9,312        4%
Number of occupied Slices                          221     4,656        4%
Number of Slices containing only related logic     221     221          100%
Number of Slices containing unrelated logic        0       221          0%
Total Number of 4 input LUTs                       402     9,312        4%
Number of bonded IOBs                              64      232          27%
Average Fanout of Non-Clock Nets                   3.70
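The Utilization column in Tables 8.5 and 8.6 is consistent with the ratio Used/Available expressed as a percentage and truncated to an integer. This is an observation drawn from the reported numbers, not a statement about how the tool computes the column. A minimal check, with a hypothetical helper name:

```python
def utilization_percent(used: int, available: int) -> int:
    """Used/Available as a whole-number percentage, truncated (assumption)."""
    return (100 * used) // available


# (resource, used, available, reported %) taken from Tables 8.5 and 8.6
rows = [
    ("slice LUTs (32-bit FP multiplier)", 786, 46560, 1),
    ("occupied Slices (32-bit FP multiplier)", 345, 11640, 2),
    ("bonded IOBs (32-bit FP multiplier)", 97, 232, 41),
    ("4 input LUTs (16-bit squarer)", 402, 9312, 4),
    ("occupied Slices (16-bit squarer)", 221, 4656, 4),
    ("bonded IOBs (16-bit squarer)", 64, 232, 27),
]

for name, used, available, reported in rows:
    computed = utilization_percent(used, available)
    print(f"{name}: computed {computed}%, reported {reported}%")
```

Because of the truncation, a reported 2% can stand for anything from 2.0% up to just under 3%, so the Used column is the more precise figure when comparing designs.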

Figure 8.19 RTL diagram of the 16 bit Squarer

8.3 LATENCY & THROUGHPUT OF MULTIPLIERS

Latency is taken as the maximum combinational path delay reported in the synthesis report. For a combinational multiplier, throughput is simply the inverse of the latency. Tables 8.7 and 8.8 show the latency and throughput of all the multipliers.

Table 8.7 Latency and Throughput of the Multipliers

S.No    16 x 16 Multipliers          Latency (ns)                            Throughput (MHz)
1       Vedic Multiplier (U.T.S)     37.336 (22.023 logic, 15.313 route)     26.783
2       Vedic Multiplier (N.S)       23.470 (14.345 logic, 9.125 route)      42.60
3       Modified Booth Multiplier    58.464 (32.705 logic, 25.759 route)     17.10
4       Array Multiplier             69.987 (40.383 logic, 29.60 route)      14.28

Table 8.8 Latency and Throughput of IEEE 32 bit floating point multiplier and 16 bit Squarer

S.No.    Target device used    Application using Vedic multiplier (Urdhva Tiryakbhyam)    Latency (ns)                           Throughput (MHz)
1        xc6vlx75tl-1Lff784    IEEE 32 bit Floating point multiplier                      21.719 (1.808 logic, 19.911 route)     46.04
2        xc3s500e-5fg320       16 bit Squarer                                             37.606 (22.301 logic, 15.305 route)    26.59
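The relation between the two columns can be checked numerically. The sketch below assumes only what the text states, namely that throughput is the reciprocal of the combinational latency; the helper name is hypothetical. Since 1/ns equals 1000 MHz, a latency in nanoseconds converts to a throughput in MHz as 1000 / latency.

```python
def throughput_mhz(latency_ns: float) -> float:
    """Throughput of a combinational block as the inverse of its latency."""
    return 1000.0 / latency_ns


# Latency values (ns) copied from Tables 8.7 and 8.8
latencies_ns = {
    "Vedic Multiplier (U.T.S)": 37.336,
    "Vedic Multiplier (N.S)": 23.470,
    "Modified Booth Multiplier": 58.464,
    "Array Multiplier": 69.987,
    "IEEE 32 bit Floating point multiplier": 21.719,
    "16 bit Squarer": 37.606,
}

for name, latency in latencies_ns.items():
    print(f"{name}: {throughput_mhz(latency):.3f} MHz")
```

The printed values agree with the tabulated throughputs to within about 0.01 MHz; the small differences come from the tables truncating the last digit.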

8.4 AREA AND POWER OF THE MULTIPLIERS

The area and power of a design can be gauged from the number of LUTs and slices it uses: the fewer the LUTs and slices, the smaller the area and the lower the power dissipation. Table 8.9 shows the number of slices and LUTs used by each multiplier.

Table 8.9 Slices and LUTs of all the Multipliers

S.No    16 x 16 Multipliers          Slices used    LUTs
1       Vedic Multiplier (U.T.S)     409            731
2       Vedic Multiplier (N.S)       122            222
3       Modified Booth Multiplier    432            827
4       Array Multiplier             477            879
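To make the comparison in Table 8.9 easier to read, the resource counts can be expressed as a saving relative to the Array Multiplier. This is a minimal sketch: the choice of the Array Multiplier as the baseline and the helper name are assumptions made for illustration, and the counts are copied from the table.

```python
def saving_percent(value: int, baseline: int) -> float:
    """Reduction of value relative to baseline, in percent."""
    return 100.0 * (baseline - value) / baseline


# (slices, LUTs) per multiplier, from Table 8.9
resources = {
    "Vedic Multiplier (U.T.S)": (409, 731),
    "Vedic Multiplier (N.S)": (122, 222),
    "Modified Booth Multiplier": (432, 827),
    "Array Multiplier": (477, 879),
}

base_slices, base_luts = resources["Array Multiplier"]

for name, (slices, luts) in resources.items():
    print(
        f"{name}: {saving_percent(slices, base_slices):.1f}% fewer slices, "
        f"{saving_percent(luts, base_luts):.1f}% fewer LUTs"
    )
```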

CHAPTER-9

CONCLUSIONS & FUTURE WORK

9.1 CONCLUSION

Through the analysis of the multiplication Sutras of Vedic mathematics, namely the Urdhva Tiryakbhyam and Nikhilam Sutras, new reduced-bit multiplication algorithms have been proposed. These Sutras yield algorithms that can reduce the delay, power and hardware requirements of multiplication.

The 16-bit Vedic Multiplier based on the Urdhva Tiryakbhyam Sutra is more efficient in design and performance than the 16-bit Array and Modified Booth Multipliers.

The 16-bit Vedic Multiplier based on the Nikhilam Sutra is an application-specific multiplier that multiplies large numbers effectively.

Designing an IEEE single precision floating point Multiplier around a Vedic Multiplier based on the Urdhva Tiryakbhyam Sutra makes it efficient for use in various DSP applications.

A 16-bit Squarer designed with the Urdhva Tiryakbhyam based Vedic Multiplier consumes fewer hardware resources and reduces delay.
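As an illustration of the vertical-and-crosswise idea behind the Urdhva Tiryakbhyam Sutra referred to in these conclusions, the following is a minimal software sketch for unsigned operands. It is a generic bit-level formulation and not the hardware architecture implemented in this work; the function names and the default width are assumptions. Each result column collects the crosswise products of all bit pairs whose positions add up to that column, and the carry is passed on to the next column. In hardware the column sums are formed in parallel, whereas this sketch walks the columns one after another.

```python
def urdhva_multiply(a: int, b: int, width: int = 16) -> int:
    """Vertical-and-crosswise product of two unsigned width-bit integers."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]

    result = 0
    carry = 0
    for col in range(2 * width - 1):
        # Bit pairs (i, col - i) that fall inside both operands
        lo = max(0, col - width + 1)
        hi = min(col, width - 1)
        total = carry + sum(
            a_bits[i] & b_bits[col - i] for i in range(lo, hi + 1)
        )
        result |= (total & 1) << col   # keep one bit in this column
        carry = total >> 1             # pass the rest to the next column

    result |= carry << (2 * width - 1)  # final carry is the top bit
    return result


def square(a: int, width: int = 16) -> int:
    """Squarer built on the same multiplier: both operands are a."""
    return urdhva_multiply(a, a, width)


print(urdhva_multiply(1234, 5678), square(300))
```

A dedicated squarer can exploit the symmetry of the crosswise terms when both operands are equal; the sketch above does not do so and simply reuses the multiplier.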

9.2 FUTURE WORK

Vedic Mathematics, developed about 2500 years ago, offers insight into symmetric computation. It covers various topics of mathematics such as basic arithmetic, geometry, trigonometry and calculus. All these methods are very efficient as far as manual calculation is concerned.

If all these methods are implemented effectively in hardware, computation time will be reduced drastically. It could therefore be possible to implement a complete ALU using Vedic mathematics methods. With these ancient Indian methods, cutting-edge devices could reach new heights of performance and quality.

APPENDIX -A

XILINX FPGA DESIGN FLOW

This section describes the FPGA synthesis and implementation stages typical of the Xilinx design flow.

Fig A1 Xilinx FPGA Design flow [16]

Synthesis

The synthesizer converts HDL (VHDL/Verilog) code into a gate-level netlist expressed in terms of the UNISIM component library (a Xilinx library containing basic primitives). By default, Xilinx ISE uses its built-in synthesizer, XST (Xilinx Synthesis Technology). Other synthesizers can also be used.

The synthesis report contains much useful information, including a maximum frequency estimate in the "timing summary" section. One should also pay attention to warnings, since they can indicate hidden problems.

After a successful synthesis, one can run the "View RTL Schematic" task (RTL stands for register transfer level) to view a gate-level schematic produced by the synthesizer.

XST output is stored in NGC format. Many third-party synthesizers (such as Synplicity Synplify) use the industry-standard EDIF format to store the netlist.

Implementation

The implementation stage translates the netlist into a placed and routed FPGA design.

The Xilinx design flow has three implementation stages: translate, map, and place and route. (These steps are specific to Xilinx; for example, Altera combines translate and map into one step executed by quartus_map.)

Translate

Translate is performed by the NGDBUILD program.

During the translate phase, an NGC netlist (or an EDIF netlist, depending on which synthesizer was used) is converted to an NGD netlist. The difference between them is that the NGC netlist is based on the UNISIM component library, designed for behavioural simulation, whereas the NGD netlist is based on the SIMPRIM library. The netlist produced by the NGDBUILD program contains some approximate information about switching delays.
