Implementation of ANN on FPGA


Chapter 1

1. INTRODUCTION

1.1 Introduction to Artificial Intelligence and Neural Systems

The human body is arguably the most complex structure conceivable in the universe. So intricate is its construction and so complicated are its operations that modern-day supercomputers are no match for it in performance or organization. At the lowest level, its anatomy is composed entirely of cells: billions of them, microscopic in nature. Cells are clustered into tissues, and groups of tissues form organs, resulting in the basic organ systems. While each of these organ systems is vital for existence, we confine our focus to the 'prime controller' within the human body: the nervous system.

1.1.1 The Nervous System

The nervous system is the primary controller of physiological operations in the human body. It is responsible for regulatory actions and serves as a network of communication channels within the body. This system is essential for maintaining homeostasis and enables us to be cognizant of our internal and external environments through its receptive sensors. Broadly, the nervous system comprises the brain, a myriad of nerve cells (called neurons) and connecting nerves. The actions performed by this system can be classified into one of three types: sensory, integrative and motor. Changes in the internal or external environment (called 'stimuli') are sensed or detected by receptors, thus compiling sensory input data. This data is carried to the brain, the central controller of the human body, through nerve impulses, which are electrical signals. These signals carry information that enables the brain to decide on the appropriate response, which might take the form of storage in memory, creation of a thought, recollection of previously stored information or the evoking of specific sensations. This continuous, ongoing process integrates data from the sensory input with information stored in the brain to produce an appropriate response.

1.1.2 Artificial Intelligence and Intelligent Systems

It is not without reason that we call ourselves Homo sapiens, 'man the wise'. Our mental capabilities are fundamental to our existence. The computational actions that are probably most imperative are the ones we take for granted, with little knowledge of their basis. The study of artificial intelligence deals with the fundamental operation of our own intelligence system, in an effort to comprehend its functioning, not merely to establish results but to develop artificial systems capable of performing in a similar fashion. In this aim it differs from most other studies of intelligence. While the task outlined is certainly daunting, one might argue that other fields of science also pursue nearly impossible goals. However, our efforts are encouraged considerably by the fact that we have existing proof that our goal is realizable, something that is lacking in most other fields. Our very bodies are living proof that such intelligent systems are not impossible to create. We know what lies at the end; we need only develop the means to get there. This is the foundation of artificial intelligence and the study of intelligent systems.

An intelligent system can safely be defined as a system capable of operating in an environment with the cognitive capability to detect and respond to changes in that environment. The system does not require external control for its operation, which suggests that it should be capable of learning by itself. Hence, if a certain situation produced favorable results at one point in time and unfavorable results at a later point due to a difference in the system's reactions, the system should rationally deduce the more appropriate response should the same situation arise in future. It seems obvious, then, that the system should also have the ability to retain information and results in memory. It is also insufficient for such a system to make informed decisions if it cannot implement them.


1.1.3 Artificial Neural Networks

Since we have established what an intelligent system should be capable of performing, it facilitates our quest to model such systems on living examples like the human body. Predominantly, the field of artificial intelligence has dealt with mimicking the structure of the human nervous system and with genetic engineering; thus the concepts of artificial neural networks and genetic algorithms arose. We shall look closely at the former. "Artificial Neural Networks refer to computing systems whose central theme is borrowed from the analogy of biological neural networks."

Just as the biological neural network works with nerve cells (neurons), its artificial counterpart works with artificial neurons to cater to specific functions like classifying data or recognizing patterns. These networks can be trained to learn operations by iteration; thus a training algorithm is developed to 'teach' an artificial neural network. The primary reason why neural networks are studied is that they have a remarkable ability to process imprecise data and deliver meaningful results. They can detect patterns or trends that are otherwise unnoticeable by other computing techniques. A neural network that has been trained to process a particular type of data may well be considered an expert in analyzing that data type. Further, it can enable us to speculate on how the system would perform in different situations.


Chapter 2

Neural Networks

2.1 Biological Neural Networks

The nervous system has the prime responsibility of monitoring and maintaining optimal conditions in our internal and external environment. A brief introduction to the nervous system is presented before its artificial counterpart is examined. The human biological nervous system is divided into the following sub-systems:

2.1.1 Central Nervous System (CNS)

The central nervous system is comprised of the brain and the spinal cord. Much is unknown about the functioning of the human brain; an interesting question yet to be answered is how the brain trains itself, in keeping with the behavior of an intelligent system. The brain is the control unit of the human body. It receives information from other parts of the body through electrical impulses and sends responsive instructions after processing this information. Different parts of the brain handle various cognitive functions like touch, speech, memory, recognition and hearing. The brain is connected through nerve cells to the sensors and actuators in the body.


The spinal cord runs along the dorsal side of the human body and connects the brain to other parts of the body. It is enclosed in the vertebral column, while the brain itself is enclosed in the skull. Fluid and tissue act as insulation for the brain and the spinal cord.

2.2 The Neuron

The neuron is the basic functional unit of the nervous system. The human brain alone has over 100 billion neurons, all of which resemble one another in structure to a considerable extent. Information from neighboring neurons travels through dendrites to the cell body, which contains the nucleus, mitochondria and Golgi bodies. The axon conducts messages away from the cell body. These three parts are characteristic of any neuron. Signals sent by the neuron travel along the axon; at the end of every branch lies a structure called the synapse. The synapse converts this electrical signal into an activity which inhibits or excites a neighboring neuron. When a neuron receives excitatory input sufficiently larger than its inhibitory input, it sends a spike of electrical activity down its axon. Learning occurs when the effectiveness of the synapse changes.


According to Zell, "All inputs received by the neuron are summed up before they are passed on to a threshold function. The output signal is produced by processing the summed input with the threshold function. The processing time is about 1 ms per cycle and the transmission speed is about 0.6-120 m/s."

Figure: Control flow in a neuron

2.3 Artificial Neurons

The artificial neuron is modeled along the same lines as our current understanding of the biological neuron, with multiple inputs and a single output. The neuron undergoes training and usage phases. During training sessions, the neuron is taught to fire (or not fire) for specific input patterns. Thus, when in use, if a taught input pattern is recognized, its associated output becomes the current output. For unrecognized input patterns, firing rules determine the appropriate action.
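One commonly described firing rule (an illustrative choice, not necessarily the rule used in this project) fires the neuron for an unseen pattern when that pattern is closer, in Hamming distance, to a taught firing pattern than to a taught non-firing pattern. A minimal software sketch:

```python
def hamming(a, b):
    """Number of positions at which two binary patterns differ."""
    return sum(x != y for x, y in zip(a, b))

def fires(pattern, taught_fire, taught_not_fire):
    """Hamming-distance firing rule: fire if the nearest taught firing
    pattern is strictly closer than the nearest taught non-firing pattern;
    fall back to not firing on a tie."""
    d_fire = min(hamming(pattern, t) for t in taught_fire)
    d_not = min(hamming(pattern, t) for t in taught_not_fire)
    return d_fire < d_not

# Taught to fire for (1,1,1) and not to fire for (0,0,0):
print(fires((1, 1, 0), [(1, 1, 1)], [(0, 0, 0)]))  # closer to (1,1,1) -> True
```

The rule generalizes the taught behavior to unseen patterns while reproducing it exactly on the taught ones.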


Figure: Structures of the biological neuron (left) and artificial neuron (right)

Chapter 3

ARTIFICIAL NEURAL NETWORK

Artificial neural networks refer to computing systems whose central theme is borrowed from the analogy of biological neural networks. Artificial neural networks are also referred to as "neural nets," "artificial neural systems," "parallel distributed processing systems," and "connectionist systems." The roots of all work on neural networks lie in neurobiological studies that date back about a century. For many decades, biologists have speculated on exactly how the nervous system works. A century-old statement by William James (1890) is particularly insightful, and is reflected in the subsequent work of many researchers. In a neural network, each node performs some simple computations, and each connection conveys a signal from one node to another, labeled by a number called the "connection strength" or "weight" indicating the extent to which a signal is amplified or diminished by a connection.

Artificial Neural Networks (ANNs) can solve a great variety of problems in engineering, such as systems modeling and control, pattern recognition, image processing and medical diagnostics. The biologically inspired ANNs are parallel and distributed information-processing systems that require massive parallel computation. Thus, high-speed operation in real-time applications can be achieved only if the networks are implemented using parallel hardware architectures. Most of the work done in this field until now consists of software simulations investigating the capabilities of ANN models or new algorithms, but hardware implementations are also essential for applicability and for taking advantage of a neural network's inherent parallelism. There are analog, digital and also mixed architectures proposed for the implementation of ANNs. The analog ones are more precise but difficult to implement and have problems with weight storage.

Implementation of ANNs falls into two categories: software implementation and hardware implementation. ANNs are implemented, trained and simulated on sequential computers as software for emulating a wide range of neural network models. Software implementations offer flexibility; however, hardware implementations are essential for applicability and for taking advantage of an ANN's inherent parallelism.

3.1 ADVANTAGES OF ANN

3.1.1 Input-output mapping: The network is presented with an example, and the synaptic weights of the network are modified to minimize the difference between the actual response and the desired response. The training of the network is repeated until the network reaches a steady state. Thus the network learns from the examples by constructing an input-output mapping.

3.1.2 Adaptivity: Neural networks have a built-in capability to adapt their synaptic weights to changes in the surrounding environment. In particular, a neural network trained to operate in a specific environment can easily be retrained to deal with minor changes in the operating environmental conditions. A neural network can also be designed to change its synaptic weights in real time.

3.1.3 Evidential response: A neural network can be designed to provide information not only about the output but also about the confidence in that output. This information may be used to reject ambiguous outputs, thereby improving the classification performance of the network.

3.1.4 Fault tolerance: A neural network implemented in hardware form has the potential to be inherently fault tolerant; that is, its performance degrades gracefully under adverse operating conditions. Thus, in principle, a neural network exhibits a graceful degradation in performance rather than catastrophic failure.

3.1.5 Uniformity in analysis and design: Neurons, in one form or another, represent an ingredient common to all neural networks. This commonality makes it possible to share theories and learning algorithms across different applications of neural networks, and modular networks can be built through a seamless integration of modules.

3.1.6 Neurobiological analogy: The design of a neural network is motivated by analogy with the brain, which is living proof that fault-tolerant parallel processing is not only physically possible but also fast and powerful. Engineers look to neurobiology for new ideas to solve problems more complex than those addressed by conventional hard-wired design techniques.

3.1.7 Parallelism: Neural networks are parallel and distributed information-processing systems, so neurons in the same layer can process information simultaneously. As a result, neural network systems are fast compared with other computational architectures.


Chapter 4

4.1 THE MATHEMATICAL MODEL OF A NEURON

When creating a functional model of the biological neuron, there are three basic components of importance. First, the synapses of the neuron are modeled as weights. The strength of the connection between an input and a neuron is denoted by the value of the weight: negative weight values reflect inhibitory connections, while positive values designate excitatory connections. The next two components model the actual activity within the neuron cell. An adder sums up all the inputs modified by their respective weights; this activity is referred to as linear combination. Finally, an activation function controls the amplitude of the output of the neuron. An acceptable range of output is usually between 0 and 1, or -1 and 1.

From this model, the internal activity of the neuron can be shown to be

    v_k = Σ_j w_kj · x_j

where x_j are the inputs and w_kj the corresponding synaptic weights. The output of the neuron, y_k, would therefore be the outcome of an activation function applied to the value of v_k:

    y_k = Φ(v_k)
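As a software illustration of the model above (the report's target is VHDL on an FPGA, so this Python fragment is only a sketch of the arithmetic), the neuron is just a weighted sum followed by an activation function:

```python
import math

def neuron_output(inputs, weights, activation=math.tanh):
    """y_k = phi(v_k), where v_k = sum_j w_kj * x_j (linear combination)."""
    v = sum(w * x for w, x in zip(weights, inputs))
    return activation(v)

# One excitatory (+0.8) and one inhibitory (-0.5) connection:
print(neuron_output([1.0, 1.0], [0.8, -0.5]))  # tanh(0.3), roughly 0.2913
```

The choice of tanh here keeps the output in the -1 to 1 range discussed above; any of the activation functions of the next section could be substituted.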

4.2 Activation functions

As mentioned previously, the activation function acts as a squashing function, such that the output of a neuron in a neural network lies between certain values (usually 0 and 1, or -1 and 1). In general, there are three types of activation function, denoted by Φ(.). First, there is the Threshold Function, which takes on a value of 0 if the summed input is less than a certain threshold value v, and the value 1 if the summed input is greater than or equal to the threshold value.

Secondly, there is the Piecewise-Linear Function. This function again can take on the values 0 or 1, but can also take on values in between, depending on the amplification factor in a certain region of linear operation.


Thirdly, there is the Sigmoid Function. This function usually ranges between 0 and 1, but it is also sometimes useful to use the -1 to 1 range; an example of such a sigmoid function is the hyperbolic tangent function. The artificial neural networks we describe are all variations on the parallel distributed processing (PDP) idea. The architecture of each neural network is based on very similar building blocks which perform the processing. In the remainder of this chapter we discuss these processing units, different neural network topologies, and learning strategies as the basis for an adaptive system.

4.3 DIFFERENT ACTIVATION FUNCTIONS

4.3.1 The symmetrical hard limit activation function

The symmetric hard limit transfer function is referred to as "hardlims" in MATLAB. It is used to classify inputs into two distinct categories, and can be defined as follows:

    hardlims(n) = +1 if n >= 0, and -1 if n < 0


4.3.2 The saturating linear activation function

The output of the saturating linear activation function "satlins" can be defined as follows:

    satlins(n) = -1 if n < -1;  n if -1 <= n <= 1;  +1 if n > 1

4.3.3 Hyperbolic Tangent Sigmoid activation function

This function takes the input (which may have any value between plus and minus infinity) and squashes the output into the range -1 to 1, according to the expression

    tansig(n) = (e^n - e^-n) / (e^n + e^-n)
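The three transfer functions of this section can be written down directly; the following Python definitions mirror the MATLAB names quoted above (a software sketch only, not the project's hardware realization):

```python
import math

def hardlims(n):
    """Symmetric hard limit: +1 for n >= 0, -1 otherwise."""
    return 1 if n >= 0 else -1

def satlins(n):
    """Symmetric saturating linear: clip n to the range [-1, 1]."""
    return max(-1.0, min(1.0, n))

def tansig(n):
    """Hyperbolic tangent sigmoid: maps (-inf, inf) into (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0

print(hardlims(-0.2), satlins(0.4), round(tansig(0.5), 4))  # -1 0.4 0.4621
```

Note that tansig is numerically identical to tanh; the 2/(1+e^-2n)-1 form is often preferred in hardware-oriented work because it needs only one exponential.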


4.4 Back-propagation learning

There are several different methods for setting the synaptic weights and threshold values. The most common is the back-propagation-of-error algorithm. This is a supervised learning algorithm, which means that the network has to be taught how to respond to a particular set of input patterns. This teaching incorporates the following steps:

1. Present an input pattern.

2. Read out the produced output pattern.

3. Compare the produced output pattern with the desired output pattern, and generate an error signal if there is a difference.

4. Feed this error signal to the output neurons and propagate it through the network in the opposite direction to the feed-forward signals.

5. Change the weights and thresholds on the basis of these error signals so as to reduce the difference between the output and the target.

These steps are either repeated in discrete steps, or performed simultaneously in a true parallel and continuous manner for all input patterns, until the network responds correctly. The learning algorithm can be expressed with two equations: one that measures the error on the output, and one that expresses the change of a given weight. The error is usually measured as the difference between the desired output, or target, and the actual output. The weight-change function is then defined to be proportional to the derivative of the square of the measured error for each output pattern with respect to each weight, with a negative constant of proportionality. This implements a gradient-descent search in the error space for the minimum error.

The computation performed by a neuron in a feed-forward neural network is essentially the same as in the model above. The output is an explicit function of the input, and is described by

    o_pj = f_j(net_pj),   net_pj = Σ_i w_ji · o_pi + θ_j

where w_ji is the weight of input i to neuron j, o_pi is input i (that is, output i from the previous layer) for input pattern p, θ_j is the threshold value and f_j is the activation function for neuron j.

To be more specific, let

    E_p = ½ Σ_j (t_pj − o_pj)²

be the square of the error measured for input pattern p, where t_pj represents the target for output neuron j, and o_pj the actual output. The weight-change equation is then defined to be

    Δ_p w_ji = −η · ∂E_p/∂w_ji

where Δ_p w_ji is the change of the weight, and η is a scaling factor that defines the learning rate of the algorithm. The solution to this differentiation can be stated in two equations, depending on the weight under consideration. If the weight belongs to an output neuron the differentiation is straightforward, and we get

    Δ_p w_ji = η · δ_pj · o_pi,   with δ_pj = (t_pj − o_pj) · f′_j(net_pj)

where f′_j is the derivative of the activation function for output neuron j and o_pi is input i to this neuron for pattern p. The δ-term is simply the standard way of expressing the error scaled by the derivative of the output. If the weight belongs to a hidden neuron, one applies the chain rule and obtains

    δ_pj = f′_j(net_pj) · Σ_k δ_pk · w_kj

where δ_pk is the δ for neuron k in the subsequent layer.
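Putting the training steps and the weight-change rules above together gives a compact software sketch of the algorithm. This is an illustrative Python translation only: the network size, learning rate and XOR training set are arbitrary toy choices, not taken from the report, and the threshold θ is folded in as a constant extra input.

```python
import math
import random

def f_prime(y):
    """Derivative of tanh expressed via its output: f'(v) = 1 - f(v)^2."""
    return 1.0 - y * y

def forward(x, w_h, w_o):
    """Feed-forward pass: one tanh hidden layer, then one tanh output neuron."""
    h = [math.tanh(sum(w * xi for w, xi in zip(ws, x))) for ws in w_h]
    o = math.tanh(sum(w * hi for w, hi in zip(w_o, h)))
    return h, o

def train_step(x, t, w_h, w_o, eta=0.5):
    """One back-propagation update for one input pattern."""
    h, o = forward(x, w_h, w_o)
    delta_o = (t - o) * f_prime(o)                # output delta: (t-o)*f'(net)
    delta_h = [f_prime(h[j]) * delta_o * w_o[j]   # hidden delta via chain rule
               for j in range(len(h))]
    for j in range(len(w_o)):                     # delta_w = eta * delta * input
        w_o[j] += eta * delta_o * h[j]
    for j in range(len(w_h)):
        for i in range(len(x)):
            w_h[j][i] += eta * delta_h[j] * x[i]

def total_error(data, w_h, w_o):
    return sum((t - forward(x, w_h, w_o)[1]) ** 2 for x, t in data)

random.seed(0)
w_h = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(3)]
w_o = [random.uniform(-0.5, 0.5) for _ in range(3)]
# Toy task: XOR with -1/+1 targets; the trailing 1.0 is the threshold input.
data = [([0, 0, 1.0], -1), ([0, 1, 1.0], 1), ([1, 0, 1.0], 1), ([1, 1, 1.0], -1)]
e0 = total_error(data, w_h, w_o)
for _ in range(2000):
    for x, t in data:
        train_step(x, t, w_h, w_o)
print(e0, "->", total_error(data, w_h, w_o))  # the error shrinks with training
```

Repeated presentation of the patterns drives the summed squared error down, exactly the gradient-descent behavior derived above.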


Chapter 5

FIELD PROGRAMMABLE GATE ARRAY (FPGA)

A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by the customer or designer after manufacturing, hence "field-programmable". The FPGA configuration is generally specified using a hardware description language (HDL), similar to that used for an application-specific integrated circuit (ASIC); circuit diagrams were previously used to specify the configuration, as they were for ASICs, but this is increasingly rare. FPGAs can be used to implement any logical function that an ASIC could perform. The ability to update the functionality after shipping, partial reconfiguration of a portion of the design, and the low non-recurring engineering costs relative to an ASIC design (notwithstanding the generally higher unit cost) offer advantages for many applications. FPGAs contain programmable logic components called "logic blocks", and a hierarchy of reconfigurable interconnects that allow the blocks to be "wired together", somewhat like a one-chip programmable breadboard. Logic blocks can be configured to perform complex combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory.

Each FPGA has three main parts. The Configurable Logic Block (CLB) is the most important part of an FPGA: CLBs provide the physical support for the program downloaded onto the FPGA. Another part is the Input/Output Block (IOB), which provides input and output for the FPGA and makes it possible to communicate outside the FPGA. The third part is the Programmable Interconnect, which connects the different parts of the FPGA and allows them to communicate with each other. These connections are programmable, so users can define which parts they connect.

FPGAs are chosen for implementing ANNs for the following reasons:

• They can implement a wide range of logic, from tens of thousands up to a few million gates.

• They can be reconfigured to change their logic function while resident in the system.

• FPGAs have a short design cycle, which leads to fairly inexpensive logic design.

• FPGAs are parallel in nature; they provide a parallel computing environment and allow logic designs to operate in parallel.

• They have powerful design, programming and synthesis tools.

• High clock rates (e.g. 100 MHz system clock, 75 MHz SATA clock).

• Provision for a user-supplied clock.

5.1 ARCHITECTURE

The most common FPGA architecture consists of an array of configurable logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing channels have the same width (number of wires). Multiple I/O pads may fit into the height of one row or the width of one column in the array.


An application circuit must be mapped into an FPGA with adequate resources. While the number of CLBs and I/Os required is easily determined from the design, the number of routing tracks needed may vary considerably, even among designs with the same amount of logic. (For example, a crossbar switch requires much more routing than a systolic array with the same gate count.) Since unused routing tracks increase the cost (and decrease the performance) of the part without providing any benefit, FPGA manufacturers try to provide just enough tracks so that most designs that fit in terms of LUTs and I/Os can be routed. This is determined by estimates such as those derived from Rent's rule or by experiments with existing designs.

A classic FPGA logic block consists of a 4-input lookup table (LUT) and a flip-flop. In recent years, manufacturers have started moving to 6-input LUTs in their high-performance parts, claiming increased performance.

5.2 BASIC PARTS OF AN FPGA

5.2.1 Logic-cells

FPGAs are built from one basic "logic cell", duplicated hundreds or thousands of times. A logic cell is basically a small lookup table ("LUT"), a D flip-flop and a 2-to-1 mux (to bypass the flip-flop if desired).


The LUT is like a small RAM and typically has 4 inputs, so it can implement any logic gate with up to 4 inputs. For example, a 3-input AND gate whose result is then OR-ed with a fourth input would fit in one LUT.
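The "LUT as a tiny RAM" idea is easy to demonstrate in software: store the 16 output bits of the example function, (a AND b AND c) OR d, then evaluate it purely by table lookup, exactly as a 4-input LUT would. (Python illustration only; in a real FPGA the table contents come from the configuration bitstream.)

```python
# Build the 16-entry truth table for f(a, b, c, d) = (a AND b AND c) OR d,
# as it would be stored in a 4-input LUT.
lut = []
for addr in range(16):
    a, b, c, d = (addr >> 3) & 1, (addr >> 2) & 1, (addr >> 1) & 1, addr & 1
    lut.append((a & b & c) | d)

def lut_lookup(a, b, c, d):
    """A LUT 'computes' by indexing its stored bit with the input lines."""
    return lut[(a << 3) | (b << 2) | (c << 1) | d]

print(lut_lookup(1, 1, 1, 0))  # (1 AND 1 AND 1) OR 0 = 1
print(lut_lookup(0, 1, 0, 1))  # (0 AND 1 AND 0) OR 1 = 1
print(lut_lookup(1, 1, 0, 0))  # (1 AND 1 AND 0) OR 0 = 0
```

Any other 4-input Boolean function fits in the same 16-bit table; only the stored bits change.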

5.2.2 Interconnect

Each logic cell can be connected to other logic cells through interconnect resources (wires/muxes placed around the logic cells). Each cell can do little, but with lots of them connected together, complex logic functions can be created.


5.2.3 IO-cells

The interconnect wires also go to the boundary of the device, where I/O cells are implemented and connected to the pins of the FPGA.

5.2.4 Internal RAM

In addition to logic, all new FPGAs have dedicated blocks of static RAM distributed among and controlled by the logic elements.

5.3 SPARTAN-3E

5.3.1 DETAILS OF THE SPARTAN-3E BOARD

The key features of the Spartan-3E Starter Kit board are:

• Xilinx XC3S500E Spartan-3E FPGA

♦ Up to 232 user-I/O pins

♦ 320-pin FBGA package

♦ Over 10,000 logic cells


• Xilinx 4 Mbit Platform Flash configuration PROM

• Xilinx 64-macrocell XC2C64A CoolRunner™ CPLD

• 64 MByte (512 Mbit) of DDR SDRAM, x16 data interface, 100+ MHz

• 16 MByte (128 Mbit) of parallel NOR Flash (Intel StrataFlash)

♦ FPGA configuration storage

♦ MicroBlaze code storage/shadowing

• 16 Mbits of SPI serial Flash (STMicro)

♦ FPGA configuration storage

♦ MicroBlaze code shadowing

• 2-line, 16-character LCD screen

• PS/2 mouse or keyboard port

• VGA display port

• 10/100 Ethernet PHY (requires Ethernet MAC in FPGA)

• Two 9-pin RS-232 ports (DTE- and DCE-style)

• On-board USB-based FPGA/CPLD download/debug interface

• 50 MHz clock oscillator

• SHA-1 1-wire serial EEPROM for bitstream copy protection

• Hirose FX2 expansion connector

• Three Digilent 6-pin expansion connectors

• Four-output, SPI-based Digital-to-Analog Converter (DAC)

• Two-input, SPI-based Analog-to-Digital Converter (ADC)

• ChipScope™ SoftTouch debugging port

• Rotary-encoder with push-button shaft

• Eight discrete LEDs

• Four slide switches

• Four push-button switches


• SMA clock input

• 8-pin DIP socket for auxiliary clock oscillator

5.3.2 BLOCK DIAGRAM OF THE SPARTAN-3E BOARD

The Xilinx XC3S500E FPGA device has a high gate density (500,000 logic gates) and many features which are necessary for a neural implementation: fast logic enables the design of compact and fast arithmetic functions (i.e., multiplication and addition); look-up tables can be used as RAMs and ROMs; combinational functions within configurable logic blocks (CLBs) can have up to ten inputs, with delays that are very small and almost independent of the number of variables; and very high routing capability allows successful implementation of critical-path delays, even for complex neural networks.


Chapter 6

VHDL and Xilinx ISE Design Suite 10.1

6.1 Introduction to VHDL

The VHSIC Hardware Description Language (VHDL) is an industry-standard language used to describe hardware from the abstract to the concrete level. VHDL resulted from work done in the '70s and early '80s by the U.S. Department of Defense. Its roots are in the Ada language, as can be seen in the overall structure of VHDL as well as in many of its statements. VHDL usage has risen rapidly since its inception, and it is used by literally tens of thousands of engineers around the globe to create sophisticated electronic products. VHDL is a powerful language with numerous constructs capable of describing very complex behavior. In 1986, VHDL was proposed as an IEEE standard; it went through a number of revisions and changes until it was adopted as the IEEE 1076 standard in December 1987.

VHDL is also a general-purpose programming language: just as high-level programming languages allow complex design concepts to be expressed as computer programs, VHDL allows the behavior of complex electronic circuits to be captured in a design system for automatic circuit synthesis or for system simulation. Like Pascal, C and C++, VHDL includes features useful for structured design techniques and offers a rich set of control and data representation features. Unlike these other programming languages, VHDL provides features for describing concurrent events. This is important because the hardware described using VHDL is inherently concurrent in its operation.

6.2 HISTORY OF VHDL

1981 - Initiated by US DoD to address the hardware life-cycle crisis
1983-85 - Development of the baseline language by Intermetrics, IBM and TI
1986 - All rights transferred to IEEE
1987 - Publication of the IEEE standard
1987 - MIL-STD-454 requires comprehensive VHDL descriptions to be delivered with ASICs
1994 - Revised standard (VHDL 1076-1993)

6.3 ADVANTAGES OF VHDL

Why choose to use VHDL for your design efforts? There are many likely reasons. If you ask most VHDL tool vendors this question, the first answer you will get is, "It will improve your productivity." But just what does this mean? Can you really expect to get your projects done faster using VHDL than by using your existing design methods? The answer is yes, but probably not the first time you use it, and only if you apply VHDL in a structured manner. VHDL (like a structured software design language) is most beneficial when you use a structured, top-down approach to design. Real increases in productivity will come later, when you have climbed higher on the VHDL learning curve and have accumulated a library of reusable VHDL components. Productivity increases will also occur when you begin to use VHDL to enhance communication between team members, and when you take advantage of the more powerful tools for simulation and design verification that are available. In addition, VHDL allows you to design at a more abstract level: instead of focusing on a gate-level implementation, you can address the behavioral function of the design.

Describing a design: In VHDL, an entity is used to describe a hardware module. An entity can be described using:

1. Entity declaration
2. Architecture
3. Configuration
4. Package declaration
5. Package body


6.4 Xilinx ISE Overview

The Integrated Software Environment (ISE) is the Xilinx design software suite that allows you to take your design from design entry through Xilinx device programming. The ISE Project Navigator manages and processes your design through the following steps in the ISE design flow.

6.4.1 Design Entry

Design entry is the first step in the ISE design flow. During design entry, we create our source files based on our design objectives. We can create our top-level design file using a Hardware Description Language (HDL), such as VHDL, Verilog, or ABEL, or using a schematic. We can use multiple formats for the lower-level source files in our design.

6.4.2 Synthesis

After design entry and optional simulation, we run synthesis. During this step, VHDL, Verilog, or mixed language designs become netlist files that are accepted as input to the implementation step.

6.4.3 Implementation

After synthesis, we run design implementation, which converts the logical design into a physical file format that can be downloaded to the selected target device. From Project Navigator, we can run the implementation process in one step, or we can run each of the implementation processes separately. Implementation processes vary depending on whether you are targeting a Field Programmable Gate Array (FPGA) or a Complex Programmable Logic Device (CPLD).

6.4.4 Verification

We can verify the functionality of our design at several points in the design flow. We can use simulator software to verify the functionality and timing of our design or a portion of it. The simulator interprets VHDL or Verilog code into circuit functionality and displays the logical results of the described HDL to determine correct circuit operation. Simulation allows us to create and verify complex functions in a relatively small amount of time. We can also run in-circuit verification after programming our device.

6.4.5 Device Configuration

After generating a programming file, we configure our device. During configuration, we generate configuration files and download the programming files from a host computer to a Xilinx device.

6.5 FPGA Design Flow Overview

The ISE design flow comprises the following steps: design entry, design synthesis, design implementation, and Xilinx device programming. Design verification, which includes both functional verification and timing verification, takes place at different points during the design flow.


6.6 FPGA Programming

Implementing a logic design with an FPGA usually consists of the following steps:

1. Enter a description of the logic circuit using a hardware description language (HDL) such as VHDL or Verilog.

2. Use a logic synthesizer program to transform the HDL or schematic into a netlist. The netlist is simply a description of the various logic gates in the design and how they are interconnected.

3. Use the implementation tools to map the logic gates and interconnections into the FPGA. The FPGA consists of many configurable logic blocks (CLBs), which can be further decomposed into look-up tables (LUTs) that perform logic operations. The CLBs and LUTs are interwoven with various routing resources. The mapping tool collects the netlist gates into groups that fit into the LUTs, and then the place & route tool assigns the groups to specific CLBs while opening or closing the switches in the routing matrices to connect them together.

4. Once the implementation phase is complete, a program extracts the state of the switches in the routing matrices and generates a bit stream in which the ones and zeroes correspond to open or closed switches.

5. Download the bit stream into a physical FPGA chip. The electronic switches in the FPGA open or close in response to the binary bits in the bit stream. Once downloading is complete, the FPGA will perform the operations specified by your HDL code or schematic.

Xilinx ISE provides the HDL and schematic editors, logic synthesizer, fitter, and bitstream generator software.


6.6.1 Diagrammatic representation of FPGA programming

6.7 iMPACT Overview

iMPACT, a tool featuring batch and GUI operations, allows you to perform two functions: device configuration and file generation. Device configuration enables you to directly configure Xilinx FPGAs or program Xilinx CPLDs and PROMs with the Xilinx cables in various modes. In Boundary-Scan mode, Xilinx FPGAs, CPLDs, and PROMs can be configured or programmed. In the Slave Serial or SelectMAP configuration modes, only FPGAs can be configured directly. In Desktop Configuration mode, Xilinx CPLDs or PROMs can be programmed. In Direct SPI Configuration mode, select SPI serial flash devices can be programmed.

File generation enables you to create the following types of programming files: System ACE CF, PROM, SVF, STAPL, and XSVF files.

iMPACT also enables you to do the following:

1. Readback and verify design configuration data.

2. Debug configuration problems.

3. Execute SVF and XSVF files.

6.8 ISE Simulator (ISim) Overview

Use the Xilinx® ISE Simulator (ISim) to perform the following two functions on your design:

1. Create a graphical VHDL test bench or Verilog test fixture.

2. Perform functional and timing simulations for VHDL, Verilog, and mixed VHDL/Verilog designs using a Hardware Description Language (HDL) simulator.


Chapter 7

Floating point unit

The floating point unit (FPU) implemented in this project is a 32-bit processing unit that performs arithmetic operations on floating point numbers. The FPU complies fully with the IEEE 754 standard. It was written in VHDL, with the top priorities being to run at approximately 100 MHz and, at the same time, to be as small as possible. Meeting both goals simultaneously was very difficult, and tradeoffs were made.

7.1 Floating point numbers

The floating-point representation is one way to represent real numbers. A floating-point number n is represented with an exponent e and a mantissa m, so that n = b^e × m, where b is the base (also called the radix). For example, if we choose the number n = 17 and the base b = 10, the floating-point representation of 17 would be 17 = 10^1 × 1.7.

Another way to represent real numbers is fixed-point representation. A fixed-point number with four digits after the decimal point can represent numbers such as 1.0001, 12.1019, 34.0000, etc. Both representations are used, depending on the situation. For a hardware implementation, base-2 exponents are used, since digital systems work with binary numbers. Base-2 arithmetic brings problems with it: for example, fractional powers of 10 such as 0.1 or 0.01 cannot be represented exactly in the floating-point format, while in the fixed-point format the decimal point can be thought away (provided the value is within range), giving an exact representation. Fixed-point arithmetic, which is faster than floating-point arithmetic, can then be used. This is one of the reasons why fixed-point representations are used for financial and commercial applications.

The floating-point format can represent a wide range of magnitudes without losing precision, while the fixed-point format has a fixed window of representation. In a 32-bit floating-point representation, for example, numbers from 3.4 × 10^38 down to 1.4 × 10^-45 can be represented with ease, which is one of the reasons why floating-point representation is the most common solution. Floating-point representations also include special values such as infinity and Not-a-Number (NaN).

7.2 IEEE Standard 754 for Binary Floating-Point Arithmetic

The IEEE (Institute of Electrical and Electronics Engineers) has produced a standard that defines floating-point representation and arithmetic. Although other representations exist, it is the most common representation used for floating point numbers; the standard is known as IEEE 754. With respect to precision and width in bits, the standard defines two groups: the basic and the extended format. The extended format is implementation dependent and does not concern this project. The basic format is further divided into the single-precision format, 32 bits wide, and the double-precision format, 64 bits wide. The three basic components are the sign, the exponent, and the mantissa. The single-precision storage layout is, from most to least significant: sign (1 bit), exponent (8 bits), fraction (23 bits).

The number represented by the single-precision format is:

value = (-1)^s × 2^e × 1.f   (normalized, when E > 0)
      = (-1)^s × 2^-126 × 0.f   (denormalized, when E = 0)

where

f = b22·2^-1 + b21·2^-2 + … + b0·2^-23, with each bit bi = 1 or 0
s = sign (0 is positive; 1 is negative)
E = biased exponent; Emax = 255, Emin = 0 (E = 255 and E = 0 are used to represent special values)
e = unbiased exponent; e = E − 127 (bias)


A bias of 127 is added to the actual exponent to make negative exponents possible without using a sign bit. For example, if the value 100 is stored in the exponent field, the actual exponent is −27 (100 − 127). Not the whole range of E is used to represent numbers, since E = 0 and E = 255 are reserved for special values. As the formula above shows, the leading significand bit before the binary point is implicit (not stored) and is 1 or 0 depending on the exponent, thereby saving one bit.
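The field layout and bias can be inspected directly in software. The following Python sketch unpacks the raw bits of a float32 (via the standard struct module) and recovers the sign, biased exponent, and fraction fields described above:

```python
import struct

def float32_fields(x):
    """Return (sign, biased exponent E, 23-bit fraction field) of x as float32."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    sign = bits >> 31
    E = (bits >> 23) & 0xFF          # biased exponent field
    frac = bits & 0x7FFFFF           # fraction field; hidden bit is not stored
    return sign, E, frac

# 100.0 = 1.5625 x 2^6, so E = 6 + 127 = 133 and e = E - 127 = 6
s, E, f = float32_fields(100.0)
print(s, E, E - 127)                 # 0 133 6
```

For 100.0 the exponent field comes out as 133 (binary 10000101), matching the bit pattern used in the worked addition example below.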

7.3 Arithmetic on floating point numbers

7.3.1 Addition algorithm

Addition and subtraction operations on floating-point numbers are considerably more complex than those on integers. The basic algorithm for adding or subtracting FP numbers is shown in the following flow diagram.


Example on floating-point values given in binary:

  0.25 = 0 01111101 00000000000000000000000
  100  = 0 10000101 10010000000000000000000

To add these floating-point representations:

Step 1: align the radix points. Shifting the mantissa LEFT by 1 bit DECREASES the exponent by 1; shifting the mantissa RIGHT by 1 bit INCREASES the exponent by 1. We want to shift the mantissa right, because the bits that fall off the end should come from the least significant end of the mantissa. We therefore choose to shift 0.25, since we want to increase its exponent:

  10000101 − 01111101 = 00001000, i.e. shift by 8 places.

  0 01111101 00000000000000000000000 (original value)
  0 01111110 10000000000000000000000 (shifted 1 place)
  0 01111111 01000000000000000000000 (shifted 2 places)
  0 10000000 00100000000000000000000 (shifted 3 places)
  0 10000001 00010000000000000000000 (shifted 4 places)
  0 10000010 00001000000000000000000 (shifted 5 places)
  0 10000011 00000100000000000000000 (shifted 6 places)
  0 10000100 00000010000000000000000 (shifted 7 places)
  0 10000101 00000001000000000000000 (shifted 8 places)

Step 2: add (don't forget the hidden bit for 100):

    0 10000101 1.10010000000000000000000 (100)
  + 0 10000101 0.00000001000000000000000 (0.25)
  ---------------------------------------
    0 10000101 1.10010001000000000000000

Step 3: normalize the result (get the "hidden bit" to be a 1). It already is for this example. The result is:

  0 10000101 10010001000000000000000
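The three steps above can be replayed in software as a reference model. The Python sketch below is illustrative only: it assumes positive, normalized inputs and ignores rounding and special values. It aligns the exponents, adds the significands with the hidden bit restored, and renormalizes:

```python
import struct

def fields(x):
    """Biased exponent and fraction field of x as float32."""
    b = struct.unpack('>I', struct.pack('>f', x))[0]
    return (b >> 23) & 0xFF, b & 0x7FFFFF

def fp_add(a, b):
    ea, ma = fields(a)
    eb, mb = fields(b)
    ma |= 1 << 23                        # restore the hidden bits
    mb |= 1 << 23
    # step 1: align radix points (shift the smaller operand right)
    if ea < eb:
        ma >>= (eb - ea); ea = eb
    else:
        mb >>= (ea - eb)
    m = ma + mb                          # step 2: add significands
    if m >= 1 << 24:                     # step 3: renormalize on carry-out
        m >>= 1; ea += 1
    bits = (ea << 23) | (m & 0x7FFFFF)
    return struct.unpack('>f', struct.pack('>I', bits))[0]

print(fp_add(0.25, 100.0))               # 100.25
```

Running the model on the worked example reproduces the same result: 0.25 is shifted 8 places and the sum is 100.25.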


7.3.2 Multiplication algorithm
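Floating-point multiplication follows the standard scheme: XOR the signs, add the biased exponents (subtracting the bias once), multiply the 24-bit significands, then renormalize. The Python reference-model sketch below is illustrative only; it assumes normalized inputs and ignores rounding and special values:

```python
import struct

def fields(x):
    """Sign, biased exponent, and significand (hidden bit restored) of x."""
    b = struct.unpack('>I', struct.pack('>f', x))[0]
    return b >> 31, (b >> 23) & 0xFF, (b & 0x7FFFFF) | (1 << 23)

def fp_mul(a, b):
    sa, ea, ma = fields(a)
    sb, eb, mb = fields(b)
    s = sa ^ sb                          # sign of the product
    e = ea + eb - 127                    # add exponents, remove one bias
    m = (ma * mb) >> 23                  # 24x24-bit product, truncated back
    if m >= 1 << 24:                     # renormalize if product >= 2.0
        m >>= 1; e += 1
    bits = (s << 31) | (e << 23) | (m & 0x7FFFFF)
    return struct.unpack('>f', struct.pack('>I', bits))[0]

print(fp_mul(0.25, 100.0))               # 25.0
```

Note that, unlike addition, no pre-alignment of exponents is needed; the exponents are simply summed and the bias removed once.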


Chapter 8

Hardware Implementation of Neural Networks on a Field Programmable Gate Array (FPGA)

For the implementation, the VHDL language was used. VHDL (Very High Speed Integrated Circuit Hardware Description Language) is a hardware description language that simplifies the development of complex systems, because it makes it possible to model and simulate a digital system from a high level of abstraction and with important facilities for modular design. In the present work, we introduce the design of an artificial neuron model based on a Xilinx FPGA device. The Xilinx FPGA device has a high gate density (about 500,000 logic gates) and many features, listed below, which are necessary for a neural implementation:

- Fast logic enables the design of compact and fast arithmetic functions (i.e., multiplication and addition).

- Look-up tables can be used as RAMs and ROMs.

- Combinational functions have up to ten inputs within configurable logic blocks (CLBs), and delays are very small and almost independent of the number of variables.

- Very high routing capability allows successful implementation of critical paths, even for complex neural networks.

The different modules in the implementation of the artificial neuron are a floating point adder, a floating point multiplier, and a sigmoid function generator. Several methods are available for implementing the sigmoid function.

8.1 Different methods to implement Activation Function


8.1.1 Look Up Table:

The sigmoid function is expensive to realize on a platform that lacks a floating-point unit. To reduce the realization costs, the sigmoid transfer function can be replaced by a computationally less expensive fixed-step look-up table (LUT), with linear interpolation between entries. An LUT implements the sigmoid activation function by means of discrete values, but it consumes a large storage area when moderately high precision is required: if the input range needs 21 bits and the results need 16 bits of precision, a 2^21 × 16-bit = 4 MB LUT is needed. This consumes a large area and access time, which may affect the speed of computation. If a Taylor series is chosen to represent the exponential function instead, the computation involves more multipliers and thus increases the area. The LUT design does not optimize well under the floating-point format, and hence the fixed-point format is used. Even so, on-chip realization of the log-sigmoid function increases the size of the hardware considerably. To optimize the area to some extent, the inbuilt RAM available in FPGAs can be used to realize an LUT-based activation function; this reduces area and improves speed. If the precision is to be improved further, a hardware-friendly piecewise-linear (PWL) approximation of the activation function is to be utilized.
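The trade-off can be made concrete in software. The Python sketch below uses a deliberately small, illustrative table size (not the 21-bit/16-bit case discussed above), builds a fixed-step LUT over [0, 8], and interpolates linearly, exploiting the antisymmetry of the sigmoid for negative inputs:

```python
import math

STEPS = 256                            # illustrative table size
XMAX = 8.0
STEP = XMAX / STEPS
TABLE = [1.0 / (1.0 + math.exp(-i * STEP)) for i in range(STEPS + 1)]

def sigmoid_lut(x):
    if x < 0:
        return 1.0 - sigmoid_lut(-x)   # sigmoid(-x) = 1 - sigmoid(x)
    if x >= XMAX:
        return 1.0                     # saturate at the asymptote
    i = int(x / STEP)
    frac = x / STEP - i
    # linear interpolation between neighboring table entries
    return TABLE[i] + frac * (TABLE[i + 1] - TABLE[i])

# the full-precision LUT discussed above: 2^21 entries x 16 bits = 4 MB
print(2**21 * 16 // 8 // 2**20)        # 4 (megabytes)
```

Even with only 257 entries, the interpolated table tracks the true sigmoid to roughly three decimal places, which illustrates why interpolation lets the table shrink so dramatically compared with a direct 2^21-entry lookup.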

8.1.2 Linear approximation:

This approximation is simple to implement and is given by:

f(x) = 0 for x ≤ −h
f(x) = 1/2 + x/(2h) for −h < x < h
f(x) = 1 for x ≥ h

where h is an adjustable parameter. This is a linear function in the domain [−h, h] with a slope of 1/(2h).
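As a sketch (Python; h = 2 is an arbitrary illustrative choice), the piecewise-linear approximation and its slope look like this:

```python
def hard_sigmoid(x, h=2.0):
    """Piecewise-linear sigmoid: 0 below -h, 1 above h, slope 1/(2h) between."""
    if x <= -h:
        return 0.0
    if x >= h:
        return 1.0
    return 0.5 + x / (2.0 * h)

print(hard_sigmoid(0.0))                       # 0.5, matching the true sigmoid at 0
print(hard_sigmoid(1.0) - hard_sigmoid(0.0))   # 0.25, i.e. slope 1/(2*2)
```

In hardware the division by 2h reduces to a shift when h is chosen as a power of two, which is what makes this approximation so cheap.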

8.1.3 Polynomial approximation:

This method uses a polynomial P, of order N, which provides an appropriate approximation of the sigmoid function on an interval I = [a, b]. Various methods exist to construct such an approximation, for example one based on a Taylor series centered on the middle of the interval; however, this gives a good approximation only around the point considered and not over the complete interval. A fifth-order polynomial approximation of the sigmoid function on the domain I = [0, 5] is given by:

P(x) = α + (β + (γ + (δ + (ε + θ·x)·x)·x)·x)·x

with:

α = 0.49999351;  β = 0.25276133
γ = −0.00468879; δ = −0.02246096
ε = 0.00541780;  θ = −0.00039438

This yields a relative error of 0.689 × 10^-3.
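Evaluated in the nested (Horner) form shown above, the polynomial needs only five multiply-add steps. A quick Python check of the quoted coefficients against the true sigmoid:

```python
import math

# coefficients of the fifth-order approximation on [0, 5]
ALPHA, BETA = 0.49999351, 0.25276133
GAMMA, DELTA = -0.00468879, -0.02246096
EPS, THETA = 0.00541780, -0.00039438

def sigmoid_poly(x):
    # Horner evaluation: alpha + (beta + (gamma + (delta + (eps + theta*x)*x)*x)*x)*x
    return ALPHA + (BETA + (GAMMA + (DELTA + (EPS + THETA * x) * x) * x) * x) * x

for x in (0.0, 2.5, 5.0):
    true = 1.0 / (1.0 + math.exp(-x))
    print(x, abs(sigmoid_poly(x) - true))   # errors stay below ~1e-3
```

The Horner form matters for the hardware cost: it maps directly onto a chain of five multiply-accumulate operations rather than computing powers of x separately.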


8.1.4 2nd order approximation:

Consider the second-order equation y = ax² + bx + c. Using a least-squares approximation for x ∈ [0, 4] gives

y = −0.0363x² + 0.2610x + 0.5028

This function can be further simplified to

y = 0.972 − 0.57(0.25x − 0.898)²

By inspection of the original sigmoid function, one finds that it is an antisymmetric function around y = 0.5. In addition, the sigmoid function has two asymptotes, at y = 0 and y = 1. Hence the function value for x < 0 is the complement of its value at x > 0. The approximation function can therefore be expressed as

f(x) = 0.972 − 0.57(0.25x − 0.898)²  for 0 ≤ x ≤ 4
f(x) = 1 − f(−x)                     for x < 0

This is the method that we have used in our project. The above expression can be easily implemented in VHDL using the floating point adder and multiplier.
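A Python sketch of this second-order scheme, extended to negative inputs through the antisymmetry property and saturated beyond the fitted range (the cutoff at |x| = 4 is our illustrative choice):

```python
import math

def sigmoid2(x):
    """Second-order approximation of the sigmoid activation function."""
    if x < 0:
        return 1.0 - sigmoid2(-x)        # antisymmetry about y = 0.5
    if x > 4.0:
        return 1.0                       # saturate toward the asymptote
    return 0.972 - 0.57 * (0.25 * x - 0.898) ** 2

# maximum error over the fitted range stays within a couple of percent
errs = [abs(sigmoid2(x / 10.0) - 1.0 / (1.0 + math.exp(-x / 10.0)))
        for x in range(-40, 41)]
print(max(errs) < 0.02)                  # True
```

The attraction for hardware is that one squaring, one multiply, and one subtract suffice, so the whole activation function reduces to a handful of passes through the floating point adder and multiplier.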

Chapter 9


9.1 VHDL code for the floating point adder

The full code for the adder is divided into different entities, and all are combined by port mapping. The different units in the adder are:

FPadd_normalize
FPalign
FPinvert
FPnormalize
FPround
FPselComplement
FPswap
PackFP
UnpackFP

9.1.1 Main code for the adder

LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all;

ENTITY FPadd IS
   PORT(
      ADD_SUB : IN  std_logic;
      FP_A    : IN  std_logic_vector (31 DOWNTO 0);
      FP_B    : IN  std_logic_vector (31 DOWNTO 0);
      clk     : IN  std_logic;
      FP_Z    : OUT std_logic_vector (31 DOWNTO 0)
   );
END FPadd;

ARCHITECTURE single_cycle OF FPadd IS
   SIGNAL A_CS    : std_logic_vector(28 DOWNTO 0);
   SIGNAL A_EXP   : std_logic_vector(7 DOWNTO 0);
   SIGNAL A_SIG   : std_logic_vector(31 DOWNTO 0);
   SIGNAL A_SIGN  : std_logic;
   SIGNAL A_in    : std_logic_vector(28 DOWNTO 0);
   SIGNAL A_isDN  : std_logic;
   SIGNAL A_isINF : std_logic;
   SIGNAL A_isNaN : std_logic;
   SIGNAL A_isZ   : std_logic;
   SIGNAL B_CS    : std_logic_vector(28 DOWNTO 0);
   SIGNAL B_EXP   : std_logic_vector(7 DOWNTO 0);
   SIGNAL B_SIG   : std_logic_vector(31 DOWNTO 0);
   SIGNAL B_SIGN  : std_logic;
   SIGNAL B_XSIGN : std_logic;
   SIGNAL B_in    : std_logic_vector(28 DOWNTO 0);
   SIGNAL B_isDN  : std_logic;


SIGNAL B_isINF : std_logic; SIGNAL B_isNaN : std_logic; SIGNAL B_isZ : std_logic; SIGNAL EXP_base : std_logic_vector(7 DOWNTO 0); SIGNAL EXP_diff : std_logic_vector(8 DOWNTO 0); SIGNAL EXP_isINF : std_logic; SIGNAL EXP_norm : std_logic_vector(7 DOWNTO 0); SIGNAL EXP_round : std_logic_vector(7 DOWNTO 0); SIGNAL EXP_selC : std_logic_vector(7 DOWNTO 0); SIGNAL OV : std_logic; SIGNAL SIG_norm : std_logic_vector(27 DOWNTO 0); SIGNAL SIG_norm2 : std_logic_vector(27 DOWNTO 0); SIGNAL SIG_round : std_logic_vector(27 DOWNTO 0); SIGNAL SIG_selC : std_logic_vector(27 DOWNTO 0); SIGNAL Z_EXP : std_logic_vector(7 DOWNTO 0); SIGNAL Z_SIG : std_logic_vector(22 DOWNTO 0); SIGNAL Z_SIGN : std_logic; SIGNAL a_align : std_logic_vector(28 DOWNTO 0); SIGNAL a_exp_in : std_logic_vector(8 DOWNTO 0); SIGNAL a_inv : std_logic_vector(28 DOWNTO 0); SIGNAL add_out : std_logic_vector(28 DOWNTO 0); SIGNAL b_align : std_logic_vector(28 DOWNTO 0); SIGNAL b_exp_in : std_logic_vector(8 DOWNTO 0); SIGNAL b_inv : std_logic_vector(28 DOWNTO 0); SIGNAL cin : std_logic; SIGNAL cin_sub : std_logic; SIGNAL invert_A : std_logic; SIGNAL invert_B : std_logic; SIGNAL isINF : std_logic; SIGNAL isINF_tab : std_logic; SIGNAL isNaN : std_logic; SIGNAL isZ : std_logic; SIGNAL isZ_tab : std_logic; SIGNAL mux_sel : std_logic; SIGNAL zero : std_logic;

SIGNAL mw_I13din0 : std_logic_vector(7 DOWNTO 0); SIGNAL mw_I13din1 : std_logic_vector(7 DOWNTO 0);

COMPONENT FPadd_normalize PORT ( EXP_in : IN std_logic_vector (7 DOWNTO 0); SIG_in : IN std_logic_vector (27 DOWNTO 0); EXP_out : OUT std_logic_vector (7 DOWNTO 0); SIG_out : OUT std_logic_vector (27 DOWNTO 0); zero : OUT std_logic ); END COMPONENT; COMPONENT FPalign


PORT ( A_in : IN std_logic_vector (28 DOWNTO 0); B_in : IN std_logic_vector (28 DOWNTO 0); cin : IN std_logic ; diff : IN std_logic_vector (8 DOWNTO 0); A_out : OUT std_logic_vector (28 DOWNTO 0); B_out : OUT std_logic_vector (28 DOWNTO 0) ); END COMPONENT; COMPONENT FPinvert GENERIC ( width : integer := 29 ); PORT ( A_in : IN std_logic_vector (width-1 DOWNTO 0); B_in : IN std_logic_vector (width-1 DOWNTO 0); invert_A : IN std_logic ; invert_B : IN std_logic ; A_out : OUT std_logic_vector (width-1 DOWNTO 0); B_out : OUT std_logic_vector (width-1 DOWNTO 0) ); END COMPONENT; COMPONENT FPnormalize GENERIC ( SIG_width : integer := 28 ); PORT ( SIG_in : IN std_logic_vector (SIG_width-1 DOWNTO 0); EXP_in : IN std_logic_vector (7 DOWNTO 0); SIG_out : OUT std_logic_vector (SIG_width-1 DOWNTO 0); EXP_out : OUT std_logic_vector (7 DOWNTO 0) ); END COMPONENT; COMPONENT FPround GENERIC ( SIG_width : integer := 28 ); PORT ( SIG_in : IN std_logic_vector (SIG_width-1 DOWNTO 0); EXP_in : IN std_logic_vector (7 DOWNTO 0); SIG_out : OUT std_logic_vector (SIG_width-1 DOWNTO 0); EXP_out : OUT std_logic_vector (7 DOWNTO 0) ); END COMPONENT; COMPONENT FPselComplement GENERIC ( SIG_width : integer := 28 ); PORT ( SIG_in : IN std_logic_vector (SIG_width DOWNTO 0);


EXP_in : IN std_logic_vector (7 DOWNTO 0); SIG_out : OUT std_logic_vector (SIG_width-1 DOWNTO 0); EXP_out : OUT std_logic_vector (7 DOWNTO 0) ); END COMPONENT; COMPONENT FPswap GENERIC ( width : integer := 29 ); PORT ( A_in : IN std_logic_vector (width-1 DOWNTO 0); B_in : IN std_logic_vector (width-1 DOWNTO 0); swap_AB : IN std_logic ; A_out : OUT std_logic_vector (width-1 DOWNTO 0); B_out : OUT std_logic_vector (width-1 DOWNTO 0) ); END COMPONENT; COMPONENT PackFP PORT ( SIGN : IN std_logic ; EXP : IN std_logic_vector (7 DOWNTO 0); SIG : IN std_logic_vector (22 DOWNTO 0); isNaN : IN std_logic ; isINF : IN std_logic ; isZ : IN std_logic ; FP : OUT std_logic_vector (31 DOWNTO 0) ); END COMPONENT; COMPONENT UnpackFP PORT ( FP : IN std_logic_vector (31 DOWNTO 0); SIG : OUT std_logic_vector (31 DOWNTO 0); EXP : OUT std_logic_vector (7 DOWNTO 0); SIGN : OUT std_logic ; isNaN : OUT std_logic ; isINF : OUT std_logic ; isZ : OUT std_logic ; isDN : OUT std_logic ); END COMPONENT;

FOR ALL : FPadd_normalize USE ENTITY work.FPadd_normalize; FOR ALL : FPalign USE ENTITY work.FPalign; FOR ALL : FPinvert USE ENTITY work.FPinvert; FOR ALL : FPnormalize USE ENTITY work.FPnormalize; FOR ALL : FPround USE ENTITY work.FPround; FOR ALL : FPselComplement USE ENTITY work.FPselComplement; FOR ALL : FPswap USE ENTITY work.FPswap; FOR ALL : PackFP USE ENTITY work.PackFP;


FOR ALL : UnpackFP USE ENTITY work.UnpackFP;BEGIN cin_sub <= (A_isDN OR A_isZ) XOR (B_isDN OR B_isZ); Z_SIG <= SIG_norm2(25 DOWNTO 3); eb3_truth_process: PROCESS(ADD_SUB, A_isINF, A_isNaN, A_isZ, B_isINF, B_isNaN, B_isZ) BEGIN IF (A_isNaN = '1') THEN isINF_tab <= '0'; isNaN <= '1'; isZ_tab <= '0'; ELSIF (B_isNaN = '1') THEN isINF_tab <= '0'; isNaN <= '1'; isZ_tab <= '0'; ELSIF (ADD_SUB = '1') AND (A_isINF = '1') AND (B_isINF = '1') THEN isINF_tab <= '1'; isNaN <= '0'; isZ_tab <= '0'; ELSIF (ADD_SUB = '0') AND (A_isINF = '1') AND (B_isINF = '1') THEN isINF_tab <= '0'; isNaN <= '1'; isZ_tab <= '0'; ELSIF (A_isINF = '1') THEN isINF_tab <= '1'; isNaN <= '0'; isZ_tab <= '0'; ELSIF (B_isINF = '1') THEN isINF_tab <= '1'; isNaN <= '0'; isZ_tab <= '0'; ELSIF (A_isZ = '1') AND (B_isZ = '1') THEN isINF_tab <= '0'; isNaN <= '0'; isZ_tab <= '1'; ELSE isINF_tab <= '0'; isNaN <= '0'; isZ_tab <= '0'; END IF; END PROCESS eb3_truth_process; mux_sel <= EXP_diff(8); InvertLogic_truth_process: PROCESS(A_SIGN, B_XSIGN, EXP_diff) BEGIN IF (A_SIGN = '0') AND (B_XSIGN = '0') THEN invert_A <= '0'; invert_B <= '0'; ELSIF (A_SIGN = '1') AND (B_XSIGN = '1') THEN invert_A <= '0';


invert_B <= '0'; ELSIF (A_SIGN = '0') AND (B_XSIGN = '1') AND (EXP_diff(8) = '0') THEN invert_A <= '0'; invert_B <= '1'; ELSIF (A_SIGN = '0') AND (B_XSIGN = '1') AND (EXP_diff(8) = '1') THEN invert_A <= '1'; invert_B <= '0'; ELSIF (A_SIGN = '1') AND (B_XSIGN = '0') AND (EXP_diff(8) = '0') THEN invert_A <= '1'; invert_B <= '0'; ELSIF (A_SIGN = '1') AND (B_XSIGN = '0') AND (EXP_diff(8) = '1') THEN invert_A <= '0'; invert_B <= '1'; ELSE invert_A <= '0'; invert_B <= '0'; END IF; END PROCESS InvertLogic_truth_process; SignLogic_truth_process: PROCESS(A_SIGN, B_XSIGN, add_out) VARIABLE b1_A_SIGNB_XSIGNadd_out_28 : std_logic_vector(2 DOWNTO 0); BEGIN b1_A_SIGNB_XSIGNadd_out_28 := A_SIGN & B_XSIGN & add_out(28); CASE b1_A_SIGNB_XSIGNadd_out_28 IS WHEN "000" => OV <= '0'; Z_SIGN <= '0'; WHEN "001" => OV <= '1'; Z_SIGN <= '0'; WHEN "010" => OV <= '0'; Z_SIGN <= '0'; WHEN "011" => OV <= '0'; Z_SIGN <= '1'; WHEN "100" => OV <= '0'; Z_SIGN <= '0'; WHEN "101" => OV <= '0'; Z_SIGN <= '1'; WHEN "110" => OV <= '0'; Z_SIGN <= '1'; WHEN "111" => OV <= '1'; Z_SIGN <= '1'; WHEN OTHERS => OV <= '0'; Z_SIGN <= '0';


END CASE; END PROCESS SignLogic_truth_process; A_in <= "00" & A_SIG(23 DOWNTO 0) & "000"; B_in <= "00" & B_SIG(23 DOWNTO 0) & "000"; EXP_isINF <= '1' WHEN (OV='1' OR Z_EXP=X"FF") ELSE '0'; a_exp_in <= "0" & A_EXP; b_exp_in <= "0" & B_EXP; I4combo: PROCESS (a_inv, b_inv, cin) VARIABLE mw_I4t0 : std_logic_vector(29 DOWNTO 0); VARIABLE mw_I4t1 : std_logic_vector(29 DOWNTO 0); VARIABLE mw_I4sum : signed(29 DOWNTO 0); VARIABLE mw_I4carry : std_logic; BEGIN mw_I4t0 := a_inv(28) & a_inv; mw_I4t1 := b_inv(28) & b_inv; mw_I4carry := cin; mw_I4sum := signed(mw_I4t0) + signed(mw_I4t1) + mw_I4carry; add_out <= conv_std_logic_vector(mw_I4sum(28 DOWNTO 0),29); END PROCESS I4combo; I13combo: PROCESS(mw_I13din0, mw_I13din1, mux_sel) VARIABLE dtemp : std_logic_vector(7 DOWNTO 0); BEGIN CASE mux_sel IS WHEN '0'|'L' => dtemp := mw_I13din0; WHEN '1'|'H' => dtemp := mw_I13din1; WHEN OTHERS => dtemp := (OTHERS => 'X'); END CASE; EXP_base <= dtemp; END PROCESS I13combo; mw_I13din0 <= A_EXP; mw_I13din1 <= B_EXP; isINF <= EXP_isINF OR isINF_tab; cin <= invert_B OR invert_A; isZ <= zero OR isZ_tab; I3combo: PROCESS (a_exp_in, b_exp_in, cin_sub) VARIABLE mw_I3t0 : std_logic_vector(9 DOWNTO 0); VARIABLE mw_I3t1 : std_logic_vector(9 DOWNTO 0); VARIABLE diff : signed(9 DOWNTO 0); VARIABLE borrow : std_logic; BEGIN mw_I3t0 := a_exp_in(8) & a_exp_in; mw_I3t1 := b_exp_in(8) & b_exp_in; borrow := cin_sub; diff := signed(mw_I3t0) - signed(mw_I3t1) - borrow; EXP_diff <= conv_std_logic_vector(diff(8 DOWNTO 0),9); END PROCESS I3combo; B_XSIGN <= NOT(B_SIGN XOR ADD_SUB); I8 : FPadd_normalize PORT MAP ( EXP_in => EXP_selC,


SIG_in => SIG_selC, EXP_out => EXP_norm, SIG_out => SIG_norm, zero => zero ); I6 : FPalign PORT MAP ( A_in => A_CS, B_in => B_CS, cin => cin_sub, diff => EXP_diff, A_out => a_align, B_out => b_align ); I14 : FPinvert GENERIC MAP ( width => 29 ) PORT MAP ( A_in => a_align, B_in => b_align, invert_A => invert_A, invert_B => invert_B, A_out => a_inv, B_out => b_inv ); I11 : FPnormalize GENERIC MAP ( SIG_width => 28 ) PORT MAP ( SIG_in => SIG_round, EXP_in => EXP_round, SIG_out => SIG_norm2, EXP_out => Z_EXP ); I10 : FPround GENERIC MAP ( SIG_width => 28 ) PORT MAP ( SIG_in => SIG_norm, EXP_in => EXP_norm, SIG_out => SIG_round, EXP_out => EXP_round ); I12 : FPselComplement GENERIC MAP ( SIG_width => 28 ) PORT MAP ( SIG_in => add_out, EXP_in => EXP_base, SIG_out => SIG_selC, EXP_out => EXP_selC ); I5 : FPswap GENERIC MAP ( width => 29 ) PORT MAP (


A_in => A_in, B_in => B_in, swap_AB => EXP_diff(8), A_out => A_CS, B_out => B_CS ); I2 : PackFP PORT MAP ( SIGN => Z_SIGN, EXP => Z_EXP, SIG => Z_SIG, isNaN => isNaN, isINF => isINF, isZ => isZ, FP => FP_Z ); I0 : UnpackFP PORT MAP ( FP => FP_A, SIG => A_SIG, EXP => A_EXP, SIGN => A_SIGN, isNaN => A_isNaN, isINF => A_isINF, isZ => A_isZ, isDN => A_isDN ); I1 : UnpackFP PORT MAP ( FP => FP_B, SIG => B_SIG, EXP => B_EXP, SIGN => B_SIGN, isNaN => B_isNaN, isINF => B_isINF, isZ => B_isZ, isDN => B_isDN );END single_cycle;

9.1.2 Code for FPadd_normalize

LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all;

ENTITY FPadd_normalize IS
   PORT(
      EXP_in  : IN  std_logic_vector (7 DOWNTO 0);
      SIG_in  : IN  std_logic_vector (27 DOWNTO 0);
      EXP_out : OUT std_logic_vector (7 DOWNTO 0);
      SIG_out : OUT std_logic_vector (27 DOWNTO 0);
      zero    : OUT std_logic


);END FPadd_normalize ;ARCHITECTURE struct OF FPadd_normalize IS SIGNAL EXP_lshift : std_logic_vector(7 DOWNTO 0); SIGNAL EXP_rshift : std_logic_vector(7 DOWNTO 0); SIGNAL SIG_lshift : std_logic_vector(27 DOWNTO 0); SIGNAL SIG_rshift : std_logic_vector(27 DOWNTO 0); SIGNAL add_in : std_logic_vector(7 DOWNTO 0); SIGNAL cin : std_logic; SIGNAL count : std_logic_vector(4 DOWNTO 0); SIGNAL isDN : std_logic; SIGNAL shift_RL : std_logic; SIGNAL word : std_logic_vector(26 DOWNTO 0); SIGNAL zero_int : std_logic; -- Component Declarations COMPONENT FPlzc PORT ( word : IN std_logic_vector (26 DOWNTO 0); zero : OUT std_logic ; count : OUT std_logic_vector (4 DOWNTO 0) ); END COMPONENT; FOR ALL : FPlzc USE ENTITY work.FPlzc;BEGIN SIG_rshift <= '0' & SIG_in(27 DOWNTO 2) & (SIG_in(1) AND SIG_in(0)); add_in <= "000" & count; PROCESS( isDN, shift_RL, EXP_lshift, EXP_rshift, EXP_in, SIG_lshift, SIG_rshift, SIG_in) BEGIN IF (isDN='1') THEN EXP_out <= X"00"; SIG_out <= SIG_in; ELSE IF (shift_RL='1') THEN IF (SIG_in(27)='1') THEN EXP_out <= EXP_rshift; SIG_out <= SIG_rshift; ELSE EXP_out <= EXP_in; SIG_out <= SIG_in; END IF; ELSE -- Shift Left EXP_out <= EXP_lshift; SIG_out <= SIG_lshift; END IF; END IF; END PROCESS; zero <= zero_int AND NOT SIG_in(27); word <= SIG_in(26 DOWNTO 0); PROCESS(SIG_in,EXP_in) BEGIN


IF (SIG_in(27)='0' AND SIG_in(26)='0' AND (EXP_in=X"01")) THEN isDN <= '1'; shift_RL <= '0'; ELSIF (SIG_in(27)='0' AND SIG_in(26)='0' AND (EXP_in/=X"00")) THEN isDN <= '0'; shift_RL <= '0'; ELSE isDN <= '0'; shift_RL <= '1'; END IF; END PROCESS; cin <= '0'; I4combo: PROCESS (EXP_in) VARIABLE t0 : std_logic_vector(8 DOWNTO 0); VARIABLE sum : signed(8 DOWNTO 0); VARIABLE din_l : std_logic_vector(7 DOWNTO 0); BEGIN din_l := EXP_in; t0 := din_l(7) & din_l; sum := (signed(t0) + '1'); EXP_rshift <= conv_std_logic_vector(sum(7 DOWNTO 0),8); END PROCESS I4combo; I1combo : PROCESS (SIG_in, count) VARIABLE stemp : std_logic_vector (4 DOWNTO 0); VARIABLE dtemp : std_logic_vector (27 DOWNTO 0); VARIABLE temp : std_logic_vector (27 DOWNTO 0); BEGIN temp := (OTHERS=> 'X'); stemp := count; temp := SIG_in; FOR i IN 4 DOWNTO 0 LOOP IF (i < 5) THEN IF (stemp(i) = '1' OR stemp(i) = 'H') THEN dtemp := (OTHERS => '0'); dtemp(27 DOWNTO 2**i) := temp(27 - 2**i DOWNTO 0); ELSIF (stemp(i) = '0' OR stemp(i) = 'L') THEN dtemp := temp; ELSE dtemp := (OTHERS => 'X'); END IF; ELSE IF (stemp(i) = '1' OR stemp(i) = 'H') THEN dtemp := (OTHERS => '0'); ELSIF (stemp(i) = '0' OR stemp(i) = 'L') THEN dtemp := temp; ELSE dtemp := (OTHERS => 'X'); END IF; END IF; temp := dtemp;


END LOOP; SIG_lshift <= dtemp; END PROCESS I1combo; I2combo: PROCESS (EXP_in, add_in, cin) VARIABLE mw_I2t0 : std_logic_vector(8 DOWNTO 0); VARIABLE mw_I2t1 : std_logic_vector(8 DOWNTO 0); VARIABLE diff : signed(8 DOWNTO 0); VARIABLE borrow : std_logic; BEGIN mw_I2t0 := EXP_in(7) & EXP_in; mw_I2t1 := add_in(7) & add_in; borrow := cin; diff := signed(mw_I2t0) - signed(mw_I2t1) - borrow; EXP_lshift <= conv_std_logic_vector(diff(7 DOWNTO 0),8); END PROCESS I2combo; I0 : FPlzc PORT MAP ( word => word, zero => zero_int, count => count );END struct;

9.1.3 Code for FPlzc

LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all;

ENTITY FPlzc IS
   PORT(
      word  : IN  std_logic_vector (26 DOWNTO 0);
      zero  : OUT std_logic;
      count : OUT std_logic_vector (4 DOWNTO 0)
   );
END FPlzc;

ARCHITECTURE FPlzc OF FPlzc IS
BEGIN
PROCESS(word)
BEGIN

zero <= '0';
IF (word(26 DOWNTO 0)="000000000000000000000000000") THEN
   count <= "11011";
   zero  <= '1';
ELSIF (word(26 DOWNTO 1)="00000000000000000000000000") THEN
   count <= "11010";
ELSIF (word(26 DOWNTO 2)="0000000000000000000000000") THEN
   count <= "11001";
ELSIF (word(26 DOWNTO 3)="000000000000000000000000") THEN
   count <= "11000";
ELSIF (word(26 DOWNTO 4)="00000000000000000000000") THEN
   count <= "10111";
ELSIF (word(26 DOWNTO 5)="0000000000000000000000") THEN
   count <= "10110";
ELSIF (word(26 DOWNTO 6)="000000000000000000000") THEN
   count <= "10101";
ELSIF (word(26 DOWNTO 7)="00000000000000000000") THEN
   count <= "10100";
ELSIF (word(26 DOWNTO 8)="0000000000000000000") THEN
   count <= "10011";
ELSIF (word(26 DOWNTO 9)="000000000000000000") THEN
   count <= "10010";
ELSIF (word(26 DOWNTO 10)="00000000000000000") THEN
   count <= "10001";
ELSIF (word(26 DOWNTO 11)="0000000000000000") THEN
   count <= "10000";
ELSIF (word(26 DOWNTO 12)="000000000000000") THEN
   count <= "01111";
ELSIF (word(26 DOWNTO 13)="00000000000000") THEN
   count <= "01110";
ELSIF (word(26 DOWNTO 14)="0000000000000") THEN
   count <= "01101";
ELSIF (word(26 DOWNTO 15)="000000000000") THEN
   count <= "01100";
ELSIF (word(26 DOWNTO 16)="00000000000") THEN
   count <= "01011";
ELSIF (word(26 DOWNTO 17)="0000000000") THEN
   count <= "01010";
ELSIF (word(26 DOWNTO 18)="000000000") THEN
   count <= "01001";
ELSIF (word(26 DOWNTO 19)="00000000") THEN
   count <= "01000";
ELSIF (word(26 DOWNTO 20)="0000000") THEN
   count <= "00111";
ELSIF (word(26 DOWNTO 21)="000000") THEN
   count <= "00110";
ELSIF (word(26 DOWNTO 22)="00000") THEN
   count <= "00101";
ELSIF (word(26 DOWNTO 23)="0000") THEN
   count <= "00100";
ELSIF (word(26 DOWNTO 24)="000") THEN
   count <= "00011";
ELSIF (word(26 DOWNTO 25)="00") THEN
   count <= "00010";
ELSIF (word(26)='0') THEN
   count <= "00001";
ELSE
   count <= "00000";
END IF;

END PROCESS;
END FPlzc;

9.1.4 FPalign

LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all;

ENTITY FPalign IS
  PORT(
    A_in : IN std_logic_vector (28 DOWNTO 0);


    B_in  : IN  std_logic_vector (28 DOWNTO 0);
    cin   : IN  std_logic;
    diff  : IN  std_logic_vector (8 DOWNTO 0);
    A_out : OUT std_logic_vector (28 DOWNTO 0);
    B_out : OUT std_logic_vector (28 DOWNTO 0)
  );
END FPalign;

ARCHITECTURE struct OF FPalign IS
  SIGNAL B_shift  : std_logic_vector(28 DOWNTO 0);
  SIGNAL diff_int : std_logic_vector(8 DOWNTO 0);
  SIGNAL shift_B  : std_logic_vector(5 DOWNTO 0);
BEGIN
  PROCESS(diff_int, B_shift)
  BEGIN
    IF (diff_int(8)='1') THEN
      IF (((NOT diff_int) + 1) > 28) THEN
        B_out <= (OTHERS => '0');
      ELSE
        B_out <= B_shift;
      END IF;
    ELSE
      IF (diff_int > 28) THEN
        B_out <= (OTHERS => '0');
      ELSE
        B_out <= B_shift;
      END IF;
    END IF;
  END PROCESS;

  PROCESS(diff_int)
  BEGIN
    IF (diff_int(8)='1') THEN
      shift_B <= (NOT diff_int(5 DOWNTO 0)) + 1;
    ELSE
      shift_B <= diff_int(5 DOWNTO 0);
    END IF;
  END PROCESS;

  PROCESS(cin, diff)
  BEGIN
    IF ((cin='1') AND (diff(8)='1')) THEN
      diff_int <= diff + 2;
    ELSE
      diff_int <= diff;
    END IF;
  END PROCESS;

  A_out <= A_in;

  I1combo : PROCESS (B_in, shift_B)
    VARIABLE stemp : std_logic_vector (5 DOWNTO 0);


    VARIABLE dtemp : std_logic_vector (28 DOWNTO 0);
    VARIABLE temp  : std_logic_vector (28 DOWNTO 0);
  BEGIN
    temp := (OTHERS => 'X');
    stemp := shift_B;
    temp := B_in;
    FOR i IN 5 DOWNTO 0 LOOP
      IF (i < 5) THEN
        IF (stemp(i) = '1' OR stemp(i) = 'H') THEN
          dtemp := (OTHERS => '0');
          dtemp(28 - 2**i DOWNTO 0) := temp(28 DOWNTO 2**i);
        ELSIF (stemp(i) = '0' OR stemp(i) = 'L') THEN
          dtemp := temp;
        ELSE
          dtemp := (OTHERS => 'X');
        END IF;
      ELSE
        IF (stemp(i) = '1' OR stemp(i) = 'H') THEN
          dtemp := (OTHERS => '0');
        ELSIF (stemp(i) = '0' OR stemp(i) = 'L') THEN
          dtemp := temp;
        ELSE
          dtemp := (OTHERS => 'X');
        END IF;
      END IF;
      temp := dtemp;
    END LOOP;
    B_shift <= dtemp;
  END PROCESS I1combo;
END struct;

9.1.5 FPinvert

LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all;

ENTITY FPinvert IS
  GENERIC( width : integer := 29 );
  PORT(
    A_in     : IN  std_logic_vector (width-1 DOWNTO 0);
    B_in     : IN  std_logic_vector (width-1 DOWNTO 0);
    invert_A : IN  std_logic;
    invert_B : IN  std_logic;
    A_out    : OUT std_logic_vector (width-1 DOWNTO 0);
    B_out    : OUT std_logic_vector (width-1 DOWNTO 0)
  );
END FPinvert;

ARCHITECTURE FPinvert OF FPinvert IS
BEGIN
  A_out <= (NOT A_in) WHEN (invert_A='1') ELSE A_in;
  B_out <= (NOT B_in) WHEN (invert_B='1') ELSE B_in;


END FPinvert;

9.1.6 FPnormalize

LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all;
USE ieee.std_logic_unsigned.all;

ENTITY FPnormalize IS
  GENERIC( SIG_width : integer := 28 );
  PORT(
    SIG_in  : IN  std_logic_vector (SIG_width-1 DOWNTO 0);
    EXP_in  : IN  std_logic_vector (7 DOWNTO 0);
    SIG_out : OUT std_logic_vector (SIG_width-1 DOWNTO 0);
    EXP_out : OUT std_logic_vector (7 DOWNTO 0)
  );
END FPnormalize;

ARCHITECTURE FPnormalize OF FPnormalize IS
BEGIN
PROCESS(SIG_in, EXP_in)
BEGIN

  IF (SIG_in(SIG_width-1)='1') THEN
    SIG_out <= '0' & SIG_in(SIG_width-1 DOWNTO 2) & (SIG_in(1) AND SIG_in(0));
    EXP_out <= EXP_in + 1;
  ELSE
    SIG_out <= SIG_in;
    EXP_out <= EXP_in;
  END IF;

END PROCESS;
END FPnormalize;
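A behavioural sketch of the FPnormalize rule in Python (names are ours): when the significand has overflowed into its top bit, it is shifted right one place, the two low bits are ANDed into the new LSB exactly as the concatenation in the VHDL does, and the exponent is incremented.

```python
def normalize(sig: int, exp: int, width: int = 28):
    """Mirror of FPnormalize: '0' & sig(width-1..2) & (sig(1) AND sig(0)),
    with the exponent bumped, whenever the top significand bit is set."""
    if (sig >> (width - 1)) & 1:
        lsb = ((sig >> 1) & 1) & (sig & 1)   # the VHDL ANDs the two low bits
        sig = ((sig >> 2) << 1) | lsb
        exp += 1
    return sig, exp
```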

9.1.7 FPround

LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all;
USE ieee.std_logic_unsigned.all;

ENTITY FPround IS
  GENERIC( SIG_width : integer := 28 );
  PORT(
    SIG_in  : IN  std_logic_vector (SIG_width-1 DOWNTO 0);
    EXP_in  : IN  std_logic_vector (7 DOWNTO 0);
    SIG_out : OUT std_logic_vector (SIG_width-1 DOWNTO 0);
    EXP_out : OUT std_logic_vector (7 DOWNTO 0)
  );
END FPround;

ARCHITECTURE FPround OF FPround IS
BEGIN

EXP_out <= EXP_in;


PROCESS(SIG_in)
BEGIN
  IF ((SIG_in(2)='0') OR ((SIG_in(1)='0') AND (SIG_in(0)='0'))) THEN
    SIG_out <= SIG_in;
  ELSE
    SIG_out <= (SIG_in(SIG_width-1 DOWNTO 3) + 1) & "000";
  END IF;
END PROCESS;
END FPround;
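FPround rounds on the three low guard bits: it truncates when the round bit (bit 2) is clear or when both lower bits are clear, and otherwise adds one ULP at bit 3 and zeroes the guard bits. A Python model of our reading of that rule:

```python
def round_sig(sig: int) -> int:
    """Round-to-nearest on three guard bits, as in FPround: round up
    only when bit 2 is set together with bit 1 or bit 0."""
    if (sig >> 2) & 1 and (sig & 0b11):
        sig = ((sig >> 3) + 1) << 3   # add one ULP, clear guard bits
    return sig
```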

9.1.8 FPselComplement

LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all;
USE ieee.std_logic_unsigned.all;

ENTITY FPselComplement IS
  GENERIC( SIG_width : integer := 28 );
  PORT(
    SIG_in  : IN  std_logic_vector (SIG_width DOWNTO 0);
    EXP_in  : IN  std_logic_vector (7 DOWNTO 0);
    SIG_out : OUT std_logic_vector (SIG_width-1 DOWNTO 0);
    EXP_out : OUT std_logic_vector (7 DOWNTO 0)
  );
END FPselComplement;

ARCHITECTURE FPselComplement OF FPselComplement IS
BEGIN

EXP_out <= EXP_in;
PROCESS(SIG_in)
BEGIN
  IF (SIG_in(SIG_width) = '1') THEN
    SIG_out <= (NOT SIG_in(SIG_width-1 DOWNTO 0)) + 1;
  ELSE
    SIG_out <= SIG_in(SIG_width-1 DOWNTO 0);
  END IF;
END PROCESS;
END FPselComplement;
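FPselComplement recovers a magnitude after a two's-complement subtraction: if the extra sign bit above the significand is set, the lower bits are negated (NOT plus one). An equivalent Python sketch (names are ours):

```python
def sel_complement(sig: int, width: int = 28) -> int:
    """If the sign bit at position `width` is set, return the two's
    complement of the lower `width` bits, else the bits unchanged."""
    mask = (1 << width) - 1
    if (sig >> width) & 1:
        return (-sig) & mask   # NOT x + 1 == -x modulo 2**width
    return sig & mask
```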

9.1.9 FPswap

LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all;

ENTITY FPswap IS
  GENERIC( width : integer := 29 );
  PORT(
    A_in    : IN  std_logic_vector (width-1 DOWNTO 0);
    B_in    : IN  std_logic_vector (width-1 DOWNTO 0);
    swap_AB : IN  std_logic;
    A_out   : OUT std_logic_vector (width-1 DOWNTO 0);


    B_out   : OUT std_logic_vector (width-1 DOWNTO 0)
  );
END FPswap;

ARCHITECTURE FPswap OF FPswap IS
BEGIN
PROCESS(A_in, B_in, swap_AB)
BEGIN
  IF (swap_AB='1') THEN
    A_out <= B_in;
    B_out <= A_in;
  ELSE
    A_out <= A_in;
    B_out <= B_in;
  END IF;
END PROCESS;
END FPswap;

9.1.10 PackFP

LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all;

ENTITY PackFP IS
  PORT(
    SIGN  : IN std_logic;
    EXP   : IN std_logic_vector (7 DOWNTO 0);
    SIG   : IN std_logic_vector (22 DOWNTO 0);
    isNaN : IN std_logic;
    isINF : IN std_logic;

    isZ : IN  std_logic;
    FP  : OUT std_logic_vector (31 DOWNTO 0)
  );
END PackFP;

ARCHITECTURE PackFP OF PackFP IS
BEGIN
PROCESS(isNaN, isINF, isZ, SIGN, EXP, SIG)
BEGIN

  IF (isNaN='1') THEN
    FP(31) <= SIGN;
    FP(30 DOWNTO 23) <= X"FF";
    FP(22 DOWNTO 0) <= "100" & X"00000";
  ELSIF (isINF='1') THEN
    FP(31) <= SIGN;
    FP(30 DOWNTO 23) <= X"FF";
    FP(22 DOWNTO 0) <= (OTHERS => '0');
  ELSIF (isZ='1') THEN
    FP(31) <= SIGN;
    FP(30 DOWNTO 23) <= X"00";
    FP(22 DOWNTO 0) <= (OTHERS => '0');
  ELSE
    FP(31) <= SIGN;
    FP(30 DOWNTO 23) <= EXP;
    FP(22 DOWNTO 0) <= SIG;
  END IF;
END PROCESS;


END PackFP;

9.1.11 UnpackFP

LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all;

ENTITY UnpackFP IS
  PORT(
    FP    : IN  std_logic_vector (31 DOWNTO 0);
    SIG   : OUT std_logic_vector (31 DOWNTO 0);
    EXP   : OUT std_logic_vector (7 DOWNTO 0);
    SIGN  : OUT std_logic;
    isNaN : OUT std_logic;
    isINF : OUT std_logic;
    isZ   : OUT std_logic;
    isDN  : OUT std_logic
  );
END UnpackFP;

ARCHITECTURE UnpackFP OF UnpackFP IS

  SIGNAL exp_int : std_logic_vector(7 DOWNTO 0);
  SIGNAL sig_int : std_logic_vector(22 DOWNTO 0);
  SIGNAL expZ, expFF, sigZ : std_logic;

BEGIN
  exp_int <= FP(30 DOWNTO 23);
  sig_int <= FP(22 DOWNTO 0);
  SIGN <= FP(31);
  EXP <= exp_int;
  SIG(22 DOWNTO 0) <= sig_int;
  expZ  <= '1' WHEN (exp_int=X"00") ELSE '0';
  expFF <= '1' WHEN (exp_int=X"FF") ELSE '0';
  sigZ  <= '1' WHEN (sig_int="00000000000000000000000") ELSE '0';
  isNaN <= expFF AND (NOT sigZ);
  isINF <= expFF AND sigZ;
  isZ   <= expZ AND sigZ;
  isDN  <= expZ AND (NOT sigZ);
  SIG(23) <= NOT expZ;
  SIG(31 DOWNTO 24) <= (OTHERS => '0');

END UnpackFP;
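The field splitting and classification that UnpackFP performs can be modelled directly on a 32-bit IEEE-754 word (the function name is ours; the hidden bit is prepended for any non-zero exponent, exactly as `SIG(23) <= NOT expZ` does above):

```python
def unpack_fp(fp: int):
    """Split an IEEE-754 single word into sign/exponent/significand
    and classify it as NaN, infinity, or zero, as UnpackFP does."""
    sign = (fp >> 31) & 1
    exp = (fp >> 23) & 0xFF
    frac = fp & 0x7FFFFF
    sig = frac | ((exp != 0) << 23)          # hidden bit for normals
    is_nan = exp == 0xFF and frac != 0
    is_inf = exp == 0xFF and frac == 0
    is_zero = exp == 0 and frac == 0
    return sign, exp, sig, is_nan, is_inf, is_zero
```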

9.2 Code for multiplier

LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all;

ENTITY FPmul IS
  PORT(
    FP_Am : IN  std_logic_vector (31 DOWNTO 0);
    FP_Bm : IN  std_logic_vector (31 DOWNTO 0);
    clkm  : IN  std_logic;
    FP_Zm : OUT std_logic_vector (31 DOWNTO 0)
  );
END FPmul;

ARCHITECTURE single_cycle OF FPmul IS
  SIGNAL A_EXP : std_logic_vector(7 DOWNTO 0);


  SIGNAL A_SIG   : std_logic_vector(31 DOWNTO 0);
  SIGNAL A_SIGN  : std_logic;
  SIGNAL A_isINF : std_logic;
  SIGNAL A_isNaN : std_logic;
  SIGNAL A_isZ   : std_logic;
  SIGNAL B_EXP   : std_logic_vector(7 DOWNTO 0);
  SIGNAL B_SIG   : std_logic_vector(31 DOWNTO 0);
  SIGNAL B_SIGN  : std_logic;
  SIGNAL B_isINF : std_logic;
  SIGNAL B_isNaN : std_logic;
  SIGNAL B_isZ   : std_logic;
  SIGNAL EXP_addout    : std_logic_vector(7 DOWNTO 0);
  SIGNAL EXP_in        : std_logic_vector(7 DOWNTO 0);
  SIGNAL EXP_out       : std_logic_vector(7 DOWNTO 0);
  SIGNAL EXP_out_norm  : std_logic_vector(7 DOWNTO 0);
  SIGNAL EXP_out_round : std_logic_vector(7 DOWNTO 0);
  SIGNAL SIGN_out      : std_logic;
  SIGNAL SIG_in        : std_logic_vector(27 DOWNTO 0);
  SIGNAL SIG_isZ       : std_logic;
  SIGNAL SIG_out       : std_logic_vector(22 DOWNTO 0);
  SIGNAL SIG_out_norm  : std_logic_vector(27 DOWNTO 0);
  SIGNAL SIG_out_norm2 : std_logic_vector(27 DOWNTO 0);
  SIGNAL SIG_out_round : std_logic_vector(27 DOWNTO 0);
  SIGNAL dout      : std_logic;
  SIGNAL isINF     : std_logic;
  SIGNAL isINF_tab : std_logic;
  SIGNAL isNaN     : std_logic;
  SIGNAL isZ       : std_logic;
  SIGNAL isZ_tab   : std_logic;
  SIGNAL prod      : std_logic_vector(63 DOWNTO 0);

  -- Component Declarations
  COMPONENT FPnormalize
    GENERIC ( SIG_width : integer := 28 );
    PORT (
      SIG_in  : IN  std_logic_vector (SIG_width-1 DOWNTO 0);
      EXP_in  : IN  std_logic_vector (7 DOWNTO 0);
      SIG_out : OUT std_logic_vector (SIG_width-1 DOWNTO 0);
      EXP_out : OUT std_logic_vector (7 DOWNTO 0)
    );
  END COMPONENT;

  COMPONENT FPround
    GENERIC ( SIG_width : integer := 28 );
    PORT (
      SIG_in  : IN  std_logic_vector (SIG_width-1 DOWNTO 0);
      EXP_in  : IN  std_logic_vector (7 DOWNTO 0);
      SIG_out : OUT std_logic_vector (SIG_width-1 DOWNTO 0);
      EXP_out : OUT std_logic_vector (7 DOWNTO 0)
    );
  END COMPONENT;

  COMPONENT PackFP
    PORT (
      SIGN : IN std_logic;
      EXP  : IN std_logic_vector (7 DOWNTO 0);
      SIG  : IN std_logic_vector (22 DOWNTO 0);


      isNaN : IN  std_logic;
      isINF : IN  std_logic;
      isZ   : IN  std_logic;
      FP    : OUT std_logic_vector (31 DOWNTO 0)
    );
  END COMPONENT;

  COMPONENT UnpackFP
    PORT (
      FP    : IN  std_logic_vector (31 DOWNTO 0);
      SIG   : OUT std_logic_vector (31 DOWNTO 0);
      EXP   : OUT std_logic_vector (7 DOWNTO 0);
      SIGN  : OUT std_logic;
      isNaN : OUT std_logic;
      isINF : OUT std_logic;
      isZ   : OUT std_logic;
      isDN  : OUT std_logic
    );
  END COMPONENT;

  FOR ALL : FPnormalize USE ENTITY work.FPnormalize;
  FOR ALL : FPround     USE ENTITY work.FPround;
  FOR ALL : PackFP      USE ENTITY work.PackFP;
  FOR ALL : UnpackFP    USE ENTITY work.UnpackFP;

BEGIN
  SIG_in  <= prod(47 DOWNTO 20);
  SIG_out <= SIG_out_norm2(25 DOWNTO 3);

  PROCESS(isZ, isINF_tab, A_EXP, B_EXP, EXP_out)
  BEGIN
    IF isZ='0' THEN
      IF isINF_tab='1' THEN
        isINF <= '1';
      ELSIF EXP_out=X"FF" THEN
        isINF <= '1';
      ELSIF (A_EXP(7)='1' AND B_EXP(7)='1' AND (EXP_out(7)='0')) THEN
        isINF <= '1';
      ELSE
        isINF <= '0';
      END IF;
    ELSE
      isINF <= '0';
    END IF;
  END PROCESS;

  eb4_truth_process: PROCESS(A_isINF, A_isNaN, A_isZ, B_isINF, B_isNaN, B_isZ)
  BEGIN
    IF (A_isINF = '0') AND (A_isNaN = '0') AND (A_isZ = '0') AND
       (B_isINF = '0') AND (B_isNaN = '0') AND (B_isZ = '0') THEN
      isZ_tab <= '0'; isINF_tab <= '0'; isNaN <= '0';
    ELSIF (A_isINF = '1') AND (B_isZ = '1') THEN
      isZ_tab <= '0'; isINF_tab <= '0'; isNaN <= '1';
    ELSIF (A_isZ = '1') AND (B_isINF = '1') THEN
      isZ_tab <= '0';


      isINF_tab <= '0'; isNaN <= '1';
    ELSIF (A_isINF = '1') THEN
      isZ_tab <= '0'; isINF_tab <= '1'; isNaN <= '0';
    ELSIF (B_isINF = '1') THEN
      isZ_tab <= '0'; isINF_tab <= '1'; isNaN <= '0';
    ELSIF (A_isNaN = '1') THEN
      isZ_tab <= '0'; isINF_tab <= '0'; isNaN <= '1';
    ELSIF (B_isNaN = '1') THEN
      isZ_tab <= '0'; isINF_tab <= '0'; isNaN <= '1';
    ELSIF (A_isZ = '1') THEN
      isZ_tab <= '1'; isINF_tab <= '0'; isNaN <= '0';
    ELSIF (B_isZ = '1') THEN
      isZ_tab <= '1'; isINF_tab <= '0'; isNaN <= '0';
    ELSE
      isZ_tab <= '0'; isINF_tab <= '0'; isNaN <= '0';
    END IF;
  END PROCESS eb4_truth_process;

  EXP_in <= (NOT EXP_addout(7)) & EXP_addout(6 DOWNTO 0);

  PROCESS(SIG_out_norm2, A_EXP, B_EXP, EXP_out)
  BEGIN
    IF ( EXP_out(7)='1' AND

( (A_EXP(7)='0' AND NOT (A_EXP=X"7F")) AND (B_EXP(7)='0' AND NOT (B_EXP=X"7F")) ) ) OR

         (SIG_out_norm2(26 DOWNTO 3)=X"000000") THEN
      -- Underflow or zero significand
      SIG_isZ <= '1';
    ELSE
      SIG_isZ <= '0';
    END IF;
  END PROCESS;

  I4combo: PROCESS (A_EXP, B_EXP, dout)
    VARIABLE mw_I4t0    : std_logic_vector(8 DOWNTO 0);
    VARIABLE mw_I4t1    : std_logic_vector(8 DOWNTO 0);
    VARIABLE mw_I4sum   : unsigned(8 DOWNTO 0);
    VARIABLE mw_I4carry : std_logic;
  BEGIN


    mw_I4t0 := '0' & A_EXP;
    mw_I4t1 := '0' & B_EXP;
    mw_I4carry := dout;
    mw_I4sum := unsigned(mw_I4t0) + unsigned(mw_I4t1) + mw_I4carry;
    EXP_addout <= conv_std_logic_vector(mw_I4sum(7 DOWNTO 0),8);
  END PROCESS I4combo;

  I2combo : PROCESS (A_SIG, B_SIG)
    VARIABLE dtemp : unsigned(63 DOWNTO 0);
  BEGIN
    dtemp := (unsigned(A_SIG) * unsigned(B_SIG));
    prod <= std_logic_vector(dtemp);
  END PROCESS I2combo;

  isZ  <= SIG_isZ OR isZ_tab;
  dout <= '1';
  SIGN_out <= A_SIGN XOR B_SIGN;

  I9 : FPnormalize
    GENERIC MAP ( SIG_width => 28 )
    PORT MAP (
      SIG_in  => SIG_in,
      EXP_in  => EXP_in,
      SIG_out => SIG_out_norm,
      EXP_out => EXP_out_norm
    );

  I10 : FPnormalize
    GENERIC MAP ( SIG_width => 28 )
    PORT MAP (
      SIG_in  => SIG_out_round,
      EXP_in  => EXP_out_round,
      SIG_out => SIG_out_norm2,
      EXP_out => EXP_out
    );

  I11 : FPround
    GENERIC MAP ( SIG_width => 28 )
    PORT MAP (
      SIG_in  => SIG_out_norm,
      EXP_in  => EXP_out_norm,
      SIG_out => SIG_out_round,
      EXP_out => EXP_out_round
    );

  I5 : PackFP
    PORT MAP (
      SIGN  => SIGN_out,
      EXP   => EXP_out,
      SIG   => SIG_out,
      isNaN => isNaN,
      isINF => isINF,
      isZ   => isZ,
      FP    => FP_Zm
    );

  I0 : UnpackFP
    PORT MAP (


      FP    => FP_Am,
      SIG   => A_SIG,
      EXP   => A_EXP,
      SIGN  => A_SIGN,
      isNaN => A_isNaN,
      isINF => A_isINF,
      isZ   => A_isZ,
      isDN  => OPEN
    );

  I1 : UnpackFP
    PORT MAP (
      FP    => FP_Bm,
      SIG   => B_SIG,
      EXP   => B_EXP,
      SIGN  => B_SIGN,
      isNaN => B_isNaN,
      isINF => B_isINF,
      isZ   => B_isZ,
      isDN  => OPEN
    );
END single_cycle;
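The sign and exponent datapath of this multiplier is compact enough to model in Python. The constant carry-in of '1' (dout) together with the inversion of the top bit of the exponent sum re-applies the -127 bias after the two biased exponents are added; the sketch below is our reading of that trick (function name ours):

```python
def fpmul_fields(sign_a: int, exp_a: int, sign_b: int, exp_b: int):
    """Sign and exponent datapath of FPmul: XOR the signs; add the
    biased exponents with a +1 carry-in and flip bit 7, which equals
    exp_a + exp_b - 127 for in-range results."""
    sign = sign_a ^ sign_b
    addout = (exp_a + exp_b + 1) & 0xFF   # I4combo with mw_I4carry = '1'
    exp = addout ^ 0x80                   # EXP_in: invert EXP_addout(7)
    return sign, exp
```

For example, 1.0 x 1.0 (both exponents 127) yields exponent 127 again, and 2.0 x 4.0 (128 and 129) yields 130, the biased exponent of 8.0.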

9.3 VHDL CODE FOR NEURON

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity neuron is
  PORT(
    ADD_SUB : IN std_logic;
    FP_An   : IN std_logic_vector (31 DOWNTO 0);
    FP_Bn   : IN std_logic_vector (31 DOWNTO 0);
    clkn    : IN std_logic;

    clk1n : IN  std_logic;
    FP_Zn : OUT std_logic_vector (31 DOWNTO 0)
  );
end neuron;

architecture Behavioral of neuron is
  SIGNAL FP_Z1 : std_logic_vector (31 DOWNTO 0);
  SIGNAL FP_Z2 : std_logic_vector (31 DOWNTO 0);

  COMPONENT FPadd_single_cycle
    PORT(
      ADD_SUB : IN std_logic;
      FP_A : IN  std_logic_vector (31 DOWNTO 0);
      FP_B : IN  std_logic_vector (31 DOWNTO 0);
      clk  : IN  std_logic;
      FP_Z : OUT std_logic_vector (31 DOWNTO 0)
    );
  end COMPONENT;

  COMPONENT FPmul_single_cycle
    PORT(
      FP_Am : IN std_logic_vector (31 DOWNTO 0);
      FP_Bm : IN std_logic_vector (31 DOWNTO 0);
      clkm  : IN std_logic;


      FP_Zm : OUT std_logic_vector (31 DOWNTO 0)
    );
  end COMPONENT;

  FOR ALL : FPadd_single_cycle USE ENTITY work.FPadd;
  FOR ALL : FPmul_single_cycle USE ENTITY work.FPmul;

begin

I0 : FPmul_single_cycle
  PORT MAP (
    FP_Am => FP_An,
    FP_Bm => x"00000400",
    clkm  => clk1n,
    FP_Zm => FP_Z1
  );

I1 : FPmul_single_cycle
  PORT MAP (
    FP_Am => FP_Bn,
    FP_Bm => x"00000010",
    clkm  => clk1n,
    FP_Zm => FP_Z2
  );

I2 : FPadd_single_cycle
  PORT MAP (
    ADD_SUB => ADD_SUB,
    FP_A => FP_Z1,
    FP_B => FP_Z2,
    clk  => clk1n,
    FP_Z => FP_Zn
  );

end Behavioral;
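Behaviourally, the neuron entity is a two-input multiply-accumulate: two FPmul instances feed one FPadd. In the VHDL the weights are wired in as the constants x"00000400" and x"00000010"; the Python sketch below takes them as parameters instead (function name ours):

```python
def neuron(x1: float, x2: float, w1: float, w2: float) -> float:
    """Two multipliers feeding one adder: z = w1*x1 + w2*x2,
    the datapath instantiated by I0, I1, and I2 above."""
    return w1 * x1 + w2 * x2
```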

9.4 Selection

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity selection is
  port(
    ip1 : in  std_logic_vector(31 downto 0);
    ip2 : in  std_logic_vector(31 downto 0);
    ip3 : in  std_logic_vector(31 downto 0);
    op  : out std_logic_vector(31 downto 0)
  );
end selection;

architecture setq of selection is
begin
process(ip1, ip2, ip3)   -- all inputs in the sensitivity list
begin
  if (ip3(31)='0') then
    op <= ip1;
  else
    op <= ip2;
  end if;
end process;
end setq;

9.5 Sigmoid in VHDL

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sigmoid is
  port(
    ain    : in  std_logic_vector(31 downto 0);
    aout   : out std_logic_vector(31 downto 0);
    clksig : in  std_logic;
    add    : in  std_logic
  );
end sigmoid;

architecture Behavioral of sigmoid is
  COMPONENT FPadd_single_cycle
    PORT(
      ADD_SUB : IN std_logic;
      FP_A : IN  std_logic_vector (31 DOWNTO 0);
      FP_B : IN  std_logic_vector (31 DOWNTO 0);
      clk  : IN  std_logic;
      FP_Z : OUT std_logic_vector (31 DOWNTO 0)
    );
  end COMPONENT;

  COMPONENT FPmul_single_cycle
    PORT(
      FP_Am : IN  std_logic_vector (31 DOWNTO 0);
      FP_Bm : IN  std_logic_vector (31 DOWNTO 0);
      clkm  : IN  std_logic;
      FP_Zm : OUT std_logic_vector (31 DOWNTO 0)
    );
  end COMPONENT;

  component selection
    port(
      ip1 : in  std_logic_vector(31 downto 0);
      ip2 : in  std_logic_vector(31 downto 0);
      ip3 : in  std_logic_vector(31 downto 0);
      op  : out std_logic_vector(31 downto 0)
    );
  end component;

  signal FP_Z2 : std_logic_vector(31 downto 0);
  signal FP_Z3 : std_logic_vector(31 downto 0);
  signal FP_Z4 : std_logic_vector(31 downto 0);
  signal FP_Z5 : std_logic_vector(31 downto 0);
  signal FP_Z6 : std_logic_vector(31 downto 0);
  signal FP_Z7 : std_logic_vector(31 downto 0);
  signal FP_Z8 : std_logic_vector(31 downto 0);

  FOR ALL : FPadd_single_cycle USE ENTITY work.FPadd;
  FOR ALL : FPmul_single_cycle USE ENTITY work.FPmul;
  FOR ALL : selection USE ENTITY work.selection;


begin

FP_Z2 <= ain and x"7fffffff";   -- clear the sign bit: |ain|

i11 : FPmul_single_cycle
  PORT MAP (
    FP_Am => FP_Z2,
    FP_Bm => x"00000010",
    clkm  => clksig,
    FP_Zm => FP_Z3
  );

i12 : FPadd_single_cycle
  PORT MAP (
    ADD_SUB => add,
    FP_A => FP_Z3,
    FP_B => x"00100010",
    clk  => clksig,
    FP_Z => FP_Z4
  );

i14 : FPmul_single_cycle
  PORT MAP (
    FP_Am => FP_Z4,
    FP_Bm => FP_Z4,
    clkm  => clksig,
    FP_Zm => FP_Z5
  );

i15 : FPmul_single_cycle
  PORT MAP (
    FP_Am => FP_Z5,
    FP_Bm => x"12345006",
    clkm  => clksig,
    FP_Zm => FP_Z6
  );

i16 : FPadd_single_cycle
  PORT MAP (
    ADD_SUB => add,
    FP_A => FP_Z6,
    FP_B => x"00100710",
    clk  => clksig,
    FP_Z => FP_Z7
  );

i17 : FPadd_single_cycle
  PORT MAP (
    ADD_SUB => add,
    FP_A => FP_Z6,
    FP_B => x"00100700",
    clk  => clksig,
    FP_Z => FP_Z8
  );

i18 : selection
  port map (
    ip1 => FP_Z7,
    ip2 => FP_Z8,


    ip3 => ain,
    op  => aout
  );

end Behavioral;
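The sigmoid datapath above scales |ain|, adds a constant, squares the result, scales again, and selects one of two affine branches by the sign of the input. The hexadecimal constants in the VHDL are opaque bit patterns, so the Python model below uses textbook second-order coefficients valid on |x| <= 4 to illustrate the same structure; the coefficients are our assumption, not the report's exact constants:

```python
def sigmoid_approx(x: float) -> float:
    """Second-order piecewise sigmoid approximation with the same
    datapath shape as the VHDL: scale |x|, add, square, scale, then
    select a branch by the sign of x. Assumed coefficients give
    sigmoid(x) ~ 1 - 0.5*(1 - |x|/4)**2 for x >= 0, symmetric below."""
    t = 1.0 - min(abs(x), 4.0) / 4.0   # scale |x| and add (i11, i12)
    q = 0.5 * t * t                    # square and scale (i14, i15)
    return q if x < 0 else 1.0 - q     # branch select (i16-i18)
```

At x = 1 this gives 0.71875 against a true value of about 0.7311, a typical error for this class of hardware-friendly approximations.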

Chapter 10

Implementation of different gates on FPGA using ANN

The main part of our project is the implementation of different gates (AND, OR, NOR, NAND) on the FPGA board. The board used in our project is a Xilinx XC3S500E-FG320 FPGA. Four of the board's slide switches provide the inputs, and the output is indicated by the on-board LEDs. The weights and biases of the different gates can be found either manually or by using software such as MATLAB; for our project we used the manual method. The weights and bias of each neuron are given below.

GATES   BIAS   WEIGHTS

OR      111    010
AND     101    010
NOR     011    110
NAND    001    110

10.1 VHDL code for the neural gate

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_SIGNED.ALL;

entity allgate3 is
  Port (
    a   : in  STD_LOGIC_VECTOR(1 downto 0);
    c   : in  STD_LOGIC;
    d   : in  STD_LOGIC;
    clk : in  STD_LOGIC;
    f   : out STD_LOGIC
  );
end allgate3;

architecture Behavioral of allgate3 is
  signal biasand  : std_logic_vector(2 downto 0) := "101";
  signal biasor   : std_logic_vector(2 downto 0) := "111";
  signal biasnor  : std_logic_vector(2 downto 0) := "001";
  signal biasnand : std_logic_vector(2 downto 0) := "011";
  signal wandor   : std_logic_vector(2 downto 0) := "010";
  signal wandor1  : std_logic_vector(2 downto 0) := "110";
  signal ane : std_logic_vector(2 downto 0);
  signal bne : std_logic_vector(2 downto 0);
  signal w   : std_logic_vector(2 downto 0);
  signal b   : std_logic_vector(2 downto 0);
  signal e   : std_logic_vector(2 downto 0);
begin
  ane <= "00" & c;
  bne <= "00" & d;
process(a, clk)
begin
  case a(1 downto 0) is
    when "00" =>
      w <= wandor; b <= biasor;
    when "01" =>
      w <= wandor;


      b <= biasand;
    when "10" =>
      w <= wandor1; b <= biasnor;
    when "11" =>
      w <= wandor1; b <= biasnand;
    when others =>
      f <= '0';
  end case;
  e <= w*(ane+bne) + b;
  if e < "000" then
    f <= '0';
  else
    f <= '1';
  end if;
end process;
end Behavioral;
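The threshold rule of allgate3 can be checked in Python by reading the 3-bit vectors as signed values ("010" = 2, "111" = -1, "101" = -3; this interpretation is ours): the unit fires when w*(c+d)+b is non-negative, which reproduces the OR and AND rows of the weights table above.

```python
def neural_gate(w: int, b: int, c: int, d: int) -> int:
    """Perceptron threshold as in allgate3: f = 1 iff w*(c+d)+b >= 0,
    with w and b taken as 3-bit signed values."""
    return 1 if w * (c + d) + b >= 0 else 0

# OR uses w=2, b=-1; AND uses w=2, b=-3.
```

Note that the VHDL evaluates this expression in 3-bit signed arithmetic, so the NOR and NAND rows additionally depend on wrap-around behaviour that this integer model does not reproduce.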

10.2 Simulation result


Device utilization summary:

Selected Device : 3s500efg320-5

Number of Slices:        3 out of 4656    0%
Number of 4 input LUTs:  5 out of 9312    0%
Number of IOs:           6
Number of bonded IOBs:   5 out of 232     2%

Chapter 11

Simulation results and device utilization of FLOATING POINT NEURON

11.1.1 Device utilization summary: adder

Selected Device : 3s500eft256-5

Number of Slices:        398 out of 4656   8%
Number of 4 input LUTs:  698 out of 9312   7%
Number of IOs:           98
Number of bonded IOBs:   97 out of 190    51%

11.1.2 Device utilization summary: multiplier


Selected Device : 3s500eft256-5

Number of Slices:         147 out of 4656   3%
Number of 4 input LUTs:   275 out of 9312   2%
Number of IOs:            97
Number of bonded IOBs:    96 out of 190    50%
Number of MULT18X18SIOs:  4 out of 20      20%

11.1.3 Device utilization summary: sigmoid

Selected Device : 3s500eft256-5

Number of Slices:         1520 out of 4656  32%
Number of 4 input LUTs:   2718 out of 9312  29%
Number of IOs:            66
Number of bonded IOBs:    65 out of 190     34%
Number of MULT18X18SIOs:  12 out of 20      60%

11.1.4 Device utilization summary: neuron

Selected Device : 3s500eft256-5

Number of Slices:         674 out of 4656   14%
Number of 4 input LUTs:   1223 out of 9312  13%
Number of IOs:            99
Number of bonded IOBs:    97 out of 190     51%
Number of MULT18X18SIOs:  8 out of 20       40%

11.2 Simulation results

11.2.1 Adder


11.2.2 Multiplier

11.2.3 Sigmoid


11.2.4 Neuron

11.3 Schematic of neuron


11.4 Schematic of sigmoid function

Chapter 12


CONCLUSIONS

The concepts of artificial neural networks were studied. The results of this initial study demonstrate that a neural network may be trained from data provided by an optimal guidance system. The trained network performs in a slightly sub-optimal manner, but has the advantage that it does not have to re-compute controller parameters for different forward speeds. Using these concepts, VHDL code for a floating-point neuron was written in Xilinx ISE, and the simulation results were checked in the same tool. VHDL code for a fixed-point neuron, which acts as different gates depending on the selected weights and biases, was also written in Xilinx ISE and its outputs were verified. After the design was converted to the corresponding bit stream, it was downloaded to the Spartan-3E FPGA board using iMPACT. The outputs of the FPGA were then checked by applying various inputs.
