A Robust Hybrid VLSI Neural Network Architecture for a Smart Optical Sensor

Hormoz Djahanshahi

A Thesis

Submitted to the College of Graduate Studies and Research

in Partial Fulfillment of the Requirements for

the Degree of Doctor of Philosophy

Electrical and Computer Engineering

University of Windsor

Windsor, Ontario, Canada

© 1998 Hormoz Djahanshahi

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

Dedicated to the memory of my grandmother

who was my first teacher

Abstract

This thesis introduces a novel approach to the design of circuits found in a very large scale integration (VLSI) implementation of an artificial neural network. A robust hybrid architecture with analog and digital elements has been developed for a fully-parallel single-chip realization of multilayer neural networks. The proposed architecture is highly modular and creates regular silicon structures that are well suited to a VLSI realization. The architecture employs an innovative universal building block consisting of an improved digital-analog multiplier, a new analog active nonlinear resistor and a digital weight register. The key circuit, called a unified synapse-neuron, allows one to realize a self-scaling sigmoidal neuron characteristic that does not have to be constantly redesigned to accommodate a varying dynamic input range that is dependent upon the number of synaptic weights connected to the input of the neuron. The effects of synaptic weight quantization noise are also shown to be reduced using a stochastic model developed in the thesis. A new resistive-type neuron circuit is presented that exhibits inherently low characteristic variations based on analyses, simulations and fabrication measurements. Moreover, as each neuron is realized by a number of compact sub-neurons that are distributed over the die area, the effects of process variations on the neuron's characteristics are minimized due to the distributed averaging effect that takes place. Increased robustness is achieved as there is a simultaneous reduction of both digital quantization effects and analog variation effects. The distributed nature of the analog neuron also has the potential to contribute to increased fault tolerance for certain types of neuron circuit failure. Circuit design, implementation and characterization are performed in a standard CMOS process at 5V and 3.3V supply voltages so as to lead to an optimized design.

The purpose of this research was to develop a smart non-contact optical sensor based on a programmable neural network with an integrated photosensitive array. The theoretical and experimental work has led to the design and realization of a highly modular and robust neural-based smart CMOS sensor with reduced interconnection areas and increased synaptic density. As a result, a larger photosensor array and a larger neural network classifier are implemented on a restricted die area. Both theoretical and experimental results are presented in the thesis.

Acknowledgements

I would like to thank my thesis advisor, Dr. William C. Miller, for his encouragement and support throughout this thesis. I also wish to express my gratitude to Dr. Majid Ahmadi and Dr. Graham A. Jullien for their help and support in various aspects of this research.

I am grateful to the members of my thesis committee, Dr. Fathi M. Salam (external examiner) and Dr. Subir Bandyopadhyay (external reader), for their insightful comments and suggestions.

Special thanks go to Roberto Muscedere for his help with computer and software problems during the last year of my work at the VLSI Research Group in Windsor. He made a difference in our lab. To my friends Hossain Hajimowlana, Saeid Sadeghi, Ramin Safari, Marjan Shahkariuni, Jinming Yang and many others, thanks for your invaluable friendship and for the enjoyable research environment you have contributed to.

Last but not least, I would like to thank my wife, Taban, for all her love, support and understanding. She is a hero! And to my daughter, Kimia, thanks for bringing joy to our life. You are my greatest achievement.

Table of Contents

List of Figures ..... x

Chapter 1 Introduction ..... 1
  1.1 Overview ..... 1
    1.1.1 Conventional vs. Neural Computation ..... 2
  1.2 Neural Network Implementations ..... 5
    1.2.1 Software vs. Hardware ..... 5
    1.2.2 Optical and Optoelectronic Implementations ..... 7
    1.2.3 Digital Implementations ..... 8
    1.2.4 Analog Implementations ..... 10
    1.2.5 Hybrid (Mixed-signal) Implementations ..... 12
  1.3 Objectives ..... 14
  1.4 Thesis Organization ..... 15

Chapter 2 A New Hybrid VLSI Architecture ..... 18
  2.1 Introduction ..... 18
  2.2 Some Implementation Problems ..... 19
  2.3 A New Hybrid Distributed-Neuron Architecture ..... 21
    2.3.1 Creating a Unified Synapse-Neuron (USN) ..... 21
    2.3.2 A Modular Neural Network Implementation ..... 24
    2.3.3 Properties of the Architecture ..... 25
  2.4 Conclusion ..... 29

Chapter 3 Distributed Neuron and its Properties ..... 30
  3.1 Introduction ..... 30
  3.2 Nonlinear Resistive-type Neuron ..... 31
    3.2.1 Circuit Description ..... 31
    3.2.2 Analysis of I-V Characteristics ..... 32
    3.2.3 A Sensitivity Study ..... 38
    3.2.4 Fabrications and Measurements ..... 39
  3.3 Implementation and Properties of a Distributed Neuron ..... 43
    3.3.1 An Averaging Effect
    3.3.2 A Self-scaling Property ..... 49
    3.3.3 An Increased Fault Tolerance ..... 51
  3.4 Conclusion ..... 52

Chapter 4
  4.1 Introduction ..... 53
  4.2 A Programmable Universal Hybrid Building Block ..... 54
    4.2.1 Multiplying D-to-A Converter (MDAC) ..... 54
    4.2.2 Weight Register with Double-Phase Clock ..... 61
    4.2.3 Characteristics of the Unified Synapse-Neuron Circuit ..... 62
  4.3 Applications
    4.3.1 An Optical Template Matching Network ..... 64
    4.3.2 General Purpose Programmable Neural Network Classifier ..... 69
    4.3.3 Other NNIC Fabrications ..... 71
  4.4 Conclusion ..... 71

Chapter 5 Quantization Noise Improvement ..... 72
  5.1 Introduction ..... 72
  5.2 Modeling a Distributed Neuron ..... 73
    5.2.1 Increase in the Number of Adaline Inputs ..... 74
    5.2.2 Self-scaling Formulation ..... 77
  5.3 Stochastic Model ..... 79
    5.3.1 Sigmoidal Adaline with Lumped Neuron ..... 79
    5.3.2 Sigmoidal Adaline with Distributed Neuron ..... 81
  5.4 A Case Study ..... 82
  5.5 Discussion and Conclusion ..... 84

Chapter 6 Neural-based Smart Photosensor ..... 86
  6.1 Introduction ..... 86
  6.2 Objectives and Issues ..... 87
  6.3 Photosensor Array ..... 88
    6.3.1 CMOS-compatible Photosensitive Device ..... 88
    6.3.2 CMOS Photoreceptor Cell Circuit
  6.4 A Review of Conventional Designs ..... 92
    6.4.1 Partially-connected Pre-programmed Neural-based Sensor ..... 93
    6.4.2 Fully-connected Programmable Neural-based Sensor ..... 93
  6.5 Distributed Neural-based Sensor Architecture ..... 94
    6.5.1 2-D Distributed-neuron Architecture ..... 95
    6.5.2 Distributed Array of Smart Pixels ..... 96
    6.5.3 Characteristics of the Neural-based Sensor Chip ..... 99
    6.5.4 Synaptic Density and Interconnections ..... 99
    6.5.5 Robustness of Neurons ..... 102
    6.5.6 BiCMOS vs. CMOS Implementation ..... 103
  6.6 Training ..... 104
  6.7 Conclusion ..... 106

Chapter 7 Conclusions ..... 108
  7.1 Summary ..... 108
  7.2 Contributions ..... 109
  7.3 Suggested Future Research ..... 112

List of Tables

Table 3.1. Device sizes for the two neuron circuits shown in Figure 3.1 and simulated in Figure 3.2 ..... 33

Table 3.2. Regions of operation for the neuron circuit shown in Figure 3.1(b) ..... 35

Table 3.3. A summary of experimental results on (lumped) neuron circuit ..... 42

Table 3.4. A summary of comparative measurements on lumped and distributed neurons ..... 47

Table 4.1. Device sizes (W/L) in µm for MDAC circuit shown in Figure 4.2(c) ..... 59

Table 4.2. A data summary about template-matching NNIC ..... 67

Table 4.3. A data summary about programmable NNIC classifier ..... 70

Table 6.1. Comparison between a conventional and a distributed neural-based photosensor design ..... 102

Table 6.2. Training pattern set for the 8 x 8 smart photosensor ..... 106

List of Figures

Figure 1.1 Taxonomy of Artificial Neural Networks ..... 6

Figure 2.1 a) Interconnections in a fully-connected multilayer neural network, b) limitations on VLSI interconnections ..... 20

Figure 2.2 Evolutionary steps to create a unified synapse-neuron in a hybrid architecture ..... 22

Figure 2.3 Modular neural network implementations: a) a 4-3 neural network, b) a multilayer 4-3-2 neural network ..... 26

Figure 3.1 Circuit diagram of nonlinear I-to-V neurons: a) original circuit [81], b) the modified circuit ..... 33

Figure 3.2 I-V characteristics of the circuits in Figure 3.1 (a) and (b) ..... 33

Figure 3.3 a) Regions of operation on S-shaped V-I characteristic, b) current distribution in four MOS transistors ..... 34

Figure 3.4 Simulations with NI% variations on threshold voltage ..... 39

Figure 3.5 Microphotograph of a fabricated CMOS chip that includes lumped neurons, unified synapse-neurons and other test circuits ..... 40

Figure 3.6 Measured neuron characteristics from 10 fabricated chips: a) overlaid results, b) a close-up view of maximum chip-to-chip variations ..... 40

Figure 3.7 Implementation of a distributed neuron ..... 44

Figure 3.8 Neuron cells in a gradient of doping: a) lumped realization, b) distributed realization ..... 45

Figure 3.9 On-chip variations of characteristics (worst case among 5 fabrications): a) lumped neuron implementation, b) distributed neuron implementation ..... 48

Figure 3.10 Self-scaling property of the distributed neuron circuit: a) simulated characteristics of a sub-neuron and a 5-input neuron, b) experimental results comparing a 2-input and a 5-input neuron ..... 51

Figure 4.1 Sub-blocks of the universal hybrid building block ..... 54

Figure 4.2 MDAC-type synapse, evolutionary steps: a) a conceptual block diagram, b) a 5-bit version based on [57], c) modification in sign-bit circuit ..... 56

Figure 4.2 (continued) d) modification in current mirrors, e) modification in V-to-I converter ..... 57

Figure 4.3 MDAC output current (at Vin.-) vs. binary weights: a) simulation waveforms, b) fabrication measurements ..... 58

Figure 4.4 Improving the dynamic range of Iout vs. Vin in MDAC: a) threshold reduction (circuit improvement from Figure 4.2(c) to Figure 4.2(d)), b) linearization (circuit improvement from Figure 4.2(d) to Figure 4.2(e)) ..... 60

Figure 4.5 Schematic of one bit of a 5-bit weight storage cell with double-phase clock ..... 61

Figure 4.6 Unified Synapse-Neuron (USN) circuit ..... 62

Figure 4.7 Output current of the modified MDAC (Figure 4.6 or Figure 4.2(e)) measured through a linear load ..... 63

Figure 4.8 Overall characteristics of USN: a) simulations for two parametric values of Vin, b) experimental measurements Vi SV ..... 63

Figure 4.9 A 4-3-2 VLSI neural network based on arrays of a universal hybrid building block ..... 64

Figure 4.10 Template matching: a) optical inputs, b) equivalent electronic inputs, c) chip outputs in recall ..... 66

Figure 4.11 a) Layout and b) core microphotograph of template matching NNIC ..... 67

Figure 4.12 a) Border feature extraction and b) directional templates for handwritten numeral recognition ..... 69

Figure 4.13 Microphotograph of a general purpose 16-4-3 programmable NNIC classifier ..... 70

Figure 5.1 An Adaline implemented with lumped resistive-type neuron ..... 73

Figure 5.2 An Adaline with a distributed-neuron architecture ..... 74

Figure 5.3 a) An N-input neuron characteristic over the original range of inputs, b) the same neuron when inputs are increased to N.S, c) a properly-scaled neuron with N.S inputs ..... 76

Figure 5.4 Neuron input increase for a distributed neuron ..... 78

Figure 5.5 Stochastic model for an Adaline with distributed neuron ..... 82

Figure 5.6 Signal-to-Noise Ratio improvement vs. input increase factor ..... 84

Figure 6.1 Top view, cross section and device equivalent model of: a) vertical photoBJT, b) Field-Effect Modified (FEM) vertical photoBJT ..... 91

Figure 6.2 Photosensor cell circuit ..... 92

Figure 6.3 Neural-based photosensor for focal-plane pattern classification ..... 95

Figure 6.4 Hybrid distributed-neuron architecture on a 2-D array ..... 96

Figure 6.5 Neural-based smart pixel: a) a schematic diagram, b) die microphotograph of two adjacent pixels ..... 97

Figure 6.6 Floorplan of the neural-based photosensor chip ..... 101

Figure 6.7 Microphotograph of the neural-based photosensor chip ..... 101

Figure 6.8 Various pop-up windows in Training/Recall simulator: a) main window, b) about features, c) define structure (graphic window not shown), d) define input/output patterns, e) define and run training, f) simulated recall ..... 105

Figure A.1 Layout of a sub-neuron circuit ..... 123

Figure A.2 A group of five sub-neurons with a common bias circuit ..... 123

Figure A.3 Two layouts for a 5-bit MDAC synapse with cascode transistors ..... 124

Figure A.4 Layout of a 5-bit non-cascode MDAC synapse for 3.3V operation ..... 124

Figure A.5 Layout of a Unified Synapse Neuron (USN): a) with cascode MDAC, b) with non-cascode MDAC for 3.3V operation ..... 125

Figure A.6 Layout of a 5-bit parallel-in parallel-out (PPO) weight register ..... 125

Figure A.7 WRNHT: a test chip containing distributed neurons, MDACs and USNs (see page 40 for a microphotograph) ..... 126

Figure A.8 WRNBS: 8-input template matching NNIC with optical/electronic inputs (see page 67 for a microphotograph) ..... 126

Figure A.9 WRPNN: a general purpose programmable vector classifier NNIC (see page 70 for a microphotograph) ..... 127

Figure A.10 Layout of a neural-based smart pixel (in WRNSS) consisting of a photosensor cell, 8 USN blocks, 8 x 5-bit weight registers, clock driver and bias (see page 97 for a microphotograph) ..... 127

Figure A.11 WRNSS: neural-based smart photosensor chip ..... 128

Figure A.12 Neural-based photosensor chip in a 68 Pin Grid Array package ..... 128

Chapter 1 Introduction

Human beings have long been fascinated by the operation of biological neural systems. Intelligent behaviors such as understanding, reasoning, vocal communications, vision, decision making and locomotion are all attributed to the nervous system and its complexity and fineness. Besides basic interest in the understanding of the physiological phenomena, it is highly desirable for many applications to realize features similar or close to those performed by biological neural systems.

In recent years, much research has been put into deriving ideas from biological paradigms in order to make intelligent systems. The objective of modern neural network research has been to understand different aspects of the biological counterpart, as well as to realize useful artificial neural systems. An artificial neural network (ANN), or simply neural net (NN), is a processor whose design is motivated by biological neural systems. Neural networks are characterized by a massive number of interconnections and nodes that collectively perform a parallel distributed processing task based on a host of simple computational elements.

Key properties of neural networks are their fault tolerance due to interconnection redundancy, and more importantly their ability to learn from examples and later recall and generalize even in noisy conditions. Applications of neural networks are generally in areas where the human brain may outperform conventional computers. Examples are vision, speech recognition and synthesis, pattern classification, handwritten character recognition, medical diagnosis and expert systems, trend prediction, intelligent control and robotics.

Neural network research is an interdisciplinary field that has evolved from the interaction of several disciplines including neurobiology, physics, mathematics, psychology, computer science and electrical engineering. In the 1950's, Rosenblatt introduced his Perceptron model [74]; however, it was shown that the perceptron was incapable of solving the parity (EXOR) problem. Presenting new understandings of existing ideas in neural networks, J. Hopfield revived interest in this field in the early 1980's. Hopfield showed that several interesting properties emerge from the collective behavior of neurons and synapses [38]. The new interest in neural networks in the past ten years or so has been sparked in part by new models and training algorithms, as well as advances in microelectronics that allow hardware or software implementation of fairly large networks. Carver Mead [83], [53] and others [3], [31], [32], [33], [85] demonstrated that ANNs of reasonable size can be implemented on VLSI chips that show the predicted collective behavior of, and mimic some features of, biological neural networks. Some ANN implementations have shed new light on understanding the operation of the biological counterpart [53].

1.1.1 Conventional vs. Neural Computation

Most of the computers that are part of our everyday lives fall into a category of digital machines with the von Neumann architecture. They represent information as 1s and 0s and perform computations using a central processing unit (CPU) connected to a memory bank. In a von Neumann machine, a CPU, based on a predefined algorithm, accesses data in memory, performs some operations on the data and stores the results back into memory. With the ever increasing complexity of the tasks that computers are required to perform, numerous techniques have been pursued to enhance their computational power. The major methods of improvement are summarized below.

Technological Advances: There has been a continuous trend in incremental development of both faster and more complex processors, and faster and larger memories. The minimum feature size on a chip has been shrinking while the number of transistors per chip and the clock speed have been increasing steadily. However, we will inevitably approach a point where physical limits on feature sizes, etc. will be reached.

Miscellaneous Organizational Improvements: Some examples of organizational improvements to speed up computation flow include pipelining architectures, Reduced Instruction Set Computers (RISC machines), and hierarchical memories, e.g. a pyramid of built-in CPU registers, cache memory, dynamic memory bank and mass storage.

Parallel Processing: A major gain in computational power is achieved by adopting multiprocessor architectures. In practice, an increase in the number of processors does not necessarily result in a proportional increase in computing power. One issue in parallel processing is how to divide a task into pieces that can be performed in parallel, rather than serially. Other issues are how to organize memory resources and how to communicate among an increased number of processors. A fully-interconnected communication scheme would result in overly complicated hardware. Instead, in a 'hypercube' architecture, each processor placed at the corner of a hypercube is allowed to communicate directly only along the cube edges, while in a 'systolic array', the processors placed on a grid communicate only with the nearest neighbors on the array. Commercially available multiprocessor/multicomputing systems are application dependent and vary from those like the Cray Y-MP with eight powerful processors to the Connection Machine with 65K single-bit SIMD (Single Instruction, Multiple Data) processors [36].

A neural network is yet another alternative on the spectrum of computational architectures, one that looks closer to the Connection Machine, although it is different in nature [95], [49].

Programming Algorithm: Proper programming is an important part of a problem-solving computer system. As long as a problem and its solution are well understood and modeled, it is always possible to develop an algorithm and a program to solve the problem, no matter how complicated. However, there are tasks that have not yet been well understood and implemented on computers; among those are problems for which good algorithmic solutions are not known. Pattern recognition and machine vision are examples of such sophisticated problems. As such problems are routinely solved in nature by some simple life forms, let alone human beings, many researchers seek a solution in neurobiology. They hope to model their understanding of biological neural computation toward a non-algorithmic solution for the mentioned types of problems.
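The two restricted communication schemes mentioned under Parallel Processing above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names, the binary labeling of hypercube nodes and the link-count helpers are hypothetical and are not taken from any of the cited machines. It shows which processors a given node may talk to directly, and how the number of links compares with a fully-interconnected scheme.

```python
def hypercube_neighbors(node, dimensions):
    """Direct neighbors of a node in a hypercube (assumed labeling).

    Nodes are labeled 0 .. 2**dimensions - 1. Two nodes share a cube
    edge exactly when their binary labels differ in one bit.
    """
    return [node ^ (1 << bit) for bit in range(dimensions)]


def grid_neighbors(row, col, rows, cols):
    """Nearest neighbors (north, south, west, east) on a rows x cols grid,
    as in a 2-D systolic array. Cells on the border have fewer neighbors."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


def fully_connected_links(n):
    """Number of point-to-point links needed to connect n processors pairwise."""
    return n * (n - 1) // 2


def hypercube_links(dimensions):
    """Number of edges in a hypercube with 2**dimensions processors."""
    return dimensions * 2 ** (dimensions - 1)
```

For 16 processors, for example, a fully-interconnected scheme needs `fully_connected_links(16)` = 120 links, whereas a 4-dimensional hypercube needs `hypercube_links(4)` = 32, which is the kind of hardware saving the restricted schemes aim for.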

Neural networks represent an alternative paradigm that lies at one end of a computational spectrum with von Neumann machines being at the other end. Multiprocessor architectures and systolic arrays lie in between. Processing in von Neumann computers is centralized (in the CPU) and performed serially. The power of these computers comes from the superb accuracy and speed of their computational elements that operate based on predefined algorithms. On the other hand, neural processing is highly distributed (decentralized) and is performed by many simple computational elements. The power of neural computation comes from the massive number of elements concurrently performing a collective task. Moreover, neural computation is non-algorithmic and model-free.

In terms of a realistic future outlook, one should not expect that neural networks will replace conventional digital computers. The basic reasons are that conventional computers are nowadays very inexpensive to make and extremely accurate and fast in executing numerical calculations, text and data processing, computer aided design, etc. Perhaps the most important applications of neural networks are those involving classification, association [98] and computationally intensive yet seemingly non-algorithmic problems, sometimes termed perceptive computations [88], that are not successfully attacked by conventional computers.

Neural networks have already been known as good pattern classifiers [50]. They have also played an important role in the field of document image analysis, especially in commercial optical character recognition (OCR) systems [67], [76], [6]. Recently neural networks have been successfully used in applications such as intelligent control of industrial plants [4], self-calibration of commercial space robots [75], on-line fraud detection of credit card operations handling 1 million transactions per month [28] and automotive control [11], [70]. Prototype neural chips with temporal learning capability have been presented recently for real-time tracking and control applications [77], [78]. Other potential applications are in the areas requiring human-like inference and perception of speech and vision, especially in real-time systems [98]. To our civilization that is already heavily dependent on conventional digital computers, future artificial neural networks will offer a key complementary technology at their best.

1.2 Neural Network Implementations

A taxonomy of artificial neural network implementations is shown in Figure 1.1. The main categories shown in this figure are described below.

1.2.1 Software vs. Hardware

Neural networks have been implemented widely on software platforms. Software implementations are neurosimulators run on conventional computers. The first computer simulation of a neural network was performed by Rochester in 1956 on a Hebbian learning network [73]. Ironically, the first estimates for the capacity of the brain and the notion of imprecise neural computation were suggested by von Neumann [87]. Nowadays, complicated neurosimulators are widely available on personal computers and workstations. Flexibility is a main advantage of these simulators as they can be used for various neural network topologies and training algorithms. Parameters such as the number of layers, the number of neurons per layer and the type of neuron nonlinearity can be easily changed by the user and explored based on application demand. Neurosimulators have various user-friendly graphical and text interfaces that give the user quick insight into the performance of his/her simulated network model or training scheme. In fact, a great deal of the recent public interest in neural networks is indebted to advances in computer technology that made it possible to simulate large networks in a flexible, interactive and inexpensive manner.

Figure 1.1 Taxonomy of Artificial Neural Networks

On the other hand, the use of a serial computer for implementing a neural network seems

to be somewhat paradoxical. The nature of computation in a digital serial computer is

very different from that of a neural network as explained earlier in Section 1.1.1.

Although software simulators are of great significance, it is in fact a hardware

implementation based on many parallel computational elements that can truly exploit

the inherent speed and properties associated with parallel distributed processing of neural

networks. Simulations of neural networks in real-world applications (as opposed to

simple problems like EXOR) are computationally intensive to the degree that the

processing bottleneck on conventional computers limits practical explorations of large

networks. Many such applications require architectures composed of several dozens or

hundreds of neurons (summation and nonlinearities) connected via thousands of synapses

(multiplications). The training of these networks, such that they really generalize and not

just memorize the training data set, requires large data sets while training is usually a slow

and iterative process. Moreover, many of the real-world applications are time-critical in

which the neural network has to be used 'on-line' [35]. For such applications where huge

amounts of calculations are to be performed in a very limited time, software simulations

are inadequate and thus hardware implementations of neural networks become inevitable.

1.2.2 Optical and Optoelectronic Implementations

In hardware, neural networks have been implemented by a variety of optical or electronic

techniques or a combination of both. The main advantages offered by spatial optics are

high-speed processing and the possibility of fully parallel synaptic connections in three

dimensions. As we know, the tremendously complex biological synaptic connections are

made in three dimensions, while microelectronics virtually offers a 2.5 dimensional

implementation, i.e. a limited 2-D interconnection possibility as well as a stack of just a

few conductive layers in the third dimension. Despite the possibility of 3-D interconnections,

pure optical techniques tend to be more bulky and expensive, and less flexible than

microelectronic counterparts. Some modern optoelectronic neural networks take

advantage of a combination of both techniques.

An optoelectronic learning chip from Mitsubishi uses variable sensitivity photodiodes

(VSPDs) with metal-semiconductor-metal structure. The photosensitivity of this device is

a function of an applied bipolar bias voltage. In this way, a two-quadrant multiplier was

obtained to implement an electrically programmable synapse. A linear array of 8 LEDs

was used for input lines that was stacked on a 2-D 8 x 8 VSPD array. Based on this

architecture, an 8-8-3 multilayer perceptron with 640M updates per second was realized.

The use of 2-D optical arrays for photo-activated synaptic multipliers had been studied in

the VLSI Research Group at Windsor in the late 1980s [9]. A three-layer perceptron

network based on combined use of LCD devices and an amorphous-silicon

photoconductor array was implemented in Xerox research center [84]. Numerous neural

network architectures based on free-space optics and optoelectronics have been reported

in [8] and [91]. Most of the optoelectronic neural network implementations are based on

devices with special fabrication processing [72], [84] that makes them more expensive

compared to fewer implementations based on standard CMOS or BiCMOS technologies.

Integration of a photosensitive array acting as optical input nodes to a neural network

allows a fast fully-parallel application of large input vectors without pin limitations or

multiplexing delay. This approach has been investigated in standard CMOS by our group

[12], [42] and is explored further in Chapter 6 of this thesis.

Electronic neural networks can be divided into three groups: digital implementations,

analog implementations and hybrid (mixed analog-digital) implementations.

1.2.3 Digital Implementations

Digital implementations of neural networks can further be subdivided into two groups,

namely, neuroprocessors and dedicated chips:

Neuroprocessors: Neuroprocessors (or neuro-accelerators) are special purpose co-

processors for neuro-simulators. Aimed at accelerating the performance of a neuro-

simulator program, a neuroprocessor generally comes on a special board that fits in a

slot of a PC or workstation. This approach combines an accelerated speed with the

benefits of a flexible user-friendly software environment. Examples of commercially

available products are Mark II and Mark IV from TRW and Synapse I from Siemens

for VME-based workstations.

Dedicated chips: Dedicated digital neural network chips exploit more parallelism and

achieve speeds that are typically one or two orders of magnitude higher than computer-

based neuroprocessor boards. As a matter of fact, a large portion of commercially

available dedicated neural network chips are realized as digital circuits. The main

reasons are high precision and flexibility combined with the availability of mature

digital design tools. A major drawback of digital implementations is the large area and

power consumption per functional unit (multiplication, addition and nonlinear

functions). The growing complexity practically limits the size of a fully parallel digital

implementation. In moderate to large designs, computational units, especially synaptic

multipliers, are shared in a time-multiplexing scheme. An obvious outcome of a

multiplexed solution is a slowdown in overall speed proportional to the multiplexing

factor. The reductions in area and power do not usually match the multiplexing factor

due to the overhead created by the multiplexing/control circuit. Therefore, the power-

delay product becomes even higher in a multiplexed scheme.

Ford Motor Company successfully developed a prototype dynamic controller based on

recurrent neural networks for on-vehicle idle speed control [70]. Training and recall in the

prototype is based on an external digital computer, but the recall may easily be executed

directly in the vehicle's powertrain control module (PCM). Neural network control has

also been applied to anti-lock brake systems (ABS) [11].

Lneuro 1.0 from Philips is a general purpose building block processor which has 16

Processing Elements (PEs) with 16-bit resolution each [52]. On-chip weight memory is

1KB which is enough for 512 weights of 16-bit, or 1024 weights of 8-bit resolution. The

sigmoid function is not implemented on chip. The speed per chip is 100M connections per

second, but in each cycle only one neuron output is available.

The Connected Network of Adaptive Processors (CNAPS) chip from Adaptive Solutions

consists of 64 parallel Processor Nodes (PNs) each containing an adder, a multiplier, 32 x

&bit registers, 4KB weight memory and bus drivers [34]. Sixty-four neurons and a total

of 128K 16-bit weights can be implemented on chip for a total speed of 800M connections

per second. Several CNAPS chips can be cascaded on a common bus controlled by a bus

arbiter chip. However, this approach is not suitable for large networks in which case PNs

would spend more time on waiting for bus availability than for computing.

Hitachi realized a 5-inch Wafer Scale Integration (WSI) neural network consisting of 48

chips, with 12 nine-bit neurons per chip and 64 eight-bit weights per neuron. Each of the

576 neurons has only one multiplier which performs one 8 × 9-bit multiplication in about

0.5 µs. The whole wafer performs 1240M connections per second. The physical size, as

well as proportional power, of such implementation by itself should be an indication of

overwhelming complexity of a digital neural network realization.
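
The wafer-scale figures quoted above can be cross-checked with simple arithmetic. The sketch below is only illustrative: the variable names are ours, and it assumes that every neuron's single multiplier is busy at the same time.

```python
# Back-of-the-envelope check of the Hitachi WSI figures quoted in the text.
# Assumption (ours): all neuron multipliers operate concurrently.
chips = 48
neurons_per_chip = 12
multiply_time_s = 0.5e-6   # "about 0.5 microseconds" per multiplication

neurons = chips * neurons_per_chip
peak_cps = neurons / multiply_time_s       # connections per second

# Multiplication time implied by the quoted 1240M connections per second
implied_time_s = neurons / 1240e6

print(neurons)
print(peak_cps / 1e6, "M connections/s at exactly 0.5 us")
print(implied_time_s * 1e6, "us implied by 1240M connections/s")
```

Under these assumptions the quoted throughput corresponds to a multiplication time slightly below 0.5 µs, which agrees with the "about" in the text.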

1.2.4 Analog Implementations

Analog implementation seems to be a natural choice when neurobiology is considered as a

model for artificial neural networks. A biological neural network consists of numerous

imprecise elements performing a collective task in parallel. Fundamental considerations

show that analog processing is more efficient (than digital) with respect to power and chip

area when low precision is acceptable [SS]. This is the case for perceptive processing in

which the need for precise individual cells is replaced by that for collective computation in

a massively parallel architecture. Vision is an example of a domain to which analog

collective computation is immediately applicable, since the information to be processed is

inherently massively parallel and can be acquired easily by an array of on-chip light

sensors [88], [89].

Basic operations required in a neural network can be performed by simple analog circuits

resulting in a dense asynchronous realization. The advantages of analog circuits in the

context of neural network implementation are small area, high speed and low power

consumption. Analog MOS circuits in subthreshold (weak inversion) region especially

offer ultra low power and thus a possibility for denser and larger neural network

realizations [53], [90]. The main problem of analog implementations is their inaccuracy

as analog components are subject to mismatch, offsets and gain errors due to fabrication

process variations. Analog circuits are also more susceptible to noise, temperature effects

and power supply variations.

The effect of inaccurate analog components in a neural network chip can be compensated

to some extent during a learning process that takes into account the actual characteristics

of hardware (e.g. in a chip-in-loop or on-chip training scheme). Nonetheless, the effects

of implementation errors and inaccuracies become more apparent at the output of larger

neural networks based on a stochastic study [68]. In addition, it should be noted that a

high precision is required in the training phase and it is only in the recall phase that a

lower precision can be tolerated. For example the popular back-propagation algorithm

relies on relatively high resolution implementations (at least 9-10 bits in digital domain

[93], or equivalent analog accuracy) and appears to be especially sensitive to offsets

present in an analog implementation [14].

Another problem toward a fully-analog neural network implementation is that of analog

storage of synaptic weights. A possible solution is to use on-chip capacitors to store the

analog quantity as a charge deposit [40]. This is a volatile storage scheme that requires

initialization and periodic refreshing through D/A converters connected to digital

memories on or off chip. Therefore, it eventually relies on a digital storage mechanism. A

non-volatile solution, but one that usually requires a special fabrication technology, is to

store charge on floating-gate devices and use them in Programmable Read-Only Memories

(EEPROM, EPROM, ...) in an analog manner [65], [82], [2]. Programming and update

cycles of analog EEPROM are relatively slow and typically involve high voltage pulses.

Moreover, EEPROM suffers from aging problems: (i) it can be reprogrammed only for a

limited number of times before starting to degenerate and (ii) during long term storage,

there is a small charge leakage in the order of a few percent per several months. Finally,

analog EEPROMs have a limited accuracy that cannot be increased arbitrarily, partly

because they are based on structures optimized for commercial digital technologies.

Perhaps the most well-known commercial analog neural network chip is the Electrically

Trainable Analog Neural Network (ETANN) 80170 from Intel [37]. The ETANN consists

of 64 neurons and 10,240 synapses based on fully parallel Gilbert multipliers. The

neurons can be time-multiplexed in order to implement two layers of size 64 x 64.

Synaptic weights are programmed and stored on analog EEPROM cells. The training of

the chip is supported by the Intel Neural Network Training System (iNNTS), a board that

includes D/A and A/D converters and PC interfacing capability. The combined accuracy

of analog neurons and synapses is equivalent to 6-bit resolution [86]. The overall recall

speed is about 2G connections per second. Despite high-speed recall, the training process

is slow due to the long programming cycle of analog EEPROM as well as interfacing

between the PC host computer, iNNTS board and ETANN chip. Training is usually

performed in two phases. First a set of approximate weights from an off-line training is

downloaded into ETANN. In the second phase, weights are fine-tuned in a chip-in-loop

training scheme to compensate for the limited accuracy and nonidealities of analog

hardware. Due to the elaborate training process and an aging problem of EEPROMs

mentioned earlier, ETANN is not suitable for applications in which frequent

reprogramming is required.

Another analog product is the Artificial Neural Network ALU (ANNA) from AT&T [5].

ANNA has 8 neurons, 4096 weights and 512 multipliers organized as 8 groups of 64

synapses working fully in parallel. A group-multiplexing scheme can be used to create

256-input neurons, one neuron at a time. Synaptic weights are stored as charge deposits

on capacitors refreshed by on-chip D/A converters. At maximum speed, ANNA performs

10G connections per second. The training process however should be performed off-chip,

as is the case for most of the available neural network integrated circuits.

1.2.5 Hybrid (Mixed-signal) Implementations

Both analog and digital implementations of neural networks have their own advantages

and disadvantages as explained before. For this reason, some researchers have been

seeking a compromise in a hybrid (mixed-signal) implementation by exploiting the merits

of both analog and digital worlds. For example, it is beneficial to use free addition of

analog currents as well as compact nonlinear amplifiers to implement neuron function in

analog domain, while weights are stored conveniently in digital memories with arbitrary

precision. A multiplying D/A converter, in this case, can serve as a synaptic unit.
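
A minimal behavioural sketch of a multiplying D/A converter used as a synapse is given below. It is not the circuit developed in this thesis: the sign-magnitude weight coding, the 5-bit default word length and the unit transconductance `g_unit` are assumptions chosen only to illustrate how a digitally stored weight scales an analog input voltage, and how the resulting currents add on a shared node.

```python
def mdac_synapse(v_in, code, bits=5, g_unit=1e-6):
    """Ideal multiplying DAC: output current = v_in * g_unit * signed weight.

    code   -- integer weight in the range [-(2**(bits-1)-1), 2**(bits-1)-1]
    g_unit -- hypothetical transconductance of one LSB, in A/V
    """
    limit = 2 ** (bits - 1) - 1
    if not -limit <= code <= limit:
        raise ValueError("weight code out of range")
    return v_in * g_unit * code


def summed_current(v_inputs, codes, **kwargs):
    # Currents from all synapses add on one node (Kirchhoff's current law).
    return sum(mdac_synapse(v, c, **kwargs) for v, c in zip(v_inputs, codes))


# Two synapses with opposite weights and equal inputs cancel on the node.
print(summed_current([1.0, 1.0], [3, -3]))
```

The weight precision in this model is set purely by the word length of the digital register, while the multiplication and the summation stay in the analog domain.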

Pulse stream encoding, first reported in the context of VLSI neural networks in 1987 [59],

is another hybrid approach that attempts to blend the advantages of analog and digital

technologies. Inspired from biological communications in nerves, the states of neurons in

this technique are represented by a sequence of pulses with variable rates. Practical

techniques include pulse width modulation (PWM), pulse code modulation (PCM) and

sigma-delta modulation [56], [29], [61]. Multiplication is performed in the analog domain

under digital control while the final output is generated as a digital pulse stream which is

more robust against noise and easier to transmit [60]. Switched capacitors and analog

systolic arrays with digital processing capabilities are among other hybrid neural network

implementations [58], [80].

A hybrid architecture based on time-multiplexing of a synaptic unit was developed in the

VLSI Research Group at Windsor [63]. This architecture reduces the complexity of

synapses and interconnections from $O(N^2)$ to $O(N)$, where N is a measure of the number of

nodes (inputs and neurons) in a network. The penalty is a reduction in the speed of

network operation in each layer by the factor of multiplexing. The overall speed reduction

factor in a K-layer network is $\sum_{i=1}^{K} n_i$, where $n_i$ is the number of neurons in layer i.

Another problem in a multiplexed-synapse architecture, not often discussed in the open

literature, is an unwanted reduction in hardware redundancy and hence in the inherent

fault tolerance of neural network chip. If the multiplexed synapse fails to operate, e.g.

due to a VLSI defect or a burst of noise, all of the neurons in the next layer receiving an

input from that synapse will be affected. In other words, a fault will be propagated to all

nodes in the next layer via the multiplexer.

To improve the speed performance of the hybrid multiplexed neural network, a pipelined

architecture based on neurons with embedded analog latches was suggested later [9q,

[64]. The pipelined architecture increases speed performance by a factor of two.

However, the improvement only compensates a small fraction of speed loss originally

incurred by time multiplexing scheme. For example, a multi-layer 36-20-15-10 network

implemented based on the multiplexed architecture [63] is (20 + 15 + 10) = 45 times

slower while a pipelined multiplexed architecture is still 45/2 = 22.5 times slower than

a parallel implementation. Another practical problem in implementing the pipelined

neural architecture is its dependency on 'analog' latches. Moreover, fault tolerance still

remains an issue because of the multiplexing scheme.
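
The speed penalty discussed above is easy to reproduce. The following sketch assumes, as in the 36-20-15-10 example, that the first entry of the topology is the input layer (which contains no neurons) and that pipelining simply halves the penalty; the function name is ours.

```python
def multiplexing_penalty(layer_sizes, pipelined=False):
    """Slow-down factor relative to a fully parallel network.

    layer_sizes -- e.g. [36, 20, 15, 10]; the first entry is the input layer,
                   which has no neurons and does not contribute.
    """
    factor = sum(layer_sizes[1:])      # sum of n_i over the neuron layers
    return factor / 2 if pipelined else factor


print(multiplexing_penalty([36, 20, 15, 10]))                  # 45
print(multiplexing_penalty([36, 20, 15, 10], pipelined=True))  # 22.5
```

Because the penalty grows with the total number of neurons, it becomes more severe exactly for the larger networks that multiplexing is meant to make feasible.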

1.3 Objectives

The VLSI Research Group at the University of Windsor has been pursuing a focused

activity towards the design and synthesis of massively parallel processors for very high

speed computations. Among other computational paradigms, VLSI neural networks have

been a distinct class in this line of research. On the other hand, there have been industrial

collaborative research projects at the university demanding the design of smart

photosensors for intelligent manufacturing control and machine vision applications.

This thesis is motivated by both the academic and industrial sides. It addresses several

basic problems in the area of VLSI implementation of neural networks and presents proof-

of-concept designs, from architectural level to circuit implementation and experimental

verification, for the applications of interest. While the potential benefits of a hybrid

analog-digital implementation are acknowledged, we would like to avoid the

disadvantages of a multiplexed architecture as described in Section 1.2.5 in an attempt to

realize a fully-parallel neural network architecture.

The objectives of this research based on the above background are:

To develop a robust, highly-modular and fully-parallel hybrid VLSI architecture for the

implementation of multilayer neural network classifiers,

To present novel building blocks for the proposed architecture through circuit design at

transistor level, fabrication and experimental characterization,

To implement in a standard VLSI technology:

(i) programmable neural network ICs for general pattern classification applications

(e.g. in numeral recognition),

(ii) a programmable neural-based smart photosensor for on-line classification of

optical input patterns in a manufacturing process control.

This thesis is organized in seven chapters. The present chapter provides an overview of

artificial neural networks and their applications. It compares the notion of collective

neural computation with that in conventional digital computers and emphasizes neural

networks as a complementary tool for those areas weakly handled by conventional

computers. Various neural network implementation techniques are investigated, including

analog, digital, hybrid and optoelectronic methods, and several research / commercial

products are surveyed.

A modular hybrid VLSI architecture for the implementation of neural networks is

presented in Chapter 2. It is the first 'hybrid' architecture that implements multilayer

neural networks with a single universal block based on a distributed neuron and a digital

programmable synapse. Several properties of this architecture are highlighted and the

main problems tackled are described. Most of the properties briefed in Chapter 2 will be

discussed in detail through analyses, simulations and experimental work in Chapters 3

In Chapter 3, a compact yet robust neuron circuit is presented that combines nonlinear I-V

characteristics of NMOS and PMOS transistors to synthesize a sigmoid-like saturating

function. The saturating function is studied analytically and a sensitivity analysis

follows. Circuit analysis and simulation both indicate an interestingly low characteristic

variation despite considerable process parameter variations. Chip fabrications and

measurements, both on-chip and chip-to-chip, verify the robustness of the circuit

characteristics. Following the study of a lumped implementation, a distributed

implementation of the neuron circuit is presented. It is shown that an averaging property

in a distributed implementation further reduces the variations within one chip and creates

uniform neurons ideal for a large neural network implementation. Moreover, a self-

scaling property of the distributed neuron is demonstrated and verified experimentally.

Chapter 4 details the design and characterization of a hybrid analog-digital circuit

presented as a universal building block for the implementation of multilayer feedforward

neural networks. The universal building block is based on a programmable unified

synapse-neuron. It consists of a multiplying D/A converter (MDAC), an analog sub-

neuron and a compact digital weight register. Design improvements are especially

presented for the MDAC circuit along with simulations and fabrication test results. The

application of the proposed building block to the implementation of two neural network

ICs is described in this chapter.

A stochastic model for an Adaline with analog distributed neuron and digital synaptic

weights is presented in Chapter 5. In this case, it is shown that a useful self-scaling

property automatically adjusts the dynamic range of sigmoidal function and hence

controls a stochastic gain by which quantization noise is amplified. In a conventional

lumped-neuron Madaline network, when the number of neurons per layer or the number of

neuron inputs increases (i.e. the network becomes larger), the effect of weight quantization

becomes more noticeable at the output. However, modeling and simulation in this chapter

indicate a considerable improvement in the ratio of desired signal to quantization noise

when a programmable neural network hardware is based on a distributed- rather than a

lumped-neuron.

In Chapter 6 the design of a CMOS-compatible neural-based smart photosensor for focal-

plane pattern classification is described. The design is based on the neural network

architecture and building blocks presented earlier in this thesis. Several conventional

neural network implementations are first examined for this problem. The area and

complexity of interconnections are found to be the major limiting factors on the size of

neural classifier and integrated photosensing array. A neural-based smart pixel is

presented as a building block for the sensor design. Photosensitive elements are based on

Field-Effect-Modified vertical photoBJT created in a standard CMOS technology.

A programmable smart photosensor comprised of a 2-D array of 64 photosensors and a

fully-connected multilayer neural classifier is implemented. Compared to a conventional

implementation the proposed design has greatly reduced the interconnection areas and

hence increased the synaptic density as well as the dimensions of the integrated

photosensor array.

Finally, Chapter 7 presents the conclusions of this thesis. It contains a summary of the

contributions discussed in previous chapters, ranging from architectural level to novel

circuit designs to the properties shown through theoretical analyses and experimental

implementations. Suggestions are also made for future research work.

Chapter 2 A New Hybrid VLSI Architecture

2.1 Introduction

Tremendous advances in microelectronics have made VLSI neural

networks popular within the past ten years, as neural networks can

provide real-time solutions to some of the real-world problems

traditionally difficult to handle by conventional digital computers

(cf. Chapter 1, Section 1.1). However, neural network VLSI

designers face many challenges, for instance, in implementing

massively interconnected networks, producing fully parallel input/

outputs and developing modular and scalable architectures

adaptable for different applications. Area and power efficiency,

speed, storage and accuracy are some other main issues.

Analog implementations of neural networks are compact, low-

power and high speed. Analog implementations however are

inaccurate and susceptible to various process variations. Analog

storage also remains a problem. On the other hand, digital

techniques offer higher precision and a multitude of flexible

architectures and design tools. However, a fully-parallel digital

neural network implementation comes at the cost of a complex

design with a large silicon area and power consumption. An

attempt to multiplex computational resources on chip reduces the

complexity, but significantly downgrades the speed performance and fault tolerance of

neural chip (see Chapter 1, Section 1.2.3 and Section 1.2.5 for more details).

Compromise solutions, such as the one presented in this thesis, lead to hybrid

implementations: mixed analog-digital circuits, and possibly optoelectronics. A robust

hybrid architecture should be able to address simultaneously as many implementation

problems as possible.

2.2 Some Implementation Problems

Some of the main issues in VLSI implementation of a neural network, as far as this

thesis tries to address, are as follows:

Application dependency: The topology and size of a neural network is highly

application dependent. The number of inputs, layers, and neurons per layer can vary

based on input/output requirements and even the nature of learning patterns in each

application. Therefore, a highly modular architecture is required to facilitate and speed

up the process of designing a custom VLSI neural network chip.

Interconnection problems: In a multilayer neural network as shown in Figure 2.1(a),

a fully-connected implementation between m input nodes and n nodes (neurons) in the

next layer requires m × n synapses and 2 × m × n physical interconnections (one input

and one output interconnect per synapse). Therefore, with similar situations in other

layers, the number of synaptic interconnections grows with $O(N^2)$, where N is a

measure of the number of nodes (inputs, and hidden and output neurons) in the neural

network. With a limited number of metal layers (e.g. 2 to 4) available for

interconnections in each technology and a minimum metal pitch for each layer

according to particular design rules (see Figure 2.1(b)), one can notice that routing

complexity and interconnection areas roughly grow with quadratic order of the number

of nodes in a neural network chip. Given a limited die area, this situation eventually

creates a bottleneck and puts a restriction on the size of a neural network

implementation in terms of the number of nodes and layers.

Figure 2.1 a) Interconnections in a fully-connected multilayer neural network, b) limitations on VLSI interconnections (metal 1 pitch)

Various implementation errors: Inevitable errors are introduced when an ideal

function or quantity is implemented by physical (nonideal) hardware. In analog

circuits, inherent inaccuracies and characteristic variations create discrepancies from

ideal values. In digital implementations, quantization effects are the main source of

error where an ideal value is realized by a finite word length or limited precision. The

effect of implementation errors becomes more important in larger neural networks.

Properly-scaled sigmoidal function: Multilayer feedforward neural networks, or

Madalines, rely on neurons with a saturating function, the most popular of which is a

sigmoidal¹ function. This function should be properly scaled over its input dynamic

range, or else it effectively becomes either a hard-limiting or a low-gain linear function.

A hard-limiting neuron (threshold Adaline), for instance, is more sensitive to the effects

of weight quantization [68]. Improper scaling may often occur when a neuron

realization receives different numbers of inputs in different applications. Redesigning

the neuron is a possible but obviously inconvenient solution (cf. Chapter 5 for details).

1. The terms 'sigmoidal' and 'sigmoid-like' are used hereinafter in the sense of any monotonic 'S'-shaped curve and do not necessarily refer to a specific mathematical function.

Pin limitations: Real-world applications require sizable neural networks that among

other things require a large number of input/output (I/O) pins. This situation creates an

'I/O bounded' layout meaning that the required silicon die becomes much larger than

actual core (neural network) area. In addition, an excessive number of I/O may exceed

the number of package pins, or at least increases interface complexity and cost. Often

inputs outnumber the outputs due to a funnel-shaped topology. In such cases, opto-

electronic input coupling offers an attractive solution to remove pin limitation problem.

Programmability: Digital programmability of synaptic weights that define the

mapping function of a neural network is an asset that makes an implementation more

flexible and general purpose.
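
The quadratic growth described under the interconnection item above can be illustrated numerically. The sketch below counts synapses and physical interconnects for a fully connected feedforward topology; the function name is ours, threshold synapses are ignored, and the 36-20-15-10 topology is borrowed from the example in Section 1.2.5 purely for illustration.

```python
def synapse_and_wire_counts(layer_sizes):
    """Count synapses and physical interconnects of a fully connected network.

    Each pair of adjacent layers with m and n nodes needs m*n synapses and
    2*m*n interconnects (one input and one output wire per synapse).
    Threshold (bias) synapses are ignored in this sketch.
    """
    synapses = sum(m * n for m, n in zip(layer_sizes, layer_sizes[1:]))
    return synapses, 2 * synapses


# Scaling every layer by a factor s multiplies both counts by s squared.
for scale in (1, 2, 4):
    sizes = [scale * s for s in (36, 20, 15, 10)]
    print(sizes, synapse_and_wire_counts(sizes))
```

Doubling the number of nodes in every layer quadruples both counts, which is the quadratic trend that eventually makes routing area the limiting factor on a fixed die.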

2.3 A New Hybrid Distributed-Neuron Architecture

2.3.1 Creating a Unified Synapse-Neuron (USN)

A conventional neural network implementation consists of two types of building block:

synapse and neuron. Figure 2.2(a) shows a conventional structure of a lumped neuron

with N input synapses. A linear¹ synapse multiplies its input x_i by a synaptic weight w_i.

A neuron performs a summation (Σ) on synaptic inputs, a bias or threshold adjustment

(denoted as −θ in Figure 2.2(a)), and a sigmoidal (S-shaped) saturating function f(·):

$f\left(\sum_{i=1}^{N} w_i x_i - \theta\right)$

For better modularity, the threshold function can be implemented as an additional synapse

as shown in Figure 2.2(b) with a constant input (e.g. x_0 = 1), while the threshold value

is programmed in the synaptic weight register, i.e. w_0 = −θ. Other synaptic units in

Figure 2.2(b) are similarly considered to be digitally programmable.
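
The equivalence between an explicit threshold and an extra synapse with constant input can be checked with a few lines of code. This is a minimal sketch: `tanh` stands in for the unspecified S-shaped function, and the function names and numerical values are ours.

```python
import math


def sigmoid(u):
    # One possible S-shaped function; the text uses "sigmoidal" loosely.
    return math.tanh(u)


def neuron_explicit(x, w, theta):
    # Weighted sum of the inputs followed by an explicit threshold shift.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) - theta)


def neuron_bias_synapse(x, w, theta):
    # Same neuron with the threshold folded into an extra synapse:
    # constant input x0 = 1 and programmed weight w0 = -theta.
    x_ext = [1.0] + list(x)
    w_ext = [-theta] + list(w)
    return sigmoid(sum(wi * xi for wi, xi in zip(w_ext, x_ext)))


x = [0.2, -0.5, 0.9]
w = [1.5, 0.3, -0.7]
print(neuron_explicit(x, w, 0.4), neuron_bias_synapse(x, w, 0.4))
```

Both forms give the same output up to rounding, so the threshold needs no dedicated hardware beyond one more programmable synapse.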

1. Nonlinear types of synapses, e.g. quadratic and Gaussian, have also been reported in literature [30].

Figure 2.2 Evolutionary steps to create a unified synapse-neuron in a hybrid architecture

The remaining parts of Figure 2.2 show the evolutionary steps to create a new hybrid

building block for the architecture presented in this thesis, as explained next.

If the output quantity of a synapse is a current, summation can be simply implemented in

the analog domain by connecting the outputs of synapses together (i.e. KCL) as shown in

Figure 2.2(c). The neuron's output and hence the synapses' input quantities are chosen to

be voltages. Therefore, in this hybrid analog-digital architecture synapses are digitally-

programmable V-to-I multipliers. A common choice for neuron could be a transresistance

(I-to-V) amplifier as shown in Figure 2.2(c).

For the reasons briefly mentioned next and clarified later, especially in Chapter 3, the

neuron function in our architecture is realized by a nonlinear load that receives the

summation of synaptic currents and delivers a voltage on the same summation node as

shown in Figure 2.2(d). The current to voltage transfer function has a sigmoid-like shape.

The configuration in Figure 2.2(d) shows a lumped resistive-type (I-to-V) neuron that in

practice is implemented with active MOS transistors.

A resistive-type neuron implementation can be distributed into parallel elements, known in this thesis as sub-neurons, such that their overall nonlinear characteristic remains the same. This temporary step is depicted in Figure 2.2(e). Compared to the original lumped neuron, each sub-neuron has a larger equivalent nonlinear resistance and, being implemented with active devices, a proportionally smaller area. The neural output voltage is built on a supernode denoted as V_out in Figure 2.2(e). Although the synaptic currents (I_i's) can all be different, the currents going through the sub-neurons are the same, since they are all created by the same voltage V_out applied to similar nonlinear impedances.¹ The output of the distributed neuron in this case is,

1. We neglect characteristic variations between sub-neurons for now.


where f_s(.) is the nonlinear resistance function of a sub-neuron, as opposed to F(.), which is the collective neural function. A more detailed analysis of a distributed neuron model is presented in Chapter 5.

Each analog sub-neuron in a distributed implementation can be integrated with a digitally programmable synapse, as shown in Figure 2.2(f), to create a hybrid block known in this thesis as a unified synapse-neuron (USN). Note that for better modularity, the threshold block (V_0, I_0) also includes a sub-neuron, which is nonetheless deactivated. A fully-analog distributed neuron-synapse was presented in [81] for the purpose of implementing a reconfigurable neural network. This thesis presents a new alternative hybrid architecture with several distinct features and contributions. Mixed-signal circuit blocks are presented and characterized for this hybrid architecture, with innovative designs in both synaptic and distributed neuron realizations. New properties of a distributed neuron implementation that were not addressed in [81] are explored in the next chapters. Moreover, an emerging property unique to a 'hybrid' distributed implementation is found that reduces the effect of weight quantization. This property could not arise in a fully-analog domain. In addition, in the context of a neural-based smart photosensor realization, the proposed architecture provided the foundations of a robust programmable sensor design and, specifically, significantly reduced the synaptic interconnection areas and routing problems. The properties of the proposed architecture are highlighted in Section 2.3.3 and will be studied through analyses, simulations and experimental implementations presented in the forthcoming chapters.

2.3.2 A Modular Neural Network Implementation

A unified synapse-neuron block presents a highly modular approach to a hybrid

implementation of multilayer neural networks. We assume a digital weight register is built


into the synaptic multiplier sub-block. In this way, the USN is upgraded to a universal programmable building block. The universal block, a circuit realization of which is presented in Chapter 4, is the only block required to implement a complete multilayer VLSI neural network. The parallel output connection of N such building blocks, for instance, makes a neuron with N digitally-programmable input synapses, i.e. an N-input Adaline.

Figure 2.3(a) shows three such Adalines (in three columns) with four common inputs that altogether form a single-layer 4-3 neural network built with 12 building blocks. A multilayer fully-connected m-n-p feedforward network¹ (where m is the number of input nodes, n is the number of neurons in the hidden layer and p is the number of neurons in the output layer) can be implemented by interconnecting regular (m × n) and (n × p) arrays of the universal hybrid building block. Figure 2.3(b) shows a 4-3-2 neural network built with 18 blocks. Additional blocks of a very similar nature can be used when neuron threshold adjustment is required. Threshold blocks receive a constant non-zero input voltage, store their threshold value in the built-in weight register, and have their nonlinear sub-neuron deactivated. Details of a circuit implementation of this network can be found in Chapter 4, Section 4.3.1.
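The block counts quoted above follow directly from the array sizes. As a small sketch, the helper below (a hypothetical function named here only for illustration) counts the universal building blocks of a fully-connected network. The option of one threshold block per neuron is an assumption, since the text does not state how many threshold blocks are used.

```python
def usn_block_count(layers, with_thresholds=False):
    """Number of universal building blocks for a fully-connected network.

    layers: node counts per layer, e.g. [4, 3, 2] for a 4-3-2 network.
    with_thresholds: if True, add one threshold block per neuron
    (an assumption made here; the text does not give this count).
    """
    pairs = list(zip(layers[:-1], layers[1:]))
    # Each (m x n) array contributes m * n synapse-neuron blocks.
    blocks = sum(n_in * n_out for n_in, n_out in pairs)
    if with_thresholds:
        blocks += sum(n_out for _, n_out in pairs)
    return blocks


if __name__ == "__main__":
    print(usn_block_count([4, 3]))      # single-layer 4-3 network
    print(usn_block_count([4, 3, 2]))   # multilayer 4-3-2 network
```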

2.3.3 Properties of the Architecture

The salient features of the proposed hybrid distributed-neuron architecture are as follows:

Modularity: From the previous descriptions and the examples shown in Figure 2.3, it can be seen that the proposed architecture is highly modular. Modularity simplifies and speeds up the process of designing a custom integrated circuit. Multilayer neural networks can be conveniently built using regular arrays of a universal building block. Examples of neural network chips built on this array architecture can be found in Chapter 4 and Chapter 6.

1. Sometimes denoted as an m × n × p network.


Figure 2.3 Modular neural network implementations: a) a 4-3 neural network, b) a multilayer 4-3-2 neural network


Silicon area efficiency: An area-efficient silicon realization is achieved for two reasons:

(i) Reduced interconnection areas and problems: Instead of global interconnections from synapses to neurons, there is a local connection inside each USN block, where a sub-neuron is densely packed with a synapse. The remaining global connections are made on regular buses laid on vertical or horizontal channels between the blocks. VLSI routing problems are also reduced for the same reason.

(ii) Inter-block area saving: In any VLSI implementation that deals with various types of custom blocks with different cell sizes, there are naturally some unused areas confined among ill-fitting adjacent blocks. On the other hand, in an architecture that deals with just one type of building block, there is little potential for creating unused inter-block areas.

A quantitative case study on interconnection area reduction is given in Chapter 6.

Self-scaling sigmoidal characteristics: A distributed-neuron architecture exhibits an interesting self-scaling property. As the number of inputs to a neuron increases, e.g. in a more demanding application, each new input synapse brings a corresponding nonlinear sub-neuron and incrementally adjusts the overall sigmoidal neuron characteristic. This analog phenomenon prevents an improper neuron scaling (e.g. a hard-limiting or a low-gain linear characteristic) that could disturb effective functionality. It thus circumvents the need for a redesign. Circuit simulation and experimental verification of this property are given in Chapter 3, and a theoretical model can be found in Chapter 5.

Quantization noise improvement: Quantization noise improvement is a direct consequence of the self-scaling property; however, it only emerges in the context of a hybrid architecture that relies on quantized weight values. Analyses and simulations supporting this theory on the basis of a new stochastic model for a distributed-neuron Adaline are presented in Chapter 5.


Averaging property in neurons: Sub-neuron elements on a physically distributed array take an average of various process parameters, e.g. the threshold voltage gradient over a silicon die. This property significantly reduces analog process variations among sigmoidal neurons and makes their characteristics virtually uniform. A discussion of the subject along with fabrication measurement results is presented in Chapter 3.

Increased fault tolerance: Neural networks possess an inherent fault tolerance, mainly due to the redundancy in their synaptic connections. In the proposed architecture, there is an additional element of redundancy in the 'neurons'. As each neuron implementation is distributed among N sub-blocks, a fault, e.g. a VLSI defect, would affect only 1/N-th of a neuron while the other parts remain intact. A discussion is provided in Chapter 3.

Automatic fan-out increase: A neuron circuit has to be able to drive all of its outgoing synapses. The total load may include a considerable number of interconnection lines and input impedances in the next layer. High-drive buffers have been presented for large networks [97]. In the proposed architecture, a USN is a "more-fan-in more-fan-out" entity: a configuration with a higher number of inputs results in a higher number of output transistors in parallel, which immediately translates into a lower output impedance and a higher drive. Therefore, in a sizable network a large neuron supernode is potentially capable of driving many blocks in the next layer, if required. Note that this property is related to output impedance and driving capability, and should not be mistaken for the self-scaling property of the neuron's saturating function.

Programmability: Digital programmability of the universal hybrid block through a read/write weight register allows the mapping function of the neural network to be redefined for different applications. This flexibility is used in the design of a general-purpose neural network classifier presented in Chapter 4 and a programmable smart photosensor explained in Chapter 6. A training/recall simulator was developed especially for the USN architecture. Following an off-line training session and a simulated recall, digital weights are programmed into a chip using a host computer or a software-controlled digital tester (cf. Chapter 6).

Optoelectronic integration: Although the architecture described in this chapter does not include an optoelectronic component, the regular nature of its silicon implementation lends itself to a possible photosensor array integration. Such a sensor fusion technique on the input nodes removes pin limitation problems and allows a fully parallel input operation. Details of an optoelectronic architecture especially developed for a neural-based smart photosensor are left for discussion in Chapter 6.

2.4 Conclusion

In this chapter, a new hybrid distributed-neuron architecture was presented for parallel

fully-connected implementation of multilayer neural networks. Salient features of this

architecture are as follows:

Modularity based on a universal building block

Fully-parallel single-chip implementation

Silicon area efficiency and reduced interconnection problems

Self-scaling sigmoidal characteristics of neurons

Weight quantization noise improvement

Averaging property resulting in nearly-uniform neuron functions

Increased fault tolerance

Automatic fan-out increase for USN blocks

Digital programmability

A brief intuitive discussion of each property was included in this chapter while references

were made to the forthcoming chapters for detailed analyses, simulations or experimental results. A special optoelectronic neural network architecture based on the proposed USN approach and the notion of a neural-based smart pixel will be presented in Chapter 6 along with the required background.


Chapter 3 Distributed Neuron and its Properties

3.1 Introduction

Analog VLSI circuits provide compact, high-speed and power-efficient realizations of artificial neural networks. Analog implementations are, however, inaccurate and prone to process variations and mismatch. CMOS circuits, for instance, are especially sensitive to variations in the threshold voltage (V_t) of transistors. Therefore, considering the complexity of real-world neural networks, an analog circuit which is both simple and accurate would be an attractive choice for VLSI neural networks. In this chapter, a nonlinear resistive-type neuron is presented that implements a saturating function by combining nonlinear characteristics of MOS transistors. Characteristic variations are found to be inherently small in analysis, simulations and measurements. Variations are measured both across a chip and among several fabricated chips. On-chip variations are reduced further by using a distributed-neuron implementation which utilizes a parallel configuration of identical compact sub-neurons. Another property, namely the self-scaling of the distributed-neuron circuit, is demonstrated through simulations and experimental measurements.

3.2 Nonlinear Resistive-type Neuron

Amplifier-type neuron circuits use MOS transistors in the exponential (sub-threshold) or square-law (above-threshold saturation) region of operation. In simple non-feedback configurations, these two-port circuits generally magnify the effect of device parameter variations, e.g. V_t, along with the desired (neural) signal. Generally, the idea behind an amplifier is to 'linearize' the characteristics of the active device(s) using various circuit techniques. In an amplifier-type neuron, one makes a further attempt to make the linearized characteristics 'nonlinear' again!

On the other hand, a nonlinear resistive-type neuron such as the one presented in this chapter relies on the basic nonlinearity in the V-I characteristics of MOS transistors to approximate a sigmoid-like saturating function. It thus avoids unwanted enhancement of parameter variations through an exponential or square law. Moreover, the resulting circuit has a lower complexity compared to an amplifier counterpart. Finally, a resistive-type (I-to-V) neuron receives the total sum of synaptic currents and converts it to a voltage on the same node. It thus provides a one-port implementation which is compact by nature.

3.2.1 Circuit Description

A resistive-type neuron based on a nonlinear load is reported in [81] and shown in Figure 3.1(a). The saturating I-to-V function of this neuron circuit relies on the characteristics of two transistors M1-M2 and a linear transition region corresponding to a resistor R. The resistor may be implemented by additional MOS transistor(s) or resistive layers, or may rely on parasitic/leakage impedances in the system [48].

A modified circuit is presented here based on four transistors (and no resistor) that approximates an S-shaped neural function by combining quadratic characteristics of the MOS transistors. A circuit diagram is shown in Figure 3.1(b). In fact, the two additional devices M3-M4 replace R with a slightly S-shaped characteristic in the region where M1 and M2 are both OFF. However, M3 and M4 do not implement a simple resistor as


they are not operating in their triode region. Note that a real sigmoid function does not contain a region with constant derivative (slope), even though it might be approximated that way. Therefore, the presented circuit provides a more realistic approximation to a sigmoidal function, one which is based on four nonlinear function segments rather than two nonlinear segments and one linear segment.

Figure 3.2 shows the simulated characteristics of the two circuits in Figure 3.1(a) and (b) based on the device sizes tabulated in Table 3.1. Both circuits are designed to reach 0 or 5 V at extreme synaptic currents of ±100 µA. With a large value chosen for R, the original circuit of Figure 3.1(a) shows a stepwise transition, and in any case it has a constant-derivative (linear) region bounded by abrupt changes in the derivative at the two points where the line intersects the nonlinear regions. On the other hand, the modified circuit of Figure 3.1(b) creates a smoother transition in the function and its derivative, a behavior closer to that of a sigmoid function. A differentiable sigmoid-like neuron function is especially desirable for in-loop training using popular gradient-based algorithms.

The modified neuron circuit of Figure 3.1(b) has a compact cell layout (36.4 µm × 19.4 µm) in a 1.2 µm CMOS process (cf. Figure A.1). There is no noticeable layout overhead for this design compared to the original circuit because (i) each transistor now has a smaller width (see Table 3.1), so that the combination of two NMOS (or PMOS) transistors handles the same amount of current; and (ii) resistor R has been removed. The shapes of the nonlinear function segments can be varied by adjusting the aspect ratio (W/L) of the transistors and the bias voltages V_B1 and V_B2. An analytic study is presented next.

3.2.2 Analysis of I-V Characteristics

The saturating function of the neuron circuit in Figure 3.1(b) is an outcome of the nonlinear I-V characteristics of four MOS transistors in their saturation region. In general, four dc voltages could be used to bias the gates of the four MOS transistors, resulting in five regions of operation between 0 and V_DD. With a careful design, however, we are able to simplify the biasing scheme to use only two common bias voltages, namely V_B1 and V_B2.

Figure 3.1 Circuit diagram of nonlinear I-to-V neurons: a) original circuit [81], b) the modified circuit

Table 3.1. Device sizes for the two neuron circuits shown in Figure 3.1 and simulated in Figure 3.2

Ml: 10.01 1.6 Ml: 4.W 1.6

M3: 2.0/ 2.0

Figure 3.2 I-V characteristics of the circuits in Figure 3.1(a) and (b)

Moreover, we can eliminate an intermediate region in which all transistors would be off. The regions of operation and their boundaries are marked on a simulated I-V curve in Figure 3.3(a). The distribution of the input current among the four transistors is shown in Figure 3.3(b). Simulations are performed using level 3 Hspice models for the target 1.2 µm CMOS process.

Figure 3.3 a) Regions of operation on the S-shaped V-I characteristic, b) current distribution in the four MOS transistors

To choose proper bias voltages, the following design criterion is applied in order to merge two of the intermediate boundaries in Figure 3.3(a), eliminating a region in which all four transistors would be OFF:

On the other hand, to have a center line around V_DD/2, we choose:

Table 3.2 specifies the four regions of operation for the neuron circuit. For positive input

currents, both NMOS transistors are OFF and one or both of the PMOS transistors

conduct. The amount of voltage built up on the V_out node determines if one or both of the

PMOS transistors should conduct. On the other hand, for negative input currents both

PMOS transistors are OFF and one or both of the NMOS transistors conduct.

The slope of the I-V curve at I_in = 0 would be limited in practice by the equivalent output impedance of the synaptic current sources connected in parallel to the neuron input. The slope of the curve at the two saturation ends does not reach zero, an important consideration for many learning algorithms.

Table 3.2. Regions of operation for the neuron circuit shown in Figure 3.1(b)


It can be further shown for this circuit that when a transistor conducts, it operates in the saturation (non-triode) region. The saturation condition for an NMOS transistor is defined as:

$$V_{DS} \ge V_{GS} - V_{tn} \qquad (3.4)$$

For instance, in Region I (0 < V_out < V_B1 - V_tn), for the conducting transistor M4 we have (V_DS)_4 = V_DD - V_out and (V_GS)_4 = V_B2 - V_out. Therefore, the saturation condition defined in Eqn. (3.4) always holds:

$$V_{DD} - V_{out} \ge V_{B2} - V_{out} - V_{tn} \iff V_{DD} \ge V_{B2} - V_{tn}$$

Assuming an ideal square law in the saturation region, the output voltage can be obtained as a function of the input current in each region by writing the KCL relation on the output node and solving the corresponding quadratic equation. For example, in Region I:

where µ_n is the electron mobility and C_ox is the MOS oxide capacitance per unit area (ε/d). Eqn. (3.6), after some manipulations, results in the following quadratic equation:

in which K_n = µ_n C_ox. One of the two roots of the above quadratic equation is the physical solution for V_out, expressed as a function of I_in and circuit parameters:


Region I:

$$V_{out} = \frac{V_{B1} + V_{B2}}{2} - V_{tn} - \sqrt{\frac{-I_{in}}{K_n (W_2/L_2)} - \left(\frac{V_{B2} - V_{B1}}{2}\right)^2} \qquad (3.8)$$

in which we have assumed W_2/L_2 = W_4/L_4.

Similarly, the results for the other three regions are found and summarized below:

Region II:

Region III:

assuming W_1/L_1 = W_3/L_3.

The assumptions made for regions I and IV only lead to simpler formulations for V_out = f(I_in) and are not circuit design criteria. In summary, the neural saturating function can be expressed as a four-piece nonlinear function exhibiting in all regions an inverse quadratic relation that can be defined in a general form as follows:

where V_os, C and I_os are positive coefficients that are different for different regions. V_os and I_os are offset voltage and offset current coefficients respectively, while C is a coefficient in ohms (Ω) that determines the shape of each nonlinear function segment. V_os, in particular, is a function of bias and threshold voltages in all regions.
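The piecewise behavior described in this section can be checked numerically. The sketch below is an idealized model, not the fabricated circuit. It assumes an ideal square law, equal gain K·(W/L) for all four devices, |V_tp| = V_tn = 0.75 V, and bias voltages placed symmetrically about V_DD/2 with V_B2 - V_B1 = V_tn + |V_tp| so that no all-OFF region exists. The value of `BETA` is picked so that the output reaches 0 or 5 V at ∓100 µA. All of these numbers, and the function names, are assumptions introduced for illustration. The code solves KCL on the output node by bisection and compares the result with the closed-form Region I root that follows from these assumptions.

```python
import math

V_DD = 5.0
V_T = 0.75                 # assumed |V_tp| = V_tn, in volts
V_B1 = V_DD / 2 - V_T      # assumed lower bias voltage (1.75 V)
V_B2 = V_DD / 2 + V_T      # assumed upper bias voltage (3.25 V)
BETA = 2 * 100e-6 / 7.25   # assumed K*(W/L) in A/V^2, identical for all devices


def _sq(v_ov):
    """Ideal square-law saturation current for overdrive v_ov (zero when OFF)."""
    return 0.5 * BETA * v_ov * v_ov if v_ov > 0 else 0.0


def node_current(v_out):
    """Net current the four transistors absorb from the output node."""
    # NMOS pair (gates at V_B1 and V_B2) sources current into the node.
    i_nmos = _sq(V_B1 - v_out - V_T) + _sq(V_B2 - v_out - V_T)
    # PMOS pair (gates at V_B1 and V_B2) sinks current from the node.
    i_pmos = _sq(v_out - V_B1 - V_T) + _sq(v_out - V_B2 - V_T)
    return i_pmos - i_nmos


def neuron_vout(i_in, lo=-10.0, hi=15.0, iterations=80):
    """Solve KCL, node_current(v_out) = i_in, by bisection (monotonic in v_out)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if node_current(mid) < i_in:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def region1_closed_form(i_in):
    """Closed-form Region I root (both NMOS devices ON, i_in < 0)."""
    delta = 0.5 * (V_B2 - V_B1)
    return 0.5 * (V_B1 + V_B2) - V_T - math.sqrt(-i_in / BETA - delta * delta)


if __name__ == "__main__":
    for i_ua in (-100, -60, 0, 60, 100):
        print(i_ua, round(neuron_vout(i_ua * 1e-6), 4))
```

Bisection is used instead of the four closed-form roots because it needs no region bookkeeping; the closed form is kept only as a cross-check in Region I.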

3.2.3 A Sensitivity Study

Parameters of interest in a sensitivity study of MOS transistors are threshold voltages (V_tn, V_tp) and mobility factors (µ_n, µ_p; or equivalently gain factors K_n, K_p). The presented circuit is less sensitive to mobility or gain factors than to threshold voltages, as in all four regions K_n or K_p appears under a root sign. Assuming |V_tp| = V_tn = V_t and neglecting body effect, we calculate the sensitivity of the output voltage with respect to threshold voltage. With a relative change |ΔV_t|/V_t in threshold voltage due to process variations, the relative change in output voltage w.r.t. full-scale output (V_out)_fs is:

$$\frac{|\Delta V_{out}|}{(V_{out})_{fs}} = \left|\frac{\partial V_{out}}{\partial V_t}\right| \cdot \frac{V_t}{(V_{out})_{fs}} \cdot \frac{|\Delta V_t|}{V_t} \qquad (3.13)$$

From Eqn. (3.8) and Eqn. (3.9) we have ∂V_out/∂V_tn = -1. Also from Eqn. (3.10) and Eqn. (3.11) we have ∂V_out/∂V_tp = -1 (note: V_tp < 0). Therefore, for all regions |∂V_out/∂V_t| = 1. Assuming V_t = 0.75 V, a full-scale range of (V_out)_fs = 5 V and a worst-case process variation of (|ΔV_t|/V_t) × 100 = 20%, from Eqn. (3.13) we can find,

$$\frac{|\Delta V_{out}|}{(V_{out})_{fs}} \times 100 = 1 \times \frac{0.75}{5} \times 20\% = 3\%$$

Despite the simplicity of the circuit and a considerable variation in the circuit parameter V_t, the output voltage variation is quite low based on the above analysis. Circuit simulations, with a threshold voltage variation applied through the Hspice model parameter DELVTO [51], confirmed the above analysis. A typical simulation plot is shown in Figure 3.4, from which a low variation of the output characteristics can be observed.


For ±10% variations applied to the threshold voltage parameter, maximum variations observed in the output voltage w.r.t. full scale were about ±1.55%. Maximum variations occurred at V_out = V_DD and V_out = 0.
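The numbers quoted in this section can be reproduced with a few lines of arithmetic. The sketch below assumes the first-order relation ΔV_out/(V_out)_fs = |∂V_out/∂V_t| · (V_t/(V_out)_fs) · (ΔV_t/V_t), with |∂V_out/∂V_t| = 1, V_t = 0.75 V and a 5 V full scale as stated in the text. The function names are hypothetical.

```python
def output_variation_pct(vt_variation_pct, v_t=0.75, v_fs=5.0, dvout_dvt=1.0):
    """Relative output change (% of full scale) for a relative V_t change (%)."""
    return abs(dvout_dvt) * (v_t / v_fs) * vt_variation_pct


def implied_vt_variation_pct(output_variation, v_t=0.75, v_fs=5.0, dvout_dvt=1.0):
    """Inverse relation: V_t variation (%) implied by an output variation (%)."""
    return output_variation * v_fs / (v_t * abs(dvout_dvt))


if __name__ == "__main__":
    print(output_variation_pct(20.0))     # worst-case analysis case
    print(output_variation_pct(10.0))     # simulated +/-10% case
    print(implied_vt_variation_pct(2.2))  # back-calculation from a 2.2% output spread
```

Under these assumptions a ±10% threshold change predicts ±1.5% of full scale, which is close to the ±1.55% seen in simulation.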

Figure 3.4 Simulations with ±10% variations on threshold voltage

3.2.4 Fabrications and Measurements

The neuron circuit of Figure 3.1(b) was implemented in a standard 1.2 µm CMOS technology. The cell had a compact layout of 36.4 µm × 19.4 µm. Bias voltages V_B1 and V_B2 were generated on-chip using an NMOS voltage divider. Ten chips were fabricated and tested.¹ Figure 3.5 shows a microphotograph of the fabricated test chip that includes lumped neuron circuits, unified synapse-neurons and several other neural network test structures. Transfer characteristics were measured using a Mixed-signal Test Head (TH-1000 from CMC) controlled by a test program developed in HP VEE² and run on an HP 700i series workstation.

1. Fabrication was done through the Canadian Microelectronics Corporation (CMC) under the design name MHNT.

2. HP VEE: Visual Engineering Environment from Hewlett-Packard.


Figure 3.5 Microphotograph of a fabricated CMOS chip that includes lumped neurons, unified synapse-neurons and other test circuits

Figure 3.6 Measured neuron characteristics from 10 fabricated chips: a) overlaid results, b) a close-up view of maximum chip-to-chip variations

The overlaid results of measurements from 10 chips are shown in Figure 3.6(a).

Experimental characteristics were in close agreement with the simulations shown earlier

(e.g. in Figure 3.3, or Figure 3.4). Moreover, the maximum measured chip-to-chip variation was 110 mV, or 2.2% in the 5-V range, as shown in a close-up view in Figure 3.6(b). The measured value translates back to a maximum threshold voltage variation of about 15% based on Eqn. (3.13), which is smaller than the worst case assumed earlier in the simulations. The accuracy of the measurements was ±10 mV.

The dispersion of the curves in Figure 3.6(a) more resembles an offset than a gain error. This reflects the dominance of threshold-type variations, as described by V_os in Eqn. (3.12), compared to other parameter variations. The maximum chip-to-chip variation of 2.2% occurred around V_out = V_DD, i.e. in region IV. On the other hand, the maximum measured variation around V_out = 0 (in region I) was only 60 mV, or 1.3% in the 5-V range. The explanation here is that in region IV, where the highest variation was observed, PMOS transistors M1-M3 conduct, while in region I the two NMOS transistors M2-M4 conduct. Process variations on PMOS transistors are seen to be larger. The reason is that PMOS transistors are created in an N-well, which involves extra masks and processing steps compared to NMOS transistors built directly on the P-substrate.

Besides the experimental study of chip-to-chip variations, characteristics were measured 'within' each fabricated chip to determine the amount of 'on-chip' variations. This study has a greater importance as it indicates the amount of discrepancy among neurons operating in one network. Neuron cells were laid out at various locations, including corner positions, on test chips.¹ Measurements were performed on different cells within a chip and repeated for different fabricated chips. The worst-case characteristic variation within one chip was 67 mV in the 5-V range, or 1.3%, measured between two corner cells. This number is smaller than the maximum 'chip-to-chip' variation, as reasonably expected. The maximum on-chip variation of 1.3% occurred around V_out = V_DD where PMOS

1. Some cells were located on the corners of a second-generation fabricated chip, WRNBS. The core design in WRNBS is an optical neural network described in Chapter 4.

transistors conduct. On the other hand, around V_out = 0 where NMOS transistors conduct, the maximum measured variation was only 35 mV, or 0.7% in the 5-V range. Table 3.3 summarizes the experimental results presented in this section. The results, in general, suggest a low variation for the presented neuron circuit.

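Table 3.3. A summary of experimental results on the (lumped) neuron circuit

Maximum variations (in 5-V range)     Chip-to-chip      On-chip
Around V_out = V_DD (PMOS region)     110 mV (2.2%)     67 mV (1.3%)
Around V_out = 0 (NMOS region)        60 mV (1.3%)      35 mV (0.7%)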

In summary, circuit analysis, simulations and fabrication measurements presented in this

section all suggested low characteristic variations for the proposed neuron circuit despite

its simple topology and compact layout.

A majority of neuron implementations presented in the literature do not report on measured characteristic variations, and when they do, they often create concern about the accuracy of some analog implementations, especially amplifier-type neurons [66]. An operational transconductance amplifier (OTA) is presented in [71] for implementing sigmoid-like neurons. The output quantity (current) is proportional to V_t² and K (or equivalently µ), which is typical of a MOS amplifier. Therefore, due to larger exponents, the sensitivity to both major mismatch modeling factors (V_t and K) is theoretically larger than that of the resistive-type neuron presented in this chapter. Measured variations were not reported in [71].

3.3 Implementation and Properties of a Distributed Neuron

A useful property of a resistive-type neuron is that it can be implemented as a parallel combination of similar elements, known in this work as sub-neurons. Each sub-neuron has a larger nonlinear resistance such that the parallel combination of N sub-neurons creates the characteristics required for an N-input neuron. When implemented with active devices, each sub-neuron has a smaller transistor width to implement a larger resistance. In general, a MOS transistor with a width of N·W in a lumped neuron implementation is replaced by transistors of width W in each of the N constituent sub-neurons.

Figure 3.7 shows the transistor-level diagram of a distributed neuron implemented based on the circuit presented in Section 3.2. In this diagram I_1, I_2, ..., I_N are analog currents received from input synapses. An analog summation of synaptic currents is automatically performed on a supernode. The total synaptic current (I_sum = I_1 + I_2 + ... + I_N) then divides equally among the sub-neurons, as they are similar resistive blocks connected to the same voltage.¹ As a result, each sub-neuron receives the average of the input synaptic currents, I_av, and performs a saturating I-to-V function f_s(.) by combining nonlinear characteristics of four MOS transistors as described in Section 3.2.
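The equal current division can be written as a short behavioral model. In the sketch below a tanh curve stands in for the sub-neuron I-to-V law; this placeholder, the scale current `i_scale` and the function names are assumptions made for illustration, and sub-neuron mismatch is neglected as in the text.

```python
import math


def sub_neuron(i_sub, i_scale=40e-6, v_dd=5.0):
    """Placeholder saturating I-to-V law of one sub-neuron (tanh stand-in)."""
    return 0.5 * v_dd * (1.0 + math.tanh(i_sub / i_scale))


def distributed_neuron(synaptic_currents):
    """N identical sub-neurons on one supernode share the summed current equally."""
    n = len(synaptic_currents)
    i_av = sum(synaptic_currents) / n   # each sub-neuron carries I_sum / N
    return sub_neuron(i_av)


if __name__ == "__main__":
    # The same per-synapse current gives the same output for 3 or 30 inputs.
    print(distributed_neuron([20e-6] * 3), distributed_neuron([20e-6] * 30))
```

Because the output depends on the average rather than the sum, adding inputs (and with them sub-neurons) does not push the neuron into saturation, which is the self-scaling behavior discussed in this chapter.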

In practice, each sub-neuron is densely integrated with a corresponding synapse, creating a unified synapse-neuron. However, sub-neurons can be physically far apart on the silicon die. They are only connected via an analog bus on which current summation is performed and V_out is created.

1. Here, we neglect the neuron threshold (bias) current, i.e. assume I_0 = 0. We also assume in this circuit derivation that sub-neurons have similar characteristics, i.e. neglect characteristic variations.

Figure 3.7 Implementation of a distributed neuron

3.3.1 An Averaging Effect

Analog circuits, for instance lumped sigmoidal neurons, implemented at different locations across a sizable chip are subject to noticeable variations in their expected characteristics. This is due to process variations, especially the gradient of doping on the die surface, which in turn causes a gradient in MOS transistor parameters such as threshold voltage. In Section 3.2 it has been shown that the presented resistive-type neuron circuit inherently has a low sensitivity to process-dependent variations. In this section it is shown that a distributed neuron implementation further reduces the existing variations on a chip, thus creating very similar neuron characteristics.

Assuming a two-dimensional gradient, the difference between the threshold voltages of two transistors located at a relative distance (Δx, Δy) on a die is:

In Figure 3.8(a) lumped neurons are located at a maximum horizontal distance of D on a die subject to a V_t gradient. In most practical cases we can assume a constant gradient in each direction, i.e. ∂V_t/∂x = k_x and ∂V_t/∂y = k_y. Hence, the worst-case threshold discrepancy among the lumped cells in Figure 3.8(a) is:

Figure 3.8 Neuron cells in a gradient of doping: a) lumped realization, b) distributed realization

Now, we consider a distributed implementation such as the one shown in Figure 3.8(b). In this example, each cell is implemented with five sub-neurons, indicating a 5-input neuron. The implementation is confined to the same area as that of the lumped neurons. In this case, the maximum distance between the centroids of the distributed neurons is effectively reduced to d, where d ≪ D. Therefore, the worst-case threshold discrepancy among the three distributed cells shown in Figure 3.8(b) is:

The threshold voltage variation, in an ideal case, is reduced by a factor of (d/D) ≪ 1, which depends on layout. The improvement is more significant for a network with a large number of neurons and neuron inputs. Moreover, this property can be generalized and utilized more effectively on a two-dimensional array such as the one in a neural-based photosensor described in Chapter 6.
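The d/D reduction can be illustrated with a one-dimensional toy layout. The sketch below assumes a constant threshold gradient, treats the effective threshold of a distributed neuron as the mean over its sub-neurons (a first-order approximation), and interleaves the sub-neurons of three 5-input neurons on a linear array. The pitch, gradient value and interleaving pattern are hypothetical choices made here, not data from the fabricated chips.

```python
def vt_at(x_um, vt0=0.75, k_x=2e-6):
    """Threshold voltage (V) at position x under a constant gradient k_x (V/um)."""
    return vt0 + k_x * x_um


def worst_case_discrepancy(neuron_positions):
    """Largest difference between the mean V_t of any two neurons.

    neuron_positions: one list of sub-neuron x-positions (um) per neuron;
    a lumped neuron is a list with a single position.
    """
    means = [sum(vt_at(x) for x in xs) / len(xs) for xs in neuron_positions]
    return max(means) - min(means)


PITCH = 100.0  # um, hypothetical cell pitch
# Three lumped neurons spread over the array span D = 14 * PITCH.
lumped = [[0.0], [7 * PITCH], [14 * PITCH]]
# Three 5-input neurons whose sub-neurons are interleaved over the same span.
distributed = [[(j + 3 * i) * PITCH for i in range(5)] for j in range(3)]

if __name__ == "__main__":
    print(worst_case_discrepancy(lumped), worst_case_discrepancy(distributed))
```

In this layout the centroids of the distributed neurons are at most two pitches apart (d = 2 · PITCH), so the discrepancy shrinks by d/D = 2/14 relative to the lumped case.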

In a similar manner, it can be argued that in distributed neurons, variations on other process-dependent parameters such as µ_n or C_ox would also be averaged out, thus resulting in uniform characteristics for all neurons in a layer of a VLSI neural network.

Extra interconnects in Figure 3.8(b) do not create an overhead, as such interconnections do in fact exist in a lumped implementation at the output of synapse cells to form the summation of synaptic currents. In a distributed-neuron implementation each synaptic cell incorporates a densely packed nonlinear sub-neuron, and the outputs of unified synapse-neuron cells are interconnected in a very similar manner.

The averaging property of distributed neurons was experimentally verified through

fabrication and testing. To demonstrate a worst-case scenario, five-input neurons with the

same circuits as explained in Section 3.2 were first laid out as lumped cells at various

distances on a test chip. Measurements were performed on different cells and repeated

over five fabricated chips. The worst-case on-chip variation of the characteristics was

found between two cells at the greatest distance. The variation was 65 mV in a 5-Volt

range,¹ i.e. an analog accuracy of 1.3%, approximately equivalent to 6 bits of resolution. The

distance between the two cells was 2500 μm. In Figure 3.9(a) a typical measured

characteristic is shown on the left and a close-up of the worst-case curves around 5 V is

shown on the right.

The advantage of a truly distributed neuron can be observed when the building elements

are distributed in one or two dimensions across the chip. In this manner, an average of

various characteristics is obtained which corresponds to average process parameters.

The characteristic variation between two averaged neurons built in this manner is reduced

to the small variability of neighboring sub-neurons. In the case of our test chips, 5-input

'distributed' neurons were implemented with five sub-neurons laid on a linear array.

Maximum measured characteristic variation was only 25 mV, or 0.5% in 5 V (equivalent to

1 of 7 bits), as opposed to 1.3% (1 of 6 bits) for the case of 'lumped' cells. The remaining

discrepancy is mainly related to non-gradient type variations as well as some tolerances on

transistor sizes.
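The conversion from a measured voltage variation to a percentage and an equivalent bit resolution can be checked with a few lines. The sketch below uses one possible reading of "equivalent to n bits", namely the largest n for which one step of an n-bit scale is still at least as large as the variation; that definition is an assumption, not a quotation from the measurements.

```python
import math

def percent_of_range(variation_v, full_scale_v):
    """Variation expressed as a percentage of the full-scale range."""
    return 100.0 * variation_v / full_scale_v

def equivalent_bits(variation_v, full_scale_v):
    """Largest n with full_scale / 2**n >= variation (assumed definition)."""
    return math.floor(math.log2(full_scale_v / variation_v))

for label, dv in (("lumped", 0.065), ("distributed", 0.025)):
    print(label,
          f"{percent_of_range(dv, 5.0):.1f}%",
          equivalent_bits(dv, 5.0), "bits")
```

With the 65 mV and 25 mV figures in a 5 V range, this reading reproduces the 6-bit and 7-bit equivalents quoted above.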

Table 3.4 summarizes the experimental results comparing lumped and distributed neuron

circuits.

Table 3.4. A summary of comparative measurements on lumped and distributed neurons

Maximum on-chip variation    Lumped    Distributed
Volts (in 5 V)               65 mV     25 mV
Percent                      1.3%      0.5%

1. This measurement result is close to the one presented in Section 3.2.4 under slightly different conditions.


Figure 3.9 On-chip variations of characteristics (worst case among 5 fabrications):

a) lumped neuron implementation,

b) distributed neuron implementation

In this section an interesting property of a distributed neuron, namely self-scaling, is

introduced through circuit simulations and experimental verification. A brief study

follows for an intuitive understanding of this property. A more generalized study,

independent of particular circuit implementations, will be presented in Chapter 5.

Different neural network applications require different numbers of neurons and neuron

inputs. When the number of inputs to a lumped neuron circuit increases, e.g. in a

programmable network, large saturation areas are created resulting in a hard-limiting,

rather than a sigmoidal, behavior. On the other hand, a dramatic decrease in the number of

inputs to a lumped neuron effectively results in a low-gain linear neuron function.

Therefore, each neuron should be (re)designed based on the number of its inputs such that

the saturating function is properly scaled over the dynamic range of neuron inputs.

A distributed neuron implementation presents a 'self-scaling' property in this regard.

When the number of synaptic inputs to a neuron (i.e. the number of neurons in the

previous layer) increases or decreases, the overall nonlinear characteristic is scaled by itself.

The reason is that each synaptic input brings a corresponding sub-neuron that incrementally

adjusts the overall nonlinear function of the distributed neuron. By properly stretching the

dynamic range of the saturating function, this property restores information received from

new inputs that otherwise would have been lost in large saturation areas of a fixed lumped

neuron.

Referring to Figure 3.7, in a distributed neuron with N sub-neuron blocks, as current

divides equally among N similar blocks (each block receiving an average current I_ave), the

output voltage can be calculated in two alternative ways. If we consider the overall

nonlinear function f(.) we have:


On the other hand, regarding each individual sub-neuron block with its own nonlinear function,

from Eqn. (3.15) and Eqn. (3.16) we have:

Since the two calculated voltages must be the same, we conclude:

Eqn. (3.22) mathematically defines the function f(.) as a scaled version of the original

sub-neuron function.
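The two-way calculation described above can be sketched as follows. The symbols $f_s$ for the function of a single sub-neuron block and $I_{in}$ for the net input current are notation assumed for this sketch, and the relation is written in the form implied by the equal division of current among the $N$ blocks rather than copied from the numbered equations.

```latex
\begin{align*}
  V_{out} &= f(I_{in})
      && \text{overall distributed neuron} \\
  V_{out} &= f_s\!\left(\frac{I_{in}}{N}\right)
      && \text{any one of the $N$ identical sub-neuron blocks} \\
  \Longrightarrow \quad f(I_{in}) &= f_s\!\left(\frac{I_{in}}{N}\right)
      && \text{same node voltage in both views}
\end{align*}
```

Read this way, adding inputs stretches the characteristic along the current axis in proportion to $N$, which is the self-scaling behavior the text describes.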

Simulations and experimental measurements were carried out based on a sub-neuron

circuit block similar to the one in Figure 3.2. Figure 3.10(a) shows the simulation results

that confirm Eqn. (3.22) by comparing the characteristics of one sub-neuron block ( N = l )

and a five-input distributed neuron (N=5). Figure 3.10(b) shows the measured

characteristics of a two-input and a five-input distributed neuron fabricated in 1.2 μm

CMOS. The self-scaling property can be verified from these measurement diagrams.

The cursor points on Figure 3.10(b) show that a similar output voltage of V_out = 4.5 V

was obtained with net input currents of I_in = 100 μA for the 2-input neuron and

I_in = 250 μA for the 5-input neuron. The ratio of the two currents is 2 : 5, i.e. the ratio of

the number of inputs to the neurons. The same scaling ratio is verified for other

measurement points.
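The 2 : 5 observation follows directly if each of the N sub-neurons receives 1/N of the net current. The sketch below makes that explicit; the tanh-shaped sub-neuron characteristic and its parameters are hypothetical stand-ins, not the fabricated circuit, so only the equality of the two outputs is meaningful, not their absolute value.

```python
import math

def sub_neuron(i_amps, i_scale=50e-6, v_mid=2.5, v_swing=2.5):
    """Hypothetical saturating I-to-V function of ONE sub-neuron block."""
    return v_mid + v_swing * math.tanh(i_amps / i_scale)

def distributed_neuron(i_total, n_inputs):
    """N identical sub-neurons in parallel share the net current equally."""
    return sub_neuron(i_total / n_inputs)

v2 = distributed_neuron(100e-6, 2)   # 2-input neuron, 100 uA net input
v5 = distributed_neuron(250e-6, 5)   # 5-input neuron, 250 uA net input
v_fixed = sub_neuron(250e-6)         # fixed characteristic, no self-scaling

print(f"2-input: {v2:.3f} V, 5-input: {v5:.3f} V, fixed: {v_fixed:.3f} V")
```

Both distributed neurons see 50 μA per sub-neuron and therefore produce the same voltage, while the fixed characteristic driven by the same 250 μA sits deep in saturation.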

The self-scaling property will be studied in Chapter 5 from a broader view to establish a

stochastic model for distributed neurons. It will be shown that the self-scaling property of

distributed neurons improves the ratio of signal to quantization noise at the outputs of a

programmable hybrid neural network with analog neurons and digitized weights. The

results will confirm the intuition obtained in this chapter.

Figure 3.10 Self-scaling property of the distributed neuron circuit:

a) simulated characteristics of a sub-neuron and a 5-input neuron,

b) experimental results comparing a 2-input and a 5-input neuron

3.3.3 An Increased Fault Tolerance

Neural networks are traditionally known for their interconnection redundancy and thus

fault tolerance. In a neural network with distributed neurons there is a potential increase

in robustness and fault tolerance. As each neuron is distributed among N sub-blocks, a

VLSI defect would affect only 1/N-th of a neuron instead of the whole. An open-circuit

fault, for example a broken line, disables one sub-neuron from a parallel combination

leaving the remaining (N-1) sub-neurons intact. The resultant neuron would still be

operative with a saturating function scaled by (N-1)/N. For a moderate or large value of N,

the characteristic would be close to the original. On the other hand, the functionality of a

lumped neuron would be totally destroyed by an open circuit fault.
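How close the faulty characteristic stays to the original can be estimated with a small sweep. The sketch below assumes that an open-circuit fault simply removes one sub-neuron so that the net current divides among the remaining N-1 blocks, and it uses a hypothetical normalized tanh characteristic for each block.

```python
import math

def sub_neuron(i_amps, i_scale=50e-6):
    """Hypothetical normalized saturating function of one sub-neuron."""
    return math.tanh(i_amps / i_scale)

def neuron(i_total, n_working):
    """Output when the net current splits equally over working sub-neurons."""
    return sub_neuron(i_total / n_working)

def worst_deviation(n, points=200):
    """Largest output change over a current sweep after one open-circuit fault."""
    i_max = 3 * 50e-6 * n  # sweep well into saturation on both sides
    worst = 0.0
    for k in range(-points, points + 1):
        i = i_max * k / points
        worst = max(worst, abs(neuron(i, n - 1) - neuron(i, n)))
    return worst

for n in (2, 5, 20, 100):
    print(n, round(worst_deviation(n), 4))
```

Under these assumptions the deviation shrinks roughly in proportion to 1/N, which is consistent with the statement that the characteristic remains close to the original for moderate or large N.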

In case of a short circuit, both distributed and lumped neurons would be disrupted

similarly. Moreover, the probability of a VLSI short circuit is essentially the same in both

cases. This probability depends on the number of layout contacts and active area of cells.

Since a lumped transistor with size N·W is broken into N transistors with size W for a

distributed implementation, the number of contacts and the active area remain the same.

3.4 Conclusion

In this chapter, a simple I-to-V neuron circuit was presented that relies on inherent

nonlinearity in I-V characteristics of NMOS and PMOS transistors to approximate a

saturating function. The saturating function was analytically studied and a sensitivity

analysis was carried out. Circuit analysis and simulations both suggested an interestingly

low 3% characteristic variation despite 20% variations in threshold voltage parameter.

Experimental measurements proved to be even more promising. Maximum chip-to-chip

variation from 10 fabrications was 2.2%, while worst-case variation between neurons

within one chip was 1.3%. Moreover, a distributed-neuron implementation further

reduced the variations within one chip, creating virtually uniform neurons with measured

variations at or below 0.5%.

Other properties of a distributed neuron circuit, namely a self-scaling characteristic and

improved fault tolerance, were described. In particular, the self-scaling property was

explored through circuit simulations and fabrication measurements. A generalized

discussion on this subject is presented in Chapter 5.


Chapter 4 A Universal Hybrid

Block for NNICs

4.1 Introduction

This chapter describes the design and characterization of a hybrid

analog-digital¹ circuit presented as a universal building block for

the implementation of multilayer feedforward neural networks.

The universal block is based on a mixed-signal multiplier and a

distributed neuron design, the latter described earlier in

Chapter 3. A special property emerging from a hybrid distributed-

neuron realization will be studied in depth in Chapter 5.

Circuit simulations, fabrication test results and design

improvements especially on an MDAC-type synapse are presented

first. The application of the proposed building block in the design

of a few Neural Network Integrated Circuits (NNICs) will be

described afterwards. One of these NNICs, namely a neural-based

smart photosensor, will be discussed in more detail in Chapter 6.

1. The terms "hybrid", "hybrid analog-digital" and "mixed-signal" are used interchangeably.


4.2 A Programmable Universal Hybrid Building Block

The architecture presented in this thesis implements neural networks with regular arrays of

a programmable universal hybrid building block. This block, in essence, is a

"nonlinearly-loaded mixed-signal multiplier" consisting of the following sub-blocks

shown in Figure 4.1: a) a Multiplying Digital-to-Analog Converter (MDAC) synapse;

b) a digitally-programmable weight register; c) a nonlinear load or sub-neuron. In each

universal block, a synaptic weight can be stored digitally in a 5-bit Read/Write (R/W)

register. Synaptic multiplication is performed by an MDAC circuit. A nonlinear resistive

sub-neuron, loading the output of each MDAC, converts the synaptic output current to a

voltage. Several distributed sub-neurons from different blocks collectively perform the

function of a sigmoidal neuron (cf. Chapter 3, Section 3.3).

Figure 4.1 Sub-blocks of the universal hybrid building block


4.2.1 Multiplying D-to-A Converter (MDAC)

MDAC is a mixed-signal block that produces an output current proportional to the

multiplication of an analog input voltage by a signed digital synaptic weight:


Analog voltage V_in is received from a neuron in the previous layer, and output current I_out

represents the synaptic activity of the MDAC-type synapse. MDAC consists of three sub-

circuits as shown in a conceptual block diagram in Figure 4.2(a). These sub-circuits are:

1) a voltage-to-current converter (V-to-I); 2) a set of binary-weighted programmable

current mirrors; 3) a sign-bit circuit. The output current of MDAC, referring to Figure

4.2(a), can be expressed as:

D4 is the sign bit of the digital weight that sets the direction of output current: D4=0

creates a positive (excitatory) synaptic current while D4=1 sets a negative (inhibitory)

current. The synaptic current magnitude is determined by binary-coded weight bits D3 to

D0 multiplied by V_in. Coefficient K in Eqn. (4.2) is a constant in Ohms (Ω) mainly

determined by the V-to-I converter. Figure 4.2(b) shows the circuit diagram of the initial

MDAC, a five-bit version of [57]. All three sub-circuits are modified as will be explained

next and shown in evolutionary steps in Figure 4.2 parts (b) to (e).
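The behavior described for the sign bit and the magnitude bits can be captured in a small behavioral model. The functional form below, with the weighted input voltage divided by a constant K in ohms, is an assumption made for this sketch, and the value of K is chosen only so that the largest weight at a 5 V input corresponds to a round 100 μA; neither is taken from Eqn. (4.2).

```python
K_OHMS = 750e3  # hypothetical: weight 15 at V_in = 5 V then gives 100 uA

def mdac_current(v_in, code):
    """Output current for a 5-bit sign-magnitude code D4 D3 D2 D1 D0.

    Assumed form: I_out = (+/-) (8*D3 + 4*D2 + 2*D1 + D0) * V_in / K,
    where D4 = 0 sources (excitatory) and D4 = 1 sinks (inhibitory) current.
    """
    if not 0 <= code <= 0b11111:
        raise ValueError("code must fit in 5 bits")
    sign = -1 if (code >> 4) & 1 else 1
    magnitude = code & 0b1111
    return sign * magnitude * v_in / K_OHMS

print(mdac_current(5.0, 0b01111))  # largest excitatory weight
print(mdac_current(5.0, 0b11111))  # largest inhibitory weight
```

A model of this kind ignores the input dead zone and the finite output impedance of the current mirrors, both of which matter for the real circuit.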

In Figure 4.2(c), the sign-bit circuit is greatly simplified such that it only requires the D4

input, instead of both D4 and its complement, saving an inverter or an extra interconnection line per

synapse. Each saving related to synaptic circuits is important due to the great number of

synaptic blocks and interconnects that can eventually occupy a considerable die area. The

modified sign-bit circuit consists of only four transistors that produce a bi-directional

output current. When D4 is High (i.e. a negative weight) the three PMOS transistors in the

sign-bit circuit are OFF and the NMOS is ON, sinking the total binary-weighted current

from the output terminal. When D4 is Low, the NMOS is OFF and the three PMOS devices

are ON, acting as a current mirror that sources the binary-weighted current to the

output. Table 4.1 summarizes the device widths and lengths (W/L) in the MDAC circuit of

Figure 4.2(c) implemented in 1.2 μm CMOS. A layout technique known as "ΔW

correction" is used in the binary-weighted current mirror for a better device matching [1].


Figure 4.2 MDAC-type synapse, evolutionary steps:

a) conceptual block diagram (V-to-I converter, binary-weighted programmable current mirrors with bits D3 to D0, sign-bit circuit),

b) a 5-bit version based on [57],

c) modification in sign-bit circuit,

d) modification in current mirrors,

e) modification in V-to-I converter

Figure 4.3(a) shows the output current of MDAC with modified sign-bit (the circuit in

Figure 4.2(c)) at maximum input (V_in = 5 V) as binary weight increases successively from

-15 to 15. Figure 4.3(b) shows fabrication measurement results that are in close

agreement with simulations. MDAC operates, within a 3% linearity margin, as a weight-

dependent current source with a nominal output current of -100 μA to +100 μA.

Figure 4.3 MDAC output current (at maximum V_in) vs. binary weights:

a) simulation waveforms, b) fabrication measurements


Table 4.1. Device sizes (W/L) in μm for MDAC circuit shown in Figure 4.2(c)

1 Ml-Mla: 3.2 n.0 1 M2: 2.4 f24. 1

The next modification is shown in Figure 4.2(d) in which a row of transistors is removed

from the binary-weighted current mirror and V-to-I circuit. This modification: a) makes

headroom for a lower supply voltage (3.3 V instead of 5 V) as we avoid stacking up

transistors; b) reduces the input dead zone of each synapse from 2V_T to V_T (see Figure

4.4(a)), where V_T is the threshold voltage of a diode-connected NMOS transistor in the V-to-I

circuit. The modification is crucial, especially in the design of a low-voltage low-power

cell. The penalty is a reduction in output impedance of the current mirrors. This is

compensated, to some extent, by transistor resizing (e.g. by using current mirror

transistors with longer channels), and the rest only drops the output current of MDAC

slightly under nonideal load conditions. The modification is well justified at system level

where we enjoy low-voltage low-power cells with increased dynamic range.

Figure 4.4(a) shows the transfer characteristics of I_out vs. V_in (at maximum weight) for the

circuits shown in Figure 4.2(c) and Figure 4.2(d). The dynamic range of operation has

been apparently increased in the latter circuit compared to the former one due to the

removal of one threshold voltage element in the V-to-I circuit.
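The benefit of removing one threshold drop grows as the supply voltage shrinks, which a short calculation makes visible. The sketch below treats the dead zone as a hard cut-off at one or two threshold voltages and uses a hypothetical threshold of 0.8 V; both are simplifying assumptions, not values taken from the circuit.

```python
def usable_fraction(v_dd, v_t, stacked_thresholds):
    """Fraction of the 0..V_DD input range that lies above the dead zone."""
    dead_zone = stacked_thresholds * v_t
    return max(0.0, v_dd - dead_zone) / v_dd

V_T = 0.8  # hypothetical NMOS threshold in volts, for illustration only

for v_dd in (5.0, 3.3):
    before = usable_fraction(v_dd, V_T, 2)  # dead zone of 2*V_T
    after = usable_fraction(v_dd, V_T, 1)   # dead zone of V_T
    print(f"V_DD = {v_dd} V: usable input range {before:.0%} -> {after:.0%}")
```

With these numbers roughly half of the input range would be lost to the dead zone on a 3.3 V supply before the modification, which gives a feel for why the change matters most for low-voltage operation.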

In its linear region, the characteristic of I_out vs. V_in relies on a long-channel NMOS

transistor (M2) of the V-to-I converter in triode region. On the upper end of the characteristic,

when input voltage becomes too large, M2 enters the saturation region and output current

goes nonlinear or eventually saturates. This is evident for I_out2 in Figure 4.4(b). To

linearize the characteristics at the high end, a long-channel PMOS is added to the V-to-I circuit in

parallel with the NMOS transistor already in place. An alternative solution, seemingly

impractical, would be to tie up the gate of M2 to a dc voltage V_GG higher than V_DD. The

final MDAC circuit after the modification of the V-to-I converter is shown in Figure

4.2(e). The improved transfer characteristics are shown for I_out1 in Figure 4.4(b)

compared to I_out2.

Figure 4.4 Improving the dynamic range of I_out vs. V_in in MDAC:

a) threshold reduction (circuit improvement from Figure 4.2(c) to Figure 4.2(d)),

b) linearization (circuit improvement from Figure 4.2(d) to Figure 4.2(e))


4.2.2 Weight Register with Double-Phase Clock

The synaptic weight is stored in digital form in a static Read/Write register integrated with

each universal block. The universal block and a neural network built with this block are

thus programmable after different training sessions. Digital weight storage generally

occupies a large silicon area in a neural network chip. For this reason, an area-efficient

memory with double-phase clock is custom designed, instead of using standard library

cells (latch or memory).

Figure 4.5 shows the schematic of one bit of a 5-bit weight register. Each single-bit cell

consists of MOS switches and three inverters, one in a switched feedback configuration.

After a reset pulse (Φ_Reset), a double-phase non-overlapping clock (Φ1-Φ2) drives internal

MOS switches and stores input data in the cell. A 5-bit parallel-in parallel-out register of

this type has been used for the programming and storage of a sign-magnitude synaptic

weight. Similar units are used for the storage of the threshold (bias) value of neurons.

Nominal programming clock speed is 3.5 MHz; however, the cells can operate at higher

speeds, up to an order of magnitude. An area saving of over 30% has been achieved for

each 5-bit storage unit compared to the most compact cell library option available in the

target CMOS technology.
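The sign-magnitude format held in the 5-bit register can be illustrated with an encode and decode pair. The sketch below follows the convention stated for the MDAC (D4 = 1 for a negative weight, D3 to D0 as the binary magnitude); the function names are hypothetical and the code models only the number format, not the clocked storage cell.

```python
def encode_weight(w):
    """Signed integer weight (-15..15) -> bits (D4, D3, D2, D1, D0)."""
    if not -15 <= w <= 15:
        raise ValueError("weight outside the 5-bit sign-magnitude range")
    sign = 1 if w < 0 else 0
    mag = abs(w)
    return (sign,) + tuple((mag >> b) & 1 for b in (3, 2, 1, 0))

def decode_weight(bits):
    """Bits (D4, D3, D2, D1, D0) -> signed integer weight."""
    d4, d3, d2, d1, d0 = bits
    mag = 8 * d3 + 4 * d2 + 2 * d1 + d0
    return -mag if d4 else mag

print(encode_weight(-5))
print([decode_weight(encode_weight(w)) for w in (-15, 0, 15)])
```

One consequence of this format is that the 32 codes represent only 31 distinct weights, since a set sign bit with zero magnitude also decodes to zero.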

Figure 4.5 Schematic of one bit of a 5-bit weight storage cell with double-phase clock


4.2.3 Characteristics of the Unified Synapse-Neuron Circuit

The schematic diagram of the unified synapse-neuron (USN) circuit is shown in Figure

4.6. The USN consists of an improved MDAC (as explained before and shown in Figure

4.2(e)), loaded by a nonlinear sub-neuron. The sub-neuron is an element of a distributed

sigmoidal neuron circuit discussed earlier in Chapter 3. When a weight register is

integrated with a USN, they create a programmable universal building block for NNICs.

Experimental output current of the fabricated MDAC before the introduction of nonlinear

load (i.e., measured through a linear resistive load tied to V_DD/2) is shown in Figure 4.7.

In this experiment, the digital weight was successively increased from -15 to 15, while the

input voltage was set at V_in = 5 V. The measured stair-case current in a 194 μA range

characterizes the modified MDAC of Figure 4.2(e) without the effect of a nonlinear sub-

neuron. When a sub-neuron is introduced at the output, it converts the stair-case current

into a discrete sigmoid-like voltage. Simulation results in Figure 4.8(a) show the overall

characteristics of the USN for two parameter values of V_in = 2 V and 5 V. Figure 4.8(b)

shows the measured output voltage characteristic of USN at V_in = 5 V. Fabrication

measurements were in close agreement with the simulations.

Figure 4.6 Unified Synapse-Neuron (USN) circuit

Figure 4.7 Output current of the modified MDAC (Figure 4.6 or Figure 4.2(e)), plotted against synaptic weight

Figure 4.8 Overall characteristics of USN (output vs. weight from -15 to +15):

a) simulations for two parametric values of V_in, b) experimental measurements at V_in = 5 V

4.3 Applications

4.3.1 An Optical Template Matching Network

A block diagram of a 4-3-2 hybrid VLSI neural network based on a hybrid distributed-

neuron architecture was shown earlier in Figure 2.3(b) in Chapter 2. As a proof-of-

concept design, a complete circuit for this network has been implemented based on arrays

of the universal hybrid building block presented in this chapter. A schematic diagram is

shown in Figure 4.9. The network is built with 23 blocks. Eighteen of these blocks are

the universal nonlinearly-loaded multiplier blocks. Five remaining blocks are used for

neuron threshold adjustment. These units are in fact the same universal blocks on silicon

with their nonlinear loads deactivated.
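The block count follows from the layer sizes: one multiplier block per synapse and one threshold block per neuron. The helper below is a hypothetical illustration of that bookkeeping for a fully connected feedforward network.

```python
def block_count(layers):
    """Return (synapse blocks, threshold blocks, total) for sizes like (4, 3, 2)."""
    synapses = sum(a * b for a, b in zip(layers, layers[1:]))
    thresholds = sum(layers[1:])  # one bias block per neuron after the input layer
    return synapses, thresholds, synapses + thresholds

print(block_count((4, 3, 2)))   # (18, 5, 23)
```

The same count applied to a 16-4-3 network gives 76 synapse blocks and 7 threshold blocks, 83 in total, which agrees with the number of programmable units quoted for the classifier chip in Section 4.3.2.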

Figure 4.9 A 4-3-2 VLSI neural network based on arrays of a universal hybrid building block


The network is trained for a 4-input template matching problem. An interactive Back-

Propagation simulator is developed by which the network and the patterns are defined for

an off-line training. The simulator will be explained in Chapter 6. While training is

performed with high precision weight values defined in software, the resulting weights are

rounded off to the resolution of the hardware (5 bits) and a simulated recall follows.

When this final phase is passed, the weights are programmed on chip. Figure 4.10(a)

shows four input templates learned earlier by the network during training. The inputs are

originally optical, but can also be applied to the chip as electronic signals as shown in

Figure 4.10(b). Templates 1 and 2 are to be detected and flagged on two outputs while the

other templates are to be rejected. The corresponding trained weights (w's) and bias

values (b's) referring to the circuit in Figure 4.9 are:¹

The circuit is designed, laid out and simulated using Cadence tools and HSpice. A chip is

fabricated in 1.2 μm CMOS (see the layout in Figure 4.11(a)) that contains two versions of

this network: a) a weight-programmable network with electronic inputs; b) a network with

photosensitive inputs and pre-programmed weights for an optical template matching

application. The complete circuit layout consists of about 700 MOS transistors. Figure

4.11(b) shows a microphotograph of the core area of the chip. Four photosensor cells

forming a square are used as optical inputs to the neural network. A fifth photosensor at

the middle, directly connected to an output pad, is used as a reference to adjust the

sensitivity of the cells to background illumination. Physical dimensions of the chip are:

(3030 × 2816.8) μm², or 8.5 mm², including bonding pads, ESD and some test structures.²

1. Superscript l in b^(l) and w^(l) refers to the layer number (e.g. l = 1, l = 2).

2. This chip is fabricated through the Canadian Microelectronics Corporation (CMC) under the design name W W . In CMC's fabrication record, chip dimensions are in design scale microns (DSM). One DSM in 1.2 μm CMOS4S technology is equal to 0.8 physical microns.


During hardware recall, electronic input vectors were applied to the network as shown in

Figure 4.10(b) at a typical rate of 1 M.Vector/sec. Outputs 1 and 2 became active in

response to templates 1 and 2, respectively, as shown in Figure 4.10(c). Both outputs

remained zero for templates 0 and 3. Current and power consumption on V_DD = 5 V

were: I_ave = 730 μA and P_ave = 3.65 mW. The circuit can operate at higher speeds or on

lower supply voltages. For instance, with V_DD = 3.3 V, current and power consumption

were: I_ave = 155 μA and P_ave = 510 μW which is, on average, less than 0.75 μW per

transistor. This result indicates an 86% power reduction compared to the consumption on

a 5-Volt supply.
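The quoted power figures can be cross-checked from the supply voltages and average currents. The sketch below takes the transistor count as 700, following the "about 700 MOS transistors" stated for the layout; the helper name is hypothetical.

```python
def power_summary(v_dd, i_avg, transistors=700):
    """Return (total power in watts, average power per transistor in watts)."""
    p = v_dd * i_avg
    return p, p / transistors

p5, _ = power_summary(5.0, 730e-6)
p33, per_transistor = power_summary(3.3, 155e-6)
reduction = 1.0 - p33 / p5

print(f"5 V: {p5 * 1e3:.2f} mW, 3.3 V: {p33 * 1e6:.1f} uW, "
      f"{per_transistor * 1e6:.2f} uW per transistor, {reduction:.0%} lower")
```

The product of 3.3 V and 155 μA is slightly above the rounded 510 μW in the text, and the resulting reduction rounds to the stated 86%.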

Figure 4.10 Template matching: a) optical inputs, b) equivalent electronic inputs, c) chip outputs in recall

Figure 4.11 a) Layout and b) core microphotograph of template matching NNIC

Table 4.2. A data summary about template-matching NNIC

Function                      Programmable template matching
Inputs                        4 Optical + 4 Electronic
Typical rate                  1 - 2.5 M.Vectors/sec.
Dimensions                    (3030 × 2816.8) μm²
Die area                      8.5 mm²
Package                       68 Pin Grid Array
Power consumption on 5 V      3.65 mW
Power consumption on 3.3 V    510 μW

A low-voltage operation on 3.3V was possible due to a modified MDAC circuit shown

earlier in Figure 4.2(e). Simulations indicate that a similar neural network architecture

based on the original (cascode) MDAC circuit would fail to recall the stored patterns when

operating from a 3.3V supply. This was caused by information loss in large threshold

zones of MDAC-type synapses.

The chip was also tested successfully with optical input patterns similar to those in Figure

4.11(a).¹ In a long-term testing, the functionality of the chip was confirmed over a

48-hour period of continuous work. A data summary about the template-matching chip is

given in Table 4.2.

Programmable synaptic weights allow the NNIC to recognize different templates to be

detected in different applications. A particular application of interest was feature

extraction in a handwritten numeral recognition system. The system consists of three

stages: preprocessing, feature extraction and classification [6]. As each numeral has a

unique directional histogram, directional border templates become the features to be

extracted by such hardware [7]. The basic process performed on a typical handwritten

number and the 2 x 2 directional templates programmed on NNIC for feature extraction

are illustrated in Figure 4.12. The use of a programmable neural network in feature

extraction stage allows a higher flexibility and a possibility of merging with neural

network classifier stage.

1. A hardware demonstration of the functionality of this chip was presented at TEXPO'97 [25].

Figure 4.12 a) Border feature extraction and b) directional templates (template, direction, code) for handwritten numeral recognition

4.3.2 General Purpose Programmable Neural Network Classifier

Figure 4.13 shows the microphotograph of a 16-4-3 programmable neural network IC

designed for general purpose vector classification applications.¹ Fabricated in 1.2 μm

CMOS, the chip is built with mixed-signal arrays of a total of 83 programmable universal

building blocks and is tested for the mapping of up to 16-bit input vectors to vectors of 3

analog components at the output. In order to avoid an I/O (pad) bounded layout and pin

limitations, a 16-bit serial-in parallel-out (SIPO) interface is integrated at the input.

Compact memory cells with 2-phase clock such as those described in Section 4.2.2 are

used in SIPO interface as well as for the programming and storage of synaptic weights.

Weight programming and test vector generation were performed using HP75000-D20 Test

System. A data summary about this chip is given in Table 4.3.
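The role of the serial-in parallel-out interface can be pictured with a behavioral model: one input pin receives the vector bit by bit, and after 16 clock cycles all bits are available in parallel to the network. The class below is a hypothetical sketch at that level only; the bit order (first element shifted in first) is an assumption, and the two-phase clocking of the actual cells is not modeled.

```python
class Sipo:
    """Behavioral serial-in parallel-out register (not the transistor-level cell)."""

    def __init__(self, width=16):
        self.bits = [0] * width

    def clock(self, serial_in):
        """One clock cycle: shift by one place and take in a new bit."""
        self.bits = self.bits[1:] + [1 if serial_in else 0]

    def parallel_out(self):
        return tuple(self.bits)

def load_vector(vector):
    """Shift a whole input vector in, first element first."""
    reg = Sipo(len(vector))
    for bit in vector:
        reg.clock(bit)
    return reg.parallel_out()

pattern = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1]
print(load_vector(pattern))
```

The trade-off is pin count against load time: a single data pin replaces 16 input pads at the cost of 16 clock cycles per vector.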

Due to a modular architecture based on a universal building block: a) the time and effort

spent for a custom layout were greatly reduced, and b) a dense regular silicon

implementation with low interconnect area and complexity has been achieved. In fact,

1. The design is fabricated through the Canadian Microelectronics Corporation under the name WRPW.


synapse-to-neuron interconnections are made locally inside universal blocks and the

remaining (global) routing is performed on a highly regular structure. In the target

numeral recognition system [6], two such NNICs are to be used in parallel in classification

stage to map the extracted features of directional border templates into expected classes.

Table 4.3. A data summary about programmable NNIC classifier

Function                      Programmable NN Classifier
Architecture                  16-4-3
No. of programmable units     83 × 5-bit
Dimensions                    (2872.8 × 2172.5) μm²
Die area                      6.2 mm²
Package                       68 Pin Grid Array
Power on 5 V supply           10.8 mW
Power on 3.3 V supply         1.5 mW

Figure 4.13 Microphotograph of a general purpose 16-4-3 programmable NNIC classifier


4.3.3 Other NNIC Fabrications

Another NNIC which is designed and fabricated based on the presented architecture and

building blocks, is a neural-based smart photosensor. In the context of this design, a

considerable reduction in interconnection area and a corresponding increase in synaptic

density is highlighted in comparison with a conventional implementation. Details will be

explained in Chapter 6. Moreover, a BiCMOS version of the proposed circuits and a

neural-based photosensor in that technology have been implemented [51].

4.4 Conclusion

A programmable nonlinearly-loaded mixed-signal multiplier was presented in this chapter

and in [17] as a universal building block for the implementation of NNICs. The block

consists of an MDAC-type synapse, an element of a distributed neuron and a compact

weight register. Circuit design and improvements were described, and simulation and

experimental results were presented in close agreement. Circuit techniques in MDAC

especially increased the dynamic range of synaptic function and made possible an

operation on a 3.3-V, as well as a 5-V supply.

Two NNICs were fabricated and tested as a proof-of-concept for the architecture and

building block. The first test chip was a 4-input template matching neural network with

both optical and electronic inputs. The second chip was a 16-4-3 general purpose

programmable neural network classifier. A low-voltage operation on 3.3 V, rather than 5 V,

reduced the power consumption by 86%. The two NNICs can, respectively, be used in

feature extraction and classification stages of a target handwritten numeral recognition

system.


Chapter 5 Quantization Noise

5.1 Introduction

The main purpose of implementing a neural network on hardware is

to realize a true Parallel Distributed Processor. Hardware

implementations, however, introduce various non-idealities such as

weight quantization effects and variations of characteristics.

Moreover, in order to realize dense and high-speed neural networks

with a large number of neurons for real world applications, the use

of simple synapses and neurons with low precision weights and

other non-idealities is unavoidable. The effect of weight

quantization especially becomes more apparent at the outputs when

the network becomes larger.
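The growth of quantization error with network size can be illustrated on the simplest case, a single linear weighted sum. The sketch below rounds ideal weights to a 5-bit sign-magnitude grid and measures the resulting output error; the uniform distributions for weights and inputs, the weight range of ±1 and the trial count are assumptions made for illustration, not the stochastic model developed in this chapter.

```python
import math
import random

def quantize(w, w_max=1.0, bits=5):
    """Round w to the nearest sign-magnitude level; `bits` includes the sign bit."""
    levels = 2 ** (bits - 1) - 1          # 15 magnitude steps for 5 bits
    step = w_max / levels
    q = math.floor(abs(w) / step + 0.5)   # round half up on the magnitude
    q = min(q, levels)                    # clip to the largest code
    return math.copysign(q * step, w)

def output_noise_rms(n_inputs, trials=2000, seed=0):
    """RMS error of a linear weighted sum caused only by weight rounding."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        w = [rng.uniform(-1, 1) for _ in range(n_inputs)]
        x = [rng.uniform(0, 1) for _ in range(n_inputs)]
        err = sum((quantize(wi) - wi) * xi for wi, xi in zip(w, x))
        acc += err * err
    return math.sqrt(acc / trials)

for n in (4, 16, 64):
    print(n, round(output_noise_rms(n), 4))
```

Because the individual rounding errors are roughly independent, the error of the sum grows approximately with the square root of the number of inputs, so a larger fan-in makes the same 5-bit weights noisier at the output.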

A statistical analysis is carried out in [93] on the effect of

quantization in multilayer neural networks. This analysis considers

relatively small networks in both learning and recall phases. The

number of quantization bits required is fairly high (8-10 bits)

because of the requirements of the learning phase. However, a different

approach with a lower bit resolution can be taken, if we are only

concerned about quantization effects after learning, i.e. the

implementation of an ideally-trained network. Sensitivity to weight

errors of neural networks with increasing number of neurons is

analyzed in [68]. A stochastic model is developed to study an ensemble of networks with

differing weights and the focus is on the implementation of recall phase. In this chapter,

we build on this model and present a new model for an Adaline with distributed-neuron

structure. This model predicts a lower quantization noise in a hybrid distributed-neuron

architecture compared to a conventional lumped-neuron architecture.

5.2 Modeling a Distributed Neuron

In this section, the analytical models of a lumped and a distributed resistive neuron are

studied without reference to any specific circuit implementations. As a result of this study,

an important property of a distributed neuron, namely self-scaling, will be formulated.

Two basic tasks of a neuron are summation of synaptic inputs, and nonlinear saturating

function. A third task, Le. neuron threshold adjustment, can be modeled by an extra input

synapse. Synapses in their simplest fom are modeled as ideal multipliers. In a neural

network built with tramconductance (V-to-1) synapses and resistive-type neurons,

sumrnation is simply performed by hardwiring the output currents of synapses together.

An Adaline with lumped resistive neuron c m be realized as depicted in Figure 5.1. In this

case, a lumped resistive neuron receives the summation of currents and generates on the

same supernode an output voltage which is a nonlinear function of total synaptic input.

Figure 5.1 An Adaline implemented with lumped resistive-type neuron

A nonlinear resistive-type neuron can be distributed into parallel sub-neurons assuming

their collective response remains the same. If the number of sub-neurons is chosen to be

equal to the number of input synapses, each sub-neuron and a corresponding synapse can

create a unified synapse-neuron (USN). In this thesis an architecture based on a hybrid

analog-digital USN has been presented. The architecture is comprised of digitized

synaptic weights, analog distributed neurons and multiplying D/A synapses. An Adaline

based on this architecture is shown in Figure 5.2 and is modeled in the rest of this chapter.

We will formulate the self-scaling property of a distributed-neuron Adaline and show that in

a 'hybrid' architecture that relies on quantized weights, this property reduces the effect of

quantization noise at the output.

Figure 5.2 An Adaline with a distributed-neuron architecture

5.2.1 Increase in the number of Adaline inputs

An increase or decrease in the number of neuron inputs is commonplace. For instance, a

change in the number of nodes (or neurons) in a layer brings a corresponding change in

the number of inputs to all neurons in the next layer. Such situations are inevitable, e.g.

when a programmable neural network chip is used in different applications, or when new

synaptic modules are added to the input or hidden layer of cascadable chip sets. This may

also be the case in a learning process involving network topology modification [13].

The output of an N-input neuron with activation function f(.) is:

$$y_n = f\left(\sum_{k=1}^{N} w_{nk}\, x_k\right)$$

Different fan-in conditions may occur for a neuron when a programmable network is used

in different applications. This is also the case when the same neuron circuit (cell) is used

in different layers of a neural network with different number of synaptic inputs.

Real-world applications especially require neural networks that have neurons with a large

number of inputs, while in most cases it is difficult to determine the exact number of nodes

and hence the fan-in conditions of neurons beforehand. In general, if the number of inputs

to a lumped neuron is increased by a factor of S (not necessarily an integer number), then:

$$y_n = f\left(\sum_{k=1}^{N \cdot S} w_{nk}\, x_k\right)$$

A saturating function initially designed for an N-input neuron is shown in Figure 5.3(a).

The horizontal axis represents total synaptic input, also known as net input. The same

neuron function when the number of inputs is increased to N.S is illustrated in Figure

5.3(b), which involves a larger dynamic range of net input. For S >> 1, the output function $y_n$

will contain large saturation areas and a narrow transition region compared to the whole

input dynamic range. Quantization noise is more amplified in a seemingly sharper

transition region, while the information-carrying signal soon saturates and, therefore, the

signal-to-noise ratio deteriorates. In principle, the neuron should be re-designed such that

its saturating function properly spreads over the dynamic range of net input. A properly-scaled

neuron characteristic for the new input conditions is illustrated in Figure 5.3(c).

Re-designing a neuron is, apparently, neither convenient nor in some cases possible.
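The over-saturation described above can be illustrated numerically. The following is a minimal sketch, not taken from the thesis: it assumes a tanh activation, weights and inputs drawn uniformly from [-1, 1], and a hypothetical saturation threshold of 90% of full scale. The function name `saturated_fraction` and all parameter values are illustrative assumptions.

```python
import math
import random

def saturated_fraction(num_inputs, trials=2000, threshold=0.9, seed=0):
    """Fraction of random input patterns that push a fixed tanh neuron
    beyond `threshold` of its full-scale output (assumed activation)."""
    rng = random.Random(seed)
    saturated = 0
    for _ in range(trials):
        # Assumed operating conditions: weights and inputs uniform in [-1, 1].
        net = sum(rng.uniform(-1, 1) * rng.uniform(-1, 1)
                  for _ in range(num_inputs))
        if abs(math.tanh(net)) > threshold:
            saturated += 1
    return saturated / trials

N = 16  # hypothetical fan-in the neuron was designed for
for S in (1, 4, 16):
    frac = saturated_fraction(N * S)
    print(f"S = {S:2d}: fraction of patterns in saturation = {frac:.2f}")
```

Under these assumptions the saturated fraction grows with S, which is the behavior sketched in Figure 5.3(b): a fixed lumped characteristic covers a shrinking share of the net-input range.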

Figure 5.3 a) An N-input neuron characteristic over the original range of inputs, b) the same neuron when inputs are increased to N.S, c) a properly-scaled neuron with N.S inputs

(Each panel plots neuron output against synaptic input. Panels (a) and (b) show $y_n = f(\sum_k w_{nk} x_k)$; panel (c) shows $y_n = F(\sum_k w_{nk} x_k) = F(I_{net})$.)

5.2.2 Self-scaling Formulation

When the number of inputs to a neuron increases, one method to avoid over-saturation,

besides re-designing, is to reduce synaptic activity by scaling down synaptic inputs or

weights. A weight scaling method such as the one proposed in [92] is only practical for

software implementations, and becomes too complex in hardware as it requires one

scaling module for each synaptic weight. If we choose S as the scaling factor on synaptic

weights, we have:

$$y_n = f\left(\sum_{k=1}^{N \cdot S} \frac{w_{nk}}{S}\, x_k\right) = f\left(\frac{I_{net}}{S}\right)$$

where $I_{net} \triangleq \sum_{k=1}^{N \cdot S} w_{nk}\, x_k$ is the net input before weight scaling.

Equivalently, we should be able to use the same net input combined with a properly scaled

neuron activation function F(.):

$$y_n = F(I_{net}), \qquad F(I_{net}) \triangleq f\left(\frac{I_{net}}{S}\right)$$

A scaled sigmoidal function, for example, is defined as:

in which $\left.\frac{dF}{dI_{net}}\right|_{I_{net}=0} = \frac{1}{S}\, f'(0)$, i.e. slope gain at the origin is decreased

by a factor of S.

Figure 5.4 Neuron input increase for a distributed neuron

Therefore, scaling all synaptic weights with 1/S is equivalent to using the same set of

weights combined with a scaled activation function defined as above. The

distributed-neuron structure presents a scaling property similar to the above scheme, i.e. if the number

of input synapses to a distributed neuron is increased by a factor S, the neuron will consist

of S similar nonlinear blocks in parallel (each possibly consisting of N resistive

sub-neurons). Referring to Figure 5.4, as current equally divides among the similar blocks,

the output voltage can be obtained in two alternative ways:

represents the nonlinear function of the original N-input neuron block, and Ime. is the

current through each and every sub-neuron. Thus, from (I) and (II) we conclude:

The basic self-scaling property of a distributed neuron is described by Eqn. (5.5).

According to this equation, a distributed neuron exhibits a self-scaling property that is

equivalent to scaling down all the weights in proportion to the increase in the number of input

synapses. This property will be used in the next section to transform a statistical model of

a conventional (lumped) Adaline to a new model for a distributed-neuron Adaline.

Intuitively speaking, as the number of synaptic inputs (i.e. the number of neurons in the

previous layer) increases, the overall nonlinear characteristic of a distributed neuron

automatically stretches and proportionally covers the entire dynamic range of inputs. This

property of a distributed neuron preserves the information received from extra synaptic

inputs that would otherwise have been lost in the large saturation areas of a fixed lumped

neuron with increasing number of inputs.
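The self-scaling argument can be checked numerically. The sketch below is an illustration under stated assumptions, not the circuit of the thesis: each nonlinear block is assumed to follow v = tanh(i), so that a block draws current atanh(v) at node voltage v. S identical blocks share the node voltage and their currents add. The names `block_current` and `distributed_output` are hypothetical.

```python
import math

def block_current(v):
    """Current drawn by one nonlinear resistive block at node voltage v.
    Inverse of the assumed single-block characteristic v = tanh(i)."""
    return math.atanh(v)

def distributed_output(i_net, num_blocks):
    """Node voltage of num_blocks identical blocks in parallel.
    Solves num_blocks * block_current(v) = i_net for v by bisection."""
    lo, hi = -1.0 + 1e-12, 1.0 - 1e-12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if num_blocks * block_current(mid) < i_net:
            lo = mid   # total current too small, voltage must rise
        else:
            hi = mid
    return 0.5 * (lo + hi)

S = 4
for i_net in (-6.0, -1.0, 0.5, 3.0):
    v_parallel = distributed_output(i_net, S)
    v_scaled = math.tanh(i_net / S)   # f(I_net / S), the self-scaled response
    print(f"I_net = {i_net:5.1f}: parallel blocks {v_parallel:.6f}, "
          f"f(I_net/S) {v_scaled:.6f}")
```

Because the current divides equally among identical blocks, the parallel combination responds as f(I_net / S), which is the same as scaling every weight by 1/S.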

5.3 Stochastic Model

5.3.1 Sigmoidal Adaline with Lumped Neuron

A stochastic model presented in [68] defines the ideal output of an Adaline (in an arbitrary

layer of a Madaline) and the corresponding output error as follows:

where $X^T$ stands for the transpose of matrix X, and W, X, $\Delta W$ and $\Delta X$ are

independent identically distributed (iid) random vectors representing weights, inputs,

weight errors and input errors, respectively.

The output Noise-to-Signal Ratio (NSR) of the Adaline is defined as the ratio of the

variance of the output error (due to quantization noise, etc.) to the variance of the ideal

output (e.g. due to differing weights corresponding to different training sets):

$$\mathrm{NSR} = \frac{\sigma_{\Delta y}^2}{\sigma_y^2}$$

Based on this model, the output NSR of a sigmoidal Adaline can be expressed as a linear

combination of input NSR, $\sigma_{\Delta x}^2 / \sigma_x^2$, and weight NSR, $\sigma_{\Delta w}^2 / \sigma_w^2$, and is

amplified by a stochastic gain function g(.):

The output NSR of an Adaline in an arbitrary layer of a Madaline can be computed

recursively starting from the input layer. Stochastic gain g(.) is always greater than 1 and

is an increasing function of its argument, $\sqrt{N}\,\sigma_x \sigma_w$, where N is the number of inputs to the

Adaline, and $\sigma_x$ and $\sigma_w$ are standard deviations of inputs and weights, respectively.

Increasing the number of inputs to an Adaline increases its stochastic gain, g(.) [68].

Therefore, in a conventional neural network with lumped sigmoidal neurons, an increase

in the number of inputs to different layers causes an unwanted increase in the output NSR.

If the number of inputs to an Adaline increases by a factor S and input and weight

variances remain the same, in the absence of any scaling scheme the output

Noise-to-Signal Ratio of a lumped Adaline will increase to $\mathrm{NSR}_1$ given by the following equation:

In other words, the effect of weight quantization becomes more apparent at the output of a

larger network.

5.3.2 Sigmoidal Adaline with Distributed Neuron

If we manage to reshape the nonlinear characteristic of a neuron in an Adaline in response

to an increase in the number of inputs, we will be able to control the stochastic gain factor

g(.), and hence the Noise-to-Signal Ratio. In the case of a lumped neuron, this should be

done by re-designing the saturating function. On the other hand, the self-scaling property

inherent in a distributed-neuron structure presents a natural way of controlling g(.).

Formulated in Eqn. (5.5), this property in turn affects the NSR of a distributed-neuron

Adaline. The impact of self-scaling on NSR is explored here.

In a distributed-neuron structure, according to Eqn. (5.5) every weight to the Adaline is, in

effect, scaled by 1/S as the number of inputs increases. If we define $w_S = w / S$ as a

scaled weight, then the statistical parameters for the scaled weight are:

$\sigma_{w_S}^2 = \sigma_w^2 / S^2$ and $\sigma_{\Delta w_S}^2 = \sigma_{\Delta w}^2 / S^2$. Therefore, $\mathrm{NSR}_2$, or the Noise-to-Signal

Ratio of a distributed Adaline with increased number of inputs, will be:

The terms in the linear combination remain unchanged, while the stochastic gain is decreased

due to the scaling of its argument with 1/S. This property will reduce the Noise-to-Signal

Ratio, $\mathrm{NSR}_2$, and improve the performance of recall hardware proportionally. The

resulting stochastic model is depicted in Figure 5.5. The improvement is demonstrated by

an example in Section 5.4.
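The effect of weight scaling on the two ingredients of the NSR expression can be written out step by step. This is a short worked derivation from the definitions above ($w_S = w/S$, fan-in increased from N to N.S); it is a sketch of the reasoning, not a reproduction of the thesis equations.

```latex
\begin{align*}
\sigma_{w_S}^2 = \frac{\sigma_w^2}{S^2}, \quad
\sigma_{\Delta w_S}^2 = \frac{\sigma_{\Delta w}^2}{S^2}
\;&\Longrightarrow\;
\frac{\sigma_{\Delta w_S}^2}{\sigma_{w_S}^2} = \frac{\sigma_{\Delta w}^2}{\sigma_w^2}
&& \text{(weight NSR unchanged)} \\
\sqrt{N S}\,\sigma_x\,\sigma_{w_S}
= \frac{\sqrt{N S}\,\sigma_x\,\sigma_w}{S}
\;&=\; \sqrt{\frac{N}{S}}\;\sigma_x\,\sigma_w
&& \text{(gain argument reduced)}
\end{align*}
```

Relative to a lumped neuron with N.S inputs, the argument of g(.) is therefore smaller by a factor of S, and since g(.) is increasing, the stochastic gain is smaller as well.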

Figure 5.5 Stochastic model for an Adaline with distributed neuron

5.4 A Case Study

For an Adaline with N = 25 inputs, suppose inputs and weights are uniformly distributed

over the range [a, b] = [-2, 2]; therefore, $\sigma_x^2 = \sigma_w^2 = (b - a)^2 / 12 = 4/3$.

Assuming an 8-bit quantization scheme, weights are quantized to levels equally spaced by

q = 1/64; thus, the weight error variance will be $\sigma_{\Delta w}^2 = q^2 / 12 = 2 \times 10^{-5}$.

Furthermore, we assume $\sigma_{\Delta x}^2 = 0$, as in this discussion we are interested in the net effect

of weight quantization only. For $\sqrt{N}\,\sigma_x \sigma_w > 2$, the gain function defined in [68] may be

approximated as:

The output noise-to-signal ratio of the 25-input Adaline is then:

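$$\mathrm{NSR} = g(\sqrt{N}\,\sigma_x \sigma_w) \times \left( \frac{\sigma_{\Delta x}^2}{\sigma_x^2} + \frac{\sigma_{\Delta w}^2}{\sigma_w^2} \right) = g(6.67) \cdot (1.5 \times 10^{-5})$$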

Now, if the number of inputs is increased to 100 (an increase by factor S = 4 ), then:

(I) For a conventional (lumped-neuron) architecture, NSR can be found from Eqn.

(5.10) as,

$\mathrm{NSR}_1 = g(13.3) \cdot (1.5 \times 10^{-5}) = 11.3 \times 10^{-5} = -39.5$ dB.

(II) If, instead, we use a distributed-neuron architecture, then after the input

increase, from Eqn. (5.11) we will have:

$\mathrm{NSR}_2 = g(3.33) \cdot (1.5 \times 10^{-5}) = 3.4 \times 10^{-5} = -44.7$ dB.
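The arithmetic of this case study can be reproduced in a few lines. The sketch below only recomputes quantities stated in the text (variances, quantization step, gain arguments and the decibel values of the quoted NSR figures); it does not evaluate g(.), whose closed form is defined in [68]. Variable names are illustrative.

```python
import math

a, b = -2.0, 2.0      # range of inputs and weights
bits = 8              # weight resolution
N, S = 25, 4          # initial fan-in and input increase factor

var_x = var_w = (b - a) ** 2 / 12      # uniform distribution: 4/3
q = (b - a) / 2 ** bits                # quantization step: 1/64
var_dw = q ** 2 / 12                   # weight error variance, about 2e-5
weight_nsr = var_dw / var_w            # about 1.5e-5

sigma_prod = math.sqrt(var_x * var_w)  # sigma_x * sigma_w
arg_original = math.sqrt(N) * sigma_prod             # 25 inputs
arg_lumped = math.sqrt(N * S) * sigma_prod           # 100 inputs, lumped neuron
arg_distributed = math.sqrt(N * S) * sigma_prod / S  # 100 inputs, weights scaled by 1/S

def to_db(ratio):
    """Convert a power (variance) ratio to decibels."""
    return 10 * math.log10(ratio)

print(f"q = {q}, weight NSR = {weight_nsr:.3e}")
print(f"gain arguments: {arg_original:.2f}, {arg_lumped:.2f}, {arg_distributed:.2f}")
print(f"NSR1 = {to_db(11.3e-5):.1f} dB, NSR2 = {to_db(3.4e-5):.1f} dB")
```

The three gain arguments come out as the 6.67, 13.3 and 3.33 used above, and the two quoted NSR values convert to the stated decibel figures.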

The difference between the Noise-to-Signal Ratios in case (I) and case (II) is:

In this example, NSR is reduced almost by a factor of 3, or 5.2 dB in decibel terms. In

other words, Signal-to-Noise Ratio (SNR) is increased by a similar factor. The

improvement would be even more noticeable for larger input increase factors. Figure 5.6

shows the improvement in SNR for different values of N (initial neuron inputs) and S

(input increase or scaling factor) based on our simulations.
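A rough Monte Carlo experiment can be used to cross-check the trend of this case study. The sketch below is not the simulation used in the thesis: it assumes a tanh activation, a rounding quantizer, 100 inputs, and models the distributed neuron simply by scaling the net input by 1/S. The function name `simulate_nsr`, the trial count and the seed are illustrative assumptions.

```python
import math
import random
import statistics

def simulate_nsr(num_inputs, weight_scale, trials=3000, bits=8, seed=1):
    """Estimate output NSR = var(output error) / var(ideal output) for a
    tanh Adaline whose weights are quantized to `bits` bits over [-2, 2]."""
    rng = random.Random(seed)
    step = 4.0 / 2 ** bits                 # q = 1/64 for 8 bits
    ideal, error = [], []
    for _ in range(trials):
        net = net_q = 0.0
        for _ in range(num_inputs):
            x = rng.uniform(-2, 2)
            w = rng.uniform(-2, 2)
            w_q = round(w / step) * step   # assumed rounding quantizer
            net += w * x
            net_q += w_q * x
        y = math.tanh(weight_scale * net)      # ideal output
        y_q = math.tanh(weight_scale * net_q)  # output with quantized weights
        ideal.append(y)
        error.append(y_q - y)
    return statistics.pvariance(error) / statistics.pvariance(ideal)

S = 4
nsr_lumped = simulate_nsr(100, 1.0)           # fixed neuron, 100 inputs
nsr_distributed = simulate_nsr(100, 1.0 / S)  # self-scaled by 1/S
print(f"lumped:      {10 * math.log10(nsr_lumped):.1f} dB")
print(f"distributed: {10 * math.log10(nsr_distributed):.1f} dB")
```

If the stochastic model holds for these assumptions, the distributed estimate should come out several decibels below the lumped one; exact values depend on the seed and trial count.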

Figure 5.6 Signal-to-Noise Ratio improvement vs. input increase factor

(Vertical axis: $|\mathrm{NSR}_2 - \mathrm{NSR}_1|$ in dB; horizontal axis: S, input increase factor, from 1 to 10.)

5.5 Discussion and Conclusion

Nonlinear circuits based on square-law characteristics of MOS transistors are used to

implement distributed neurons as explained in Chapter 3. Each nonlinear sub-neuron is a

compact circuit fused into a multiplying D/A synapse; the latter generates a current

proportional to the product of an analog input and a digitized weight (see Figure 5.2).

Each sub-neuron presents a nonlinear characteristic which is designed to cover the

dynamic range of one synaptic input. As a rule of thumb, we found it suitable to have

roughly 30% of the input dynamic range of each element in the low saturation region

( $y_n < 0.1\, y_{n\text{-max}}$ ), some 40% of the input dynamic range in the transition region

( $0.1\, y_{n\text{-max}} < y_n < 0.9\, y_{n\text{-max}}$ ), and the remaining 30% of the input dynamic range in the high

saturation end ( $y_n > 0.9\, y_{n\text{-max}}$ ). When a nonlinear sub-neuron is integrated with every

synapse, the above-specified shape will be proportionally preserved for an N-input neuron,

regardless of the number of inputs. The resulting unified synapse-neuron blocks present a

highly modular and scalable solution for the design of VLSI neural networks with

different sizes in different applications and have been successfully used in the

implementation of programmable neural network classifiers as described in Chapter 4.

In conclusion, a stochastic model was presented for the first time for an Adaline with

distributed neuron implementation. The self-scaling property of a distributed neuron was

formulated in this chapter and applied to transform an existing model for a conventional

(lumped-neuron) Adaline to the one presented for the first time for a distributed-neuron

Adaline. Based on the presented stochastic analysis and simulations, the ratio of signal to

quantization noise increases considerably for large number of neuron inputs (or nodes per

layer), when a programmable neural network hardware is based on a distributed- rather

than a lumped-neuron architecture.

A main conclusion in [68], [69] is that increasing the number of nodes per layer in a

(conventional) Madaline increases the required weight accuracy given a maximum

allowable noise-to-signal ratio. In this chapter, it was shown that a distributed-neuron

architecture is advantageous in terms of maintaining a better signal-to-noise ratio as the

number of neuron inputs (or nodes per layer) increases. The larger the network becomes,

the more apparent the SNR advantage is, compared to a conventional Madaline network.

A final note is that a higher SNR in a distributed-neuron architecture can be traded off at a

certain level with a lower bit precision. Depending on network topology, every 5 to 10 dB

difference in SNR (6 dB on average) is equivalent to one bit difference in weight

precision.
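The 6 dB figure can be related to the ideal uniform quantizer, where halving the step q (one extra bit) divides the noise variance $q^2/12$ by four. The short sketch below uses that textbook relation to express an SNR gain as an equivalent number of weight bits; it is an idealization, since the text notes that the actual figure ranges from 5 to 10 dB depending on topology. Function names are illustrative.

```python
import math

def quantization_noise_db_per_bit():
    """Noise reduction per extra bit for an ideal uniform quantizer:
    halving q divides q**2 / 12 by 4."""
    return 10 * math.log10(4)

def equivalent_bits(snr_gain_db, db_per_bit=None):
    """Express an SNR improvement as an equivalent number of weight bits."""
    if db_per_bit is None:
        db_per_bit = quantization_noise_db_per_bit()
    return snr_gain_db / db_per_bit

print(f"{quantization_noise_db_per_bit():.2f} dB per bit")
print(f"5.2 dB gain is worth about {equivalent_bits(5.2):.2f} bit")
```

By this measure, the 5.2 dB improvement of the case study in Section 5.4 is worth a little under one bit of weight precision.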

1. In the neuron circuit, $y_{n\text{-max}}$ is the same as the supply voltage (5 V typical in our target CMOS technology).

Chapter 6 Neural-based Smart Photosensor

6.1 Introduction

In this chapter the design of a neural-network-based smart

photosensor for focal-plane pattern classification is explained.¹

These sensors are designed for on-line pattern classification

applications requiring image capture or non-contact measurements.

We first review previous work on CMOS-compatible

photoreceptors and neural-based smart sensors. The author had a

chance to contribute to some aspects of the earlier designs of these

sensors in the VLSI Research Group, University of Windsor, including

fabrication submission and testing, as well as design transition to

newer CAD environments. Through this experience, a valuable

insight was obtained and the main problems were identified.

Two main issues in the design of our target neural-based

photosensor concern the photosensor array and the neural network

architecture. The two issues are first discussed separately in this

chapter. From this study the type of photosensor elements and the

architecture of the neural network classifier will be determined, and

finally the two will be combined in a novel design.

1. The material in this chapter is mainly based on two publications from this work at ISCAS'96 [23] and ISCAS'98 [16].

Photosensors are based on a modified photoBJT in CMOS technology and act as input

nodes to a neural network classifier. A fully-connected multi-layer feedforward neural

network is chosen as it has shown superior performance over a partially-connected scheme

in classifying noisy patterns. Interconnection areas and problems, however, created a

bottleneck in a conventional fully-connected design relying on lumped neurons and

synapses and a lumped photosensor array. This problem is greatly alleviated in the final

presented design which is based on a 2-D distributed-neuron structure and a distributed

array of 'smart pixels'. The new architecture results in a highly modular and area-efficient

VLSI implementation that has increased synaptic density by a factor of more than two in

the same technology. The proposed smart sensor design also benefits from a robust neural

architecture due to the properties mentioned earlier in this thesis.

6.2 Objectives and Issues

On-line optical classification of objects or geometrical features is a task often encountered

in industrial or manufacturing environments. Solutions to this problem range from a

rather elaborate system including CCD imager and signal processing hardware and/or

software, to a single smart chip integrating photosensors and classifier processor.

Modern sensors tend to become more and more autonomous subsystems by

incorporating some form of signal processing. Among the best technologies for these

so-called "integrated smart sensors" (ISS)¹ is CMOS, which allows a dense co-integration of

various sensors and signal processing circuits on a single chip.

Motivated by a manufacturing process control application, our goal is to design a

programmable smart photosensor for on-line classification of low-resolution patterns.

The sensor is to be used in a manufacturing process control to determine the position or

classify the surface geometry of an object whose image is captured on chip [12], [42].

In actual operation a pattern is 'imaged' onto the photosensitive array using laser beam

steering or structured illumination techniques. The imaged pattern is a 2-D projection of a

1. An integrated smart sensor (ISS) by definition is a co-integration of one or more sensor transducers and signal processing hardware. In a neural-based smart sensor this processing is a neural computation.

3-D geometry. Based on the above noncontact measurement, the output state of the

classifier defines a control vector for the on-line process, which in turn adjusts the position

or process parameters of the object under control. A set of applied patterns representing

the tension on a string and the corresponding classes are shown in Table 6.2 on page 106.

Artificial neural networks are known as good pattern classifiers that are trainable for

different applications and offer high-speed solutions when implemented on non-

multiplexed hardware. A photosensor chip and a programmable Neural Network IC

(NNIC) would be a two-chip candidate solution to our problem. However, the number of

inputs from a 2-D photosensor chip to NNIC (e.g. 8 x 8 = 64 inputs) becomes a prohibitive

factor due to pin limitations and the complexity or delay of interface circuitry. A smart

photosensor chip with integrated focal-plane pattern classifier is an ideal solution here.

To implement a smart sensor, a standard digital CMOS process is chosen since it is a

mature technology that has shown the possibility of creating low-cost customizable image

sensors as well as dense integration of neural processing circuitry. Based on the above

facts, hybrid VLSI architectures are explored for dense co-integration of a multilayer

feedforward (MLFF) neural network classifier and a photosensor array in a standard

CMOS chip. Thus, two issues are to be addressed separately, namely, (i) the realization

of a CMOS-compatible photosensor array, (ii) an efficient architecture for the VLSI

implementation of a fully-connected programmable neural network with a 2-D optical

input array.

6.3 Photosensor Array

6.3.1 CMOS-compatible Photosensitive Device

Photosensitive devices using standard CMOS, e.g. Active Pixel Sensors,¹ are becoming

increasingly popular [79], [39], [54]. CCD technology, despite proven performance in

1. Active Pixel Sensors (APS) are 'less' smart sensors that include sensor amplification and random-access circuitry. A smart sensor usually includes a higher level of signal processing.

imaging applications, requires a special fabrication process and suffers from intrinsic image

smear, reset noise and difficulty of random access to individual pixels. In many intelligent

imaging and sensory applications such as machine vision or neural networks, random or

parallel access to individual cells is a requirement [47], [94]. A CMOS-compatible

photosensitive array makes a good alternative in such applications. This alternative,

moreover, eliminates the requirement of a special fabrication process and allows the

integration of sensor and signal processing circuitry in standard CMOS, thus making

feasible the implementation of low-cost low-power smart sensors.

PhotoMOS devices and photodiodes have little or no gain and their output reading is destructive

[46]. A photoBJT is an attractive choice as a sensor because of its intrinsic gain and the

fact that it can be obtained as a by-product device in CMOS technology. The low bandwidth

of a photoBJT is not a problem in our application, nor in many other

manufacturing environments. The higher noise of a photoBJT compared to a photodiode will be

compensated to some extent in a modified photoBJT that has an improved responsivity, or

in other words a higher 'signal' level. Our chosen technology is a standard N-well CMOS

process.¹ In this technology a vertical BJT is found to be more reliable and to have a higher

gain ($h_{fe}$ = 35, based on our experiments) compared to a lateral BJT ($h_{fe}$ = 1). A parasitic

photosensitive device can be built as a vertical PNP transistor with floating base

configuration. In this case, as shown in Figure 6.1(a), the P+ diffusion area is the Emitter,

the N-well forms the Base, and the P-substrate is the Collector. The N-well forms the area sensitive to

light as photo electron-hole pairs are generated at the junction of N-well to P-substrate (i.e. the

base-collector junction). This process constitutes the base photocurrent that in turn is

amplified by a factor of ($h_{fe}$ + 1) to create the emitter output current. A vertical photoBJT

suffers from a large base-collector capacitance $C_{bc}$, which reduces its optical responsivity

and bandwidth.
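The current gain relation stated above is easy to put into numbers. The following minimal sketch compares the two gain values quoted in the text for a hypothetical base photocurrent of 1 nA; the photocurrent value and the function name are assumptions made only for illustration.

```python
def emitter_current(base_photocurrent, hfe):
    """Floating-base phototransistor: the emitter current equals the
    base photocurrent amplified by (hfe + 1)."""
    return (hfe + 1) * base_photocurrent

i_photo = 1e-9  # hypothetical base photocurrent of 1 nA
i_vertical = emitter_current(i_photo, 35)  # vertical BJT, hfe = 35
i_lateral = emitter_current(i_photo, 1)    # lateral BJT, hfe = 1
print(f"vertical: {i_vertical:.2e} A, lateral: {i_lateral:.2e} A")
print(f"ratio: {i_vertical / i_lateral:.0f}")
```

With these gains the vertical device delivers 18 times the output current of the lateral one for the same illumination, which is one reason it is preferred here.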

A Field-Effect-Modified (FEM) photoBJT structure, as shown in Figure 6.1(b), has been

used to improve the responsivity of the device without any additional fabrication step [47].

1. CMOS4S: a 1.2 µm double-metal double-poly standard CMOS process from Northern Telecom (Nortel).

A description of the original device in a bipolar process can be found in [62], while here it

is implemented in a digital CMOS technology without any additional fabrication mask or

DRC¹ violation. The diffusion region at the center is the emitter of a vertical PNP

transistor. The base, however, is divided by the annular P+ region into two portions,

internal base and external base. These two portions along with the annular P+ region form

an n-channel JFET in which P+ is the gate. The circular gate is formed during the same

diffusion step in which the emitter is formed. When the gate is sufficiently reverse-biased,

e.g. grounded, the channel connecting the two base portions is pinched off. Therefore, the

effective capacitance $C_{bc}$ is mainly reduced to that of the internal region whereas the

primary photocurrent includes both internal and external components. Device

responsivity is improved by increasing the ratio of external to internal base area. In

practice, DRC rules are not to be violated and it should also be noted that contact areas of

emitter and annular P+ diffusion are not transparent to incident light.

6.3.2 CMOS Photoreceptor Cell Circuit

The circuit schematic of the CMOS photoreceptor cell consisting of a FEM photoBJT, a

logarithmic I-to-V converter PMOS, level-shifting and buffering is shown in Figure 6.2.

A photoreceptor cell consists of a photosensitive device to transduce incident light into

electrical current, and a logarithmic element to compress the dynamic range [53].

As described earlier, the photosensitive device is a FEM PNP vertical BJT formed in an

N-well CMOS process. The logarithmic element is a diode-connected PMOS load which

converts a wide range photocurrent to a small range photovoltage while operating in the

sub-threshold region to maintain logarithmic response. The photoreceptor circuit has a cell

layout of 95 x 90 µm² in 1.2 µm CMOS. The total N-well area forming the photobase

region is 50 x 50 µm², a fraction of which (i.e., 9 x 9 µm²) is devoted to the emitter region. The

bias voltage, Vs, acts as a sensitivity or threshold control for the cell to generate a binary

output. Note that the type of output, digital or analog, mainly depends on the type of

buffer used at the cell output.
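The dynamic range compression of a sub-threshold load can be sketched with the usual exponential current law, under which the voltage across a diode-connected device grows with the logarithm of the current. The parameter values below (slope factor, pre-exponential current) are hypothetical placeholders and not measured values from this design.

```python
import math

THERMAL_VOLTAGE = 0.0259  # kT/q near room temperature, in volts
SLOPE_FACTOR = 1.5        # hypothetical sub-threshold slope factor n
I_0 = 1e-15               # hypothetical pre-exponential current, in amperes

def load_voltage(photocurrent):
    """Voltage across a diode-connected sub-threshold load, assuming
    I = I_0 * exp(V / (n * V_T))."""
    return SLOPE_FACTOR * THERMAL_VOLTAGE * math.log(photocurrent / I_0)

for current in (1e-12, 1e-11, 1e-10, 1e-9, 1e-8):
    print(f"{current:.0e} A -> {load_voltage(current) * 1e3:.0f} mV")

span = load_voltage(1e-8) - load_voltage(1e-12)
print(f"four decades of current span {span * 1e3:.0f} mV")
```

Under these assumptions each decade of photocurrent adds a fixed voltage step of n·V_T·ln(10), so several decades of light intensity map onto a few hundred millivolts.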

1. DRC: Design Rule Check, a set of geometrical rules to be adhered to in a process technology.

Figure 6.1 Top view, cross section and device equivalent model of: a) vertical photoBJT, b) Field-Effect Modified (FEM) vertical photoBJT

Figure 6.2 Photosensor cell circuit

In summary, an FEM bipolar transistor is realized in a standard CMOS process as a

photosensitive device. According to the descriptions given in Section 6.3.1, a crucial

modification has been made on the device presented in [47] and [46] in which the annular

P+ region was connected to VDD. Another important improvement compared to [44] and

[47] lies in a layout technique applied to the circuit of the photosensor cell. In this technique

N-well areas anywhere other than in the photosensitive device are covered by metal layers

(M1 or M2) or by polysilicon, so as to minimize unwanted photo electron-hole generation

in the substrate. Satisfactory test results and enhanced responsivity have led to the use of

FEM BJT photoreceptor cells in our neural-based photosensors [23], [16].

6.4 A Review of Conventional Designs

Previous work on the design of CMOS photoreceptor arrays and neural-network-based

photosensors can be found in [12], [42], [44], [45]. The author has reviewed the

evolutionary steps of these designs [23] and has been involved in aspects of earlier works

in our group including design improvements, transitions to newer CAD environments,¹

implementation submissions and the testing of the fabricated chips. In this section, two

1. Technology transfers and design transitions from Cadence EDGE to OPUS and later to Cadence 9W97A.

conventional designs are chosen for review that highlight major steps in the evolutionary

design of this NNIC family. A novel design presented in this thesis will be explained in

Section 6.5.

6.4.1 Partially-connected Pre-programmed Neural-based Sensor

This design contains a 10 x 10 photoreceptor array and a neural network classifier with

100 input, 16 hidden and 5 output nodes [42]. To avoid massive synaptic interconnections

and to reduce the routing problems and areas, a 'partially-connected' network has been

implemented, i.e. the input array is divided into four sub-arrays each connected to four

out of 16 hidden neurons. The sixteen neurons in the hidden layer are fully connected to five

output neurons.
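The saving in interconnections can be estimated by counting synapses. The sketch below assumes that the four sub-arrays are disjoint and of equal size (25 photoreceptors each) and ignores threshold synapses; both are assumptions, since the text does not give these details. Function names are illustrative.

```python
def fully_connected_synapses(layer_sizes):
    """Synapse count of a fully-connected feedforward network."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

def partially_connected_first_layer(num_inputs, num_subarrays, hidden_per_subarray):
    """First-layer synapse count when the inputs are split into equal,
    disjoint sub-arrays, each feeding its own group of hidden neurons."""
    inputs_per_subarray = num_inputs // num_subarrays
    return num_subarrays * inputs_per_subarray * hidden_per_subarray

hidden_to_output = 16 * 5
partial = partially_connected_first_layer(100, 4, 4) + hidden_to_output
full = fully_connected_synapses([100, 16, 5])
print(f"partially connected: {partial} synapses")
print(f"fully connected:     {full} synapses")
```

Under these assumptions the partially-connected scheme needs 480 synapses against 1680 for a fully-connected 100-16-5 network, which indicates how much routing the partial scheme avoids.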

The network is trained off-line to recognize eight input patterns for a process control

classification task. Synaptic weights are implemented (pre-programmed) on transistor

widths. A fabricated CMOS chip successfully recalled eight trained patterns, when they

were projected onto the photosensitive array using a microscope in conjunction with other

lenses. When noisy patterns were introduced, the network was able to recognize about

94% of patterns with Hamming distance (error) of one in simulated recall and 80% in test.

Increasing the number of error bits in input noisy patterns, however, quickly deteriorated

the percentage of correct recall. On the other hand, a simulation study shows that a

'fully-connected', rather than a partially-connected, neural classifier would correctly recall more

than 90% of noisy patterns with up to 3 error bits in this application. The cost associated

with a more complicated design with increased die area in a fully-connected NNIC would

be paid off by a superior performance under noisy input conditions. Moreover, it is well

known that fully-connected neural networks are more fault tolerant.

6.4.2 Fully-connected Programmable Neural-based Sensor

Based on the above discussion, the VLSI realization of a neural classifier with 'fully-

connected' synaptic scheme should be our target for a robust and fault-tolerant

focal-plane pattern classifier. Moreover, programmability is an attractive feature that

makes the design flexible and compatible with different pattern sets in different

applications.

A fully-connected programmable neural-based photosensor chip can be found in [44] and

[43]. This 1.2 µm implementation displays a classical co-integration of a CMOS

photosensor array and a programmable neural network classifier. It contains a 5 x 5

lumped photosensor array integrated with a fully-connected multilayer feedforward neural

network with 25 input, 4 hidden and 3 output nodes, and a conventional synapse and neuron

realization. On-chip digital weight memory is included for the programming and storage

of synaptic weights. The dimensions of the photoreceptor array are practically limited to

5 x 5 because of: (i) interconnection problems and areas arising from the growing

complexity of synapses in a fully-connected NNIC (note that the number of synaptic

interconnections roughly increases with the number of neurons squared); (ii) a multitude

of circuit blocks, especially on-chip digital weight memory that occupies a considerable

die area.

Despite a full-custom layout, about 60% of the core area on this conventional neural

network based photosensor chip was occupied by metal interconnections. In practice, this

situation created a bottleneck that limited the dimensions of the sensor array as well as the

size of the neural network classifier on chip.

6.5 Distributed Neural-based Sensor Architecture

In this section the design of a novel neural-based photosensor chip developed in this thesis

is described. The term 'distributed' in the section title refers to the fact that the presented

design relies both on a distributed-neuron architecture and on a distributed array of

smart photosensors or pixels. Figure 6.3 shows the structure of the target neural-based

smart sensor for focal-plane pattern classification.

Figure 6.3 Neural-based photosensor for focal-plane pattern classification

6.5.1 2-D Distributed-neuron Architecture

Combined VLSI implementation of the two main building blocks of a neural network in

the form of a unified synapse-neuron (USN) offers many advantages. As described in

previous chapters, such a realization is based on a distributed resistive neuron architecture

and brings modularity and robust neuron characteristics. Figure 6.4 illustrates a

combined realization of one neuron and N digitally-programmable synapses by using

parallel output connection of N = n² universal building blocks on an n x n array as used

in the new design of the smart photosensor. Each universal block consists of a sign-magnitude

synaptic weight register, a Multiplying DAC synapse with bi-directional output current,

and an active nonlinear load as a sub-neuron. This is a two-dimensionally-distributed

version of the architecture presented in previous chapters.

Figure 6.4 Hybrid distributed-neuron architecture on a 2-D array

6.5.2 Distributed Array of Smart Pixels

The new version of focal-plane pattern classifier chip uses a regular architecture of neural-

based smart pixels with distributed neurons to overcome some of the problems faced in a

conventional implementation. A modular and distributed architecture is developed at two

levels of hierarchy. Elements of the photosensor array are distributed across the core area.

Each individual sensor element is closely integrated with all of its outgoing synapses,

synaptic weight storage and associated parts of distributed neurons in the hidden layer.

With this design approach (explained in more detail below), interconnection problems and

areas are greatly reduced and a larger photosensor-classifier with increased synaptic

density has been fabricated on the available die area. Moreover, uniform and robust

characteristics are achieved for the fabricated neurons regardless of the design die size.

A neural-based smart sensor with N = n² optical input nodes, m hidden and k output

nodes is implemented in a highly modular and scalable scheme described below:

An n x n array of smart pixel modules is uniformly distributed across the silicon die area.

As shown in Figure 6.5, each smart pixel is comprised of the following elements:

Figure 6.5 Neural-based smart pixel:

a) a schematic diagram, b) die microphotograph of two adjacent pixels

a) a photoreceptor cell as described earlier and depicted in Figure 6.1(b);

b) m programmable weight registers for the storage of synaptic weights;

c) m unified synapse-neuron (USN) blocks that contain all (m) synapses from this optical input node to the hidden layer, with a sub-neuron (1/N of a distributed neuron in the hidden layer) at the output of each synapse;

d) local clock drivers to reset and write in weight registers;

e) dc bias circuits.

Smart pixel modules are placed in such a manner that their photosensitive devices are

evenly spaced on a two-dimensional grid, a crucial requirement for a photosensor array.

N-wells are covered with a metal or polysilicon layer everywhere except in the FEM photoBJT,

so as to minimize unwanted photo electron-hole generation. Finally, a regular m x k array

of USN and weight register blocks forms the synapses to and the neurons in the output

layer. This array includes k synapses and synaptic weights from each of m hidden neurons

to output neurons. It also includes k output neurons each distributed among m synapses.

Additional building blocks of similar type set the threshold level of neurons in hidden and

output layers, i.e. m programmable blocks for hidden neurons and k for output neurons.

The threshold value is stored as a signed number in the (earlier named) weight register

associated with each USN block. The input to a threshold block is a fixed non-zero

voltage, in this case Vm.

A chip containing an 8 x 8 photosensitive array and a fully-connected programmable

neural network with N=64 inputs, m= 8 hidden neurons and k=4 output neurons has been

fabricated in 1.2µm CMOS. A floorplan of this design is shown in Figure 6.6. Total chip

area is 14.7mm². About 90% of the chip core area is devoted to an 8 x 8 array of smart

pixels containing a uniform 2-D array of photoreceptors and other USN and weight

storage circuitry in input and hidden layers. The remainder of core area (bottom part of

the floorplan in Figure 6.6) belongs to USN and storage units of output layer and

threshold units for both hidden and output neurons. Figure 6.7 shows the

microphotograph of the fabricated chip.

6.5.3 Characteristics of the Neural-based Sensor Chip

Some design data and experimental characteristics of the neural-based photosensor chip

are as follows.

• Optical input pattern: 8 x 8 binary

• Output class: A vector of 4 analog signals

• Chip core area: 11.2mm²

• Transistor count: 60,000

• Current from 5-V supply: 6mA < I_VDD < 27.5mA (weight and pattern dependent)

• On-chip weight storage: 556 x 5 bits

• Programming clock (2-phase non-overlap): 0.5 - 20 MHz (3.5MHz typical)

• Weight programming cycle: 160µs (based on 3.5MHz clock)

• Throughput time (input to recalled output): ~3.5µs

• Connections Per Second (CPS): ~160 Mega CPS
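The programming cycle and the CPS figure in the list above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes that one 5-bit weight word is shifted into the storage chain per clock cycle and that CPS is the number of programmable units divided by the throughput time. Neither assumption is stated explicitly in the text, and the function names are hypothetical.

```python
# Figures quoted in Section 6.5.3.
PROGRAMMABLE_UNITS = 556   # 544 synapses + 12 threshold units
CLOCK_HZ = 3.5e6           # typical programming clock
THROUGHPUT_S = 3.5e-6      # input to recalled output


def programming_cycle_s(units=PROGRAMMABLE_UNITS, clock_hz=CLOCK_HZ):
    """Time to load all weights, ASSUMING one 5-bit word per clock cycle."""
    return units / clock_hz


def connections_per_second(units=PROGRAMMABLE_UNITS, throughput_s=THROUGHPUT_S):
    """CPS, ASSUMING every programmable unit counts as one connection per recall."""
    return units / throughput_s


print(f"programming cycle: {programming_cycle_s() * 1e6:.1f} us")
print(f"connections/s:     {connections_per_second() / 1e6:.1f} Mega CPS")
```

Under these assumptions both results land within about 1% of the rounded values in the list (160µs and 160 Mega CPS).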

A practical issue observed in the previous neural-based sensor implementations was

related to the optical input array, which was found very sensitive to misalignments and

vibrations of the test fixture [41]. Moreover, optical cross-talk among neighboring pixels was

reported to be a problem [46]. The presented design has significantly reduced the two

mentioned testing problems by distributing the pixels across the die area, which leaves

distance among photosensitive elements in the array. This creates some leeway for the

optical pattern projected onto the chip. The optical setup is similar to [41], which includes a light

source, lenses and a Wentworth probing station with a microscope through which input

patterns are focused onto the optical array.

6.5.4 Synaptic Density and Interconnections

In the distributed sensor design: a) a major part of the neural network interconnections is

made locally inside smart pixel modules on short metal or polysilicon paths; b) additional

global routing is on a highly-regular structure in vertical and horizontal channels between

the modules; c) wasted inter-block silicon area is very little because there is only one type

of building block at each level of hierarchy (i.e. 'smart pixel' at the top level and USN at

the lower level). In total, 556 programmable units are on chip (544 synapses and 12

threshold units) that determine the input-output mapping performed by the neural classifier.

After an off-line training session, the sensor is programmed by two-phase non-overlapping

clocks that ripple a sequence of 5-bit synaptic weights through the storage units.

By using the design approach explained in Section 6.5.2 and by cell-level optimization,

the number of synapses per unit die area is considerably increased compared to a

conventional version of programmable neural-based sensor [41]. Time and effort

associated with custom interconnection has been greatly reduced as well. Table 6.1

shows a comparison between the conventional and the distributed-architecture design.

In order to establish a consistent base of comparison, a) both designs were fabricated in

the same process technology; b) synaptic density Ds is defined as the average number of

synapses per unit core area and comparison is made based on normalized Ds values; c) there

has been an attempt to determine any contributions from custom layout and cell

optimization in order to highlight the net 'architectural' improvement.

On the basis of our experimental study summarized in Table 6.1, synaptic density is

increased by a factor of 2.7 in a neural-based photosensor with the proposed distributed

architecture. A maximum of about 60-70% improvement is associated with cell and

layout optimization. Therefore, at least 100% increase in synaptic density comes from

architectural improvement due to a modular and distributed structure at two levels of

hierarchy. In Table 6.1, the total number of synapses (including neuron threshold units)

for an n²-m-k fully-connected neural network is:

\[ \text{Total No. of Synapses} = (n^2 \times m) + (m \times k) + m + k \]

Moreover, synaptic density Ds is defined as:

\[ D_s = \frac{(n^2 \times m) + (m \times k) + m + k}{\text{Core Area}} \]
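As a quick check, the two definitions can be evaluated for the network sizes quoted in this chapter. This is a minimal sketch with hypothetical function names. The 11.2mm² core area is the figure listed in Section 6.5.3 for the distributed chip. No density is computed for the conventional 25-4-3 design because its core area is not used here.

```python
def total_synapses(n, m, k):
    """Synapses plus threshold units of an n^2-m-k fully-connected network."""
    return (n * n * m) + (m * k) + m + k


def synaptic_density(n, m, k, core_area_mm2):
    """Average number of synapses per mm^2 of core area (Ds)."""
    return total_synapses(n, m, k) / core_area_mm2


# Distributed design: 8 x 8 optical array, 64-8-4 network.
print(total_synapses(8, 8, 4))                      # 556
# Conventional design: 5 x 5 optical array, 25-4-3 network.
print(total_synapses(5, 4, 3))                      # 119
# Density of the distributed chip, using the 11.2 mm^2 core area.
print(round(synaptic_density(8, 8, 4, 11.2), 1))    # 49.6
```

The first result reproduces the 556 programmable units (544 synapses and 12 threshold units) mentioned earlier in this section.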

Figure 6.6 Floorplan of the neural-based photosensor chip

[Floorplan labels: 8 x 8 array of smart pixels; 8 x 4 array for hidden-to-output layer; threshold of 4 outputs; threshold of 8 hidden neurons; clock driver and bias]

Figure 6.7 Microphotograph of the neural-based photosensor chip

The ratio of interconnection (routing) area to the chip core area is only 12% which marks

a significant reduction from 60.5% in the conventional design [41]. The achieved

interconnection area percentage is even better (lower) than typical CNN-based optical

array processors, noting that cellular neural networks (CNNs) are known for their 'local'

connectedness that reduces routing problems and areas. For instance, a recent 2-D

programmable mixed-signal focal-plane array processor based on the CNN paradigm,

uses 35% of the total chip area for routing [27].

Table 6.1. Comparison between a conventional and a distributed neural-based photosensor design

                            | Conventional [41]       | Distributed
Optical Array               | 5 x 5                   | 8 x 8
Neural Network              | 25-4-3                  | 64-8-4
Total No. of Synapses (a)   | 119                     | 556
Die Area                    | 3.45 x 3.05 = 10.5 mm²  | 4.0 x 4.03 = 16.1 mm²
Core Area                   | [illegible]             | 11.2 mm²
Active Cell Area %          | 29.6%                   | 85%
Interconnection Area %      | 60.5%                   | 12%
Unused Silicon Area         | ~10%                    | ~3%
Normalized Ds               | 1                       | 2.7

a. This number also includes neurons' threshold (bias) units that are very similar to synaptic units.

6.5.5 Robustness of Neurons

A robust neuron characteristic is achieved for two reasons:

1) Averaging effect: Lumped analog neurons implemented across a sizable die are subject

to major characteristic variations. In the distributed sensor chip, each neuron in the hidden

layer consists of n² = 64 distributed elements located on a 2-D array, as each element is

contained in a smart pixel. A hidden neuron, therefore, takes an average of various

characteristics over the die surface. This effect makes all hidden neurons virtually

uniform. For output neurons the averaging effect takes place over a 1-D array.

A measurement study showed that 'lumped' neurons 2500µm apart on silicon had a

maximum of 1.6% variation, while 'distributed' neurons contained in the same area

exhibited under 0.5% variation. The results were worst case among five fabrications

(more details can be found in Chapter 3). This property is especially important in a sizable

chip such as the present sensor which can be subject to large on-chip variations.

2) Fault tolerance: Due to the fact that a neuron circuit is distributed among many sub-

blocks, there is a potential fault tolerance for neurons. A VLSI defect, e.g. an open circuit,

may only affect a fraction of a neuron instead of the whole.
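The averaging effect in item 1) can be illustrated with a small Monte Carlo experiment. The sketch below is not the measurement procedure of Chapter 3. It assumes that each sub-neuron has an independent, zero-mean Gaussian deviation and that a distributed neuron simply averages the deviations of its elements. The 1.6% spread and the function name are illustrative choices only.

```python
import numpy as np


def neuron_spread(n_elements, sigma=0.016, n_neurons=20000, seed=0):
    """Standard deviation of a neuron characteristic built from n_elements parts.

    ASSUMPTION: element deviations are independent Gaussians with spread sigma,
    and the neuron characteristic is the mean over its elements.
    """
    rng = np.random.default_rng(seed)
    deviations = rng.normal(0.0, sigma, size=(n_neurons, n_elements))
    return deviations.mean(axis=1).std()


lumped = neuron_spread(1)        # one lumped neuron per site
distributed = neuron_spread(64)  # neuron spread over an 8 x 8 array

print(f"lumped spread:      {lumped:.4%}")
print(f"distributed spread: {distributed:.4%}")
print(f"improvement factor: {lumped / distributed:.1f}")
```

Under this independence assumption the spread shrinks roughly with the square root of the number of elements. Real process variations may also contain spatially correlated components that average out differently, so the measured figures quoted above need not follow this idealized ratio.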

6.5.6 BiCMOS vs. CMOS Implementation

As part of a sensor optimization study, the 8 x 8 distributed neural-based sensor with a

64-8-4 neural network has been implemented with similar circuit blocks in a submicron

BiCMOS technology [51]. The chosen BiCMOS process potentially offers three

implementation advantages in our application: a) a smaller feature size (0.8µm vs. 1.2µm

CMOS); b) three metal layers vs. two layers in 1.2µm CMOS; c) true bipolar transistors

rather than parasitic BJTs of CMOS. The first two properties resulted in a denser sensor

realization in a smaller die area (10.6mm² instead of 16.1mm² in CMOS), while the third

property was used to implement photoreceptor cells based on true bipolar Darlington

transistors.

On the other hand, BiCMOS is a more expensive process by nature, especially because of

the extra masks required for bipolar devices. The fabrication cost per area in 0.8µm

BiCMOS was three times that in 1.2µm CMOS.¹ In practice, the fabrication cost increase

was partly offset by the reduction in sensor die area, and the BiCMOS sensor turned out to be

twice as expensive as its CMOS counterpart. Submicron geometries and the

multiplicity of metal interconnection layers are, nonetheless, attractive features for dense

NNIC implementations that should be sought in advanced CMOS technologies.

1. Evaluation is based on fabrication cost of $800/mm² for 0.8µm BiCMOS and $264/mm² for 1.2µm CMOS in 1997 (prices in Canadian Dollars).
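The cost comparison can be reproduced from the per-area prices in the footnote and the two die areas given in this section. The sketch below is plain arithmetic on those quoted numbers. The dictionary keys and the function name are hypothetical.

```python
# 1997 fabrication prices (Canadian dollars per mm^2) and die areas (mm^2)
# as quoted in Section 6.5.6.
COST_PER_MM2 = {"BiCMOS 0.8um": 800.0, "CMOS 1.2um": 264.0}
DIE_AREA_MM2 = {"BiCMOS 0.8um": 10.6, "CMOS 1.2um": 16.1}


def die_cost(process):
    """Fabrication cost of one sensor die in the given process."""
    return COST_PER_MM2[process] * DIE_AREA_MM2[process]


per_area_ratio = COST_PER_MM2["BiCMOS 0.8um"] / COST_PER_MM2["CMOS 1.2um"]
per_die_ratio = die_cost("BiCMOS 0.8um") / die_cost("CMOS 1.2um")

print(f"cost per area ratio: {per_area_ratio:.2f}")
print(f"cost per die ratio:  {per_die_ratio:.2f}")
```

The per-area ratio comes out at about 3 and the per-die ratio at about 2, which matches the statement that the smaller BiCMOS die only partly compensates for the higher price per unit area.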

Submicron CMOS technologies are in fact considerably less expensive than a similar

feature size BiCMOS. With the diminishing trend in BiCMOS technology, an advanced

submicron CMOS process will be the natural choice for future implementation of a

neural-based photosensor.

A design study in 0.35µm or 0.25µm CMOS is proposed for future work. With an

optimum NNIC design and custom layout, neural-based photosensor chips with input

arrays as large as 12 x 12 to 16 x 16 are seen to be feasible in these technologies.

A training and recall simulator with multi-window and multi-tasking graphical user

interface (GUI) under XView has been developed especially for the USN architecture.¹ Figure 6.8

shows some of the windows available in this simulator. From the main window (Figure

6.8(a)) the user can choose to define the network structure (Figure 6.8(c)), graphically

define the desired input-output patterns (Figure 6.8(d)), and based on his/her defined

parameters run a training session (Figure 6.8(e)) and finally a simulated recall (Figure

6.8(f)). A modified back-propagation (BP) algorithm is used in which the property of a

distributed-neuron architecture is embedded, i.e. the saturating function of each neuron is

scaled, both in training and in recall phase, proportional to the number of its inputs. The

resulting weight set is rounded off to 5 bits (one sign and 4 magnitude) to match the

resolution of the hardware. A simulated recall has been included to ensure the

functionality of the network with quantized weights.

1. A simulator for standard BP was developed in our group [43]. Both the graphical interface and the incorporated algorithms are modified here for the special hardware architecture described in Section 6.5.
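The two hardware-specific steps of the simulator, scaling the saturating function with the number of inputs and rounding weights to one sign bit plus four magnitude bits, can be sketched as follows. This is not the simulator's code. The `tanh` function is only a stand-in for the actual sub-neuron characteristic, `w_max` is a hypothetical full-scale weight value, and dividing the net input by the number of inputs is one plausible reading of "scaled proportional to the number of its inputs".

```python
import numpy as np

MAGNITUDE_BITS = 4                  # one sign bit + 4 magnitude bits = 5 bits
LEVELS = 2 ** MAGNITUDE_BITS - 1    # largest magnitude code (15)


def quantize_weights(weights, w_max):
    """Round real weights to signed integer codes in [-15, 15].

    w_max (hypothetical) is the weight value mapped to the full-scale code.
    """
    w = np.clip(np.asarray(weights, dtype=float), -w_max, w_max)
    magnitude = np.rint(np.abs(w) / w_max * LEVELS).astype(int)
    sign = np.where(w < 0, -1, 1)
    return sign * magnitude


def dequantize_weights(codes, w_max):
    """Weight values actually realized by the sign-magnitude codes."""
    return np.asarray(codes, dtype=float) * w_max / LEVELS


def scaled_neuron(net_input, n_inputs):
    """ASSUMED model: a tanh whose input range stretches with n_inputs."""
    return np.tanh(np.asarray(net_input, dtype=float) / n_inputs)


trained = np.array([0.0, 1.0, -1.0, 2.0, -0.2])
codes = quantize_weights(trained, w_max=1.0)
print(codes)                                  # sign-magnitude codes
print(dequantize_weights(codes, w_max=1.0))   # weights seen by the recall phase
```

Running a recall with the dequantized weights, as the simulated recall step does, shows whether the trained mapping survives the rounding. In this sketch the rounding error per weight is at most half a quantization step, i.e. `w_max / 30`.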

Figure 6.8 Various popup windows in the Training/Recall simulator:

a) main window, b) about features, c) define structure (graphic window not shown),

d) define input/output patterns, e) define and run training, f) simulated recall

Table 6.2 shows a set of 9 input/output patterns used for the training of a neural-based

sensor with 8 x 8 optical inputs. The final outcome of the training/recall simulator is a set of

weights to be programmed on the sensor chip. Weight programming is performed by a two-

phase non-overlapping clock. In practice, clock lines (Φ1, Φ2 and [illegible]) and the sequence

of weight vectors were generated by an HP 75000-D20 VXI-bus system, a software-

controlled digital test system also known as the D20 Tester. An alternative test setup included

an HP8180 Data Generator and an HP8182 Data Analyzer.¹

1. A CAD demonstration on this design was presented at TEXPO'96 [26].

Table 6.2. Training pattern set for the 8 x 8 smart photosensor

[Table 6.2: nine 8 x 8 binary training patterns (Pattern 1 to Pattern 9); legend: Light ON / Light OFF]

6.7 Conclusion

The design of a neural-based smart photosensor with focal-plane pattern classification for

an on-line process control is described. Photosensors are based on a Field-Effect Modified

parasitic photoBJT in a CMOS technology. Elements of the photosensor array are distributed

over the die surface and a neural-based smart pixel is formed around each sensor cell.

The on-sensor neural classifier is based on a programmable hybrid architecture with unified

synapse-neurons that rely on distributed neurons. The proposed architecture has greatly

reduced interconnection areas and increased the synaptic density. Thus, the size of the

optical input array and the neural network classifier integrated in the available die area has

been increased. In addition, uniform and robust neuron characteristics are realized despite

fabrication process variations over the surface of a sizable die. Judged by the great

modularity and uniformity of its fabricated elements, the proposed architecture is a good

candidate for Wafer Scale Integration (WSI) of neural networks and neural-network-based

smart sensors.

Chapter 7 Conclusions

7.1 Summary

In this thesis, after studying various methods for the implementation

of neural networks, it was decided that a hybrid analog-digital

approach for a fully-parallel VLSI implementation should be

explored. A robust hybrid architecture was developed based on

unified synapse-neurons (USN) that implements a fully-connected

multilayer neural network with regular arrays of a universal

building block. The universal block was a digitally programmable

USN comprised of an MDAC, a sigmoidal sub-neuron and a

built-in weight register. Circuit design, implementation and

characterization were performed in a standard CMOS process both

for 5-V and 3.3-V supply voltages.

The salient features of the proposed VLSI architecture are: high

modularity, a fully-parallel single-chip implementation, silicon area

efficiency due to reduced interconnection and inter-block areas,

self-scaling property of sigmoidal neurons, quantization noise

improvement, uniform neuron functions due to an averaging effect,

an increased fault tolerance, automatic fan-out increase of USN

blocks, and digital programmability. A special optoelectronic

version of the architecture relying on a 2-D distributed array of

neural-based smart pixels was presented for the implementation of a photosensor with

focal-plane pattern classifier. Photosensitive elements were based on a Field-Effect

Modified vertical photoBJT in a standard CMOS technology.

Four chips were fabricated and tested during the course of this project: a chip containing

USN blocks and neural network test circuits (Chapter 3), an optical/electronic template

matching network (Chapter 4), a 16-4-3 general purpose vector classifier NNIC (Chapter

4) and a programmable smart photosensor integrating an 8 x 8 photosensor array and a

64-8-4 neural network classifier (Chapter 6).

7.2 Contributions

A robust smart non-contact optical sensor based on a VLSI implementation of neural

network with an integrated photosensitive array and programmable digital weights has

been designed, realized and programmed. Optical patterns on an 8 x 8 array are mapped to

form process control vectors based on four analog neuron outputs by training the network.

The sensor was designed to detect low resolution fringe patterns resulting from

illuminating an object with coherent light. In this manner a small number of pixels can be

used to generate spatial precision control information based on diffraction patterns for use

in a flexible manufacturing cell.

In line with the achievement of the above objective, several contributions made in this

thesis ranging from the architecture to novel circuits, to the properties explored

theoretically and experimentally, can be summarized as follows:

A hybrid distributed-neuron architecture was presented and proved with fully

functional ICs. A new hybrid alternative to a fully-analog approach in [Ml, the

presented architecture features new properties and circuit realizations. The architecture

was described in Chapter 2 and in [21].

Quantization noise improvement is an emerging property exclusive to a 'hybrid'

distributed-neuron implementation consisting of digital weights and analog neurons.

The reduction in output noise (weight quantization error) to signal ratio is a

consequence of self-scaling property. The first stochastic model for a distributed

neuron was presented in this thesis (see Chapter 5 and [20]). The stochastic model is an

extension of a model by Piché [68], [69] for a conventional (lumped) neuron. The first

conclusion of Piché [69] is that increasing the number of nodes per layer in a

(conventional) Madaline increases the required weight accuracy assuming a minimum

acceptable signal-to-noise ratio (SNR). In this thesis, it was shown that a distributed-

neuron architecture is advantageous in terms of maintaining a better signal-to-noise

ratio as the number of nodes per layer (or neuron inputs) increases. The larger the

network becomes, the more apparent the SNR advantage is, compared to a conventional

Madaline network.

An interesting self-scaling property of a distributed neuron was described intuitively

and demonstrated analytically and experimentally (see Chapter 3 and Chapter 5).

Besides contributing to improving quantization effects, this property circumvents a

neuron re-design in the implementation of networks with various sizes.

The averaging property of distributed neurons against process variations, especially an

infamous threshold voltage mismatch in MOS transistors, was analyzed and

demonstrated with fabrication measurements (see Chapter 3). This property that

creates virtually uniform neuron characteristics is an important contribution in

addressing the problem of analog circuit variations, especially in large networks.

Simultaneously improving quantization noise effects and averaging out analog

variations, the presented architecture proves to be capable of reducing two main types

of implementation errors in digital and in analog domains, respectively [19].

The presented neural network architecture results in a fully-parallel fully-connected

single-chip implementation, as opposed to multiplex architectures [63],[96], partially-

connected schemes [42], or chip-set solutions [48].

A novel and compact resistive-type neuron circuit based on quadratic operation of

NMOS and PMOS transistors was presented (see Chapter 3 and [15]). Even for a

lumped implementation, circuit analyses, simulations and measurements all indicated

interestingly low characteristic variations for the neuron circuit compared to those in

amplifier-type neurons (e.g. [71], [35]). The maximum variation in 10 chips was 2.2%

while the worst-case variation within one chip was 1.3%. Moreover, a distributed

implementation of the proposed neuron circuit revealed a maximum measured variation

of only 0.5%.

A programmable nonlinearly-loaded Multiplying DAC was presented as a new

universal circuit block for the implementation of NNICs (see Chapter 4, [24] and [18]).

As for the MDAC, all three sub-blocks were modified compared to [57], and hence the

experimental characteristics were improved. Circuit techniques especially increased

the dynamic range of synaptic function and made possible an operation on a 3.3V as well

as a 5V supply. The low-voltage operation on 3.3V reduced the power consumption by

86%, compared to standard 5V operation.

A novel design for a CMOS-compatible smart photosensor with focal-plane pattern

classification was presented in Chapter 6 and in [16]. A programmable neural-based

smart pixel with distributed neurons was the building block of the sensor array.

Incremental improvements were made as explained in Section 6.3.2 and Section 6.5.3

on a Field-Effect Modified photoBJT and on a photosensor array built with this device

in a standard CMOS technology.

An important improvement demonstrated in the context of the neural-based

photosensor was the great reduction in interconnection areas and a corresponding

increase in 'synaptic density'. In practice, the proposed architecture slashed the area

for routing from 60% to 12%, and increased the synaptic density by a factor of 2.7

compared to a customized conventional implementation in the same technology (see

Section 6.5.4). As a result, a larger photosensor array and a larger neural network

classifier were implemented on a restricted silicon die area.

7.3 Suggested Future Research

Fault tolerance is an issue that can be further investigated. An improved fault tolerance

in a distributed-neuron architecture compared to a conventional neural network

implementation was discussed and intuitively understood in Chapter 3. Moreover, in

Chapter 1 it was argued that a time-multiplexed neural network architecture (e.g. [63],

[96]) had a lower degree of hardware redundancy and fault tolerance. Future research

can further explore the three architectures, namely a time-multiplexed, a conventional

and a distributed-neuron architecture, in order to provide measures of their reliability in

various faulty conditions. Different types of faulty conditions, e.g. VLSI defects (open

/short), or burst noise at different quantities can be introduced to trained neural network

classifiers in order to compare their recall performances.

Quantization noise improvement in a hybrid distributed-neuron architecture implies the

possibility of a reduction in weight precision of recall hardware. As the number of

neuron inputs (or nodes per layer) increases, there is a relative gain in signal-to-noise

ratio (SNR) of a distributed-neuron network compared to a conventional one (see

Section 5.4 and Section 5.5). When a minimum SNR level is set as the criterion, then at

some point the relative gain can be traded off with a lower number of bits in weight

quantization. Estimates show that depending on network topology each 5-10 dB

difference in SNR is equivalent to 1 bit difference in weight precision. For instance,

conditions could be found under which a hybrid distributed implementation with 4-bit

digitized weights performs just as satisfactorily as a conventional implementation with

5-bit weights. If the number of bits in our 5-bit programmable universal building block

can be reduced to 4, the MDAC circuit will be nearly halved in area and the size of the

weight register will be made smaller by 20%. This situation will result in a denser

synaptic implementation with a lower power consumption.
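The area figures in this item can be made plausible with a rough count. The sketch below assumes that MDAC area is dominated by binary-weighted unit current sources, so that a magnitude of b bits needs 2^b - 1 units, and that the register area is proportional to its bit count. These are modelling assumptions for illustration, not layout data from the thesis, and the function names are hypothetical.

```python
def mdac_unit_sources(magnitude_bits):
    """Unit current sources in an ASSUMED binary-weighted MDAC."""
    return 2 ** magnitude_bits - 1


def area_ratios(bits_before=5, bits_after=4):
    """Relative MDAC and register area after dropping weight bits.

    One bit of each word is the sign, the rest is magnitude.
    """
    mdac = mdac_unit_sources(bits_after - 1) / mdac_unit_sources(bits_before - 1)
    register = bits_after / bits_before
    return mdac, register


def bits_tradeable(snr_gain_db, db_per_bit):
    """Whole weight bits that a given SNR gain could replace."""
    return int(snr_gain_db // db_per_bit)


mdac_ratio, register_ratio = area_ratios()
print(f"MDAC area ratio:     {mdac_ratio:.2f}")      # 7 units instead of 15
print(f"register area ratio: {register_ratio:.2f}")  # 4 bits instead of 5
print(bits_tradeable(10, 5), bits_tradeable(10, 10))
```

With these assumptions the MDAC shrinks to 7/15 of its size and the register to 4/5, consistent with "nearly halved" and "smaller by 20%". The last line shows how the quoted 5-10 dB per bit range translates an SNR gain into one or two bits, depending on which end of the range applies.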

The most demanding implementation in this thesis was that of the neural-based

photosensor chip. In order to maintain a consistent base of comparison with an earlier

conventional implementation of this sensor and to highlight the architectural (not

technological) improvements only, it was decided to make the new distributed

implementation in the same (1.2µm CMOS) technology as for the conventional one

before. Later a BiCMOS version of the sensor was implemented based on the same

distributed architecture and circuits to demonstrate the technology-related

improvements [51]. The 0.8µm BiCMOS implementation was denser but turned out to

be twice as expensive (see a discussion in Section 6.5.6).

As a result, an implementation of the presented sensor architecture in an advanced

submicron CMOS process is suggested for future work. Submicron feature sizes and

the multiplicity of metal interconnect layers are attractive properties of advanced

CMOS processes for dense neural network implementations. Moreover, a submicron

CMOS process is considerably less expensive than a similar feature size BiCMOS

process.

Currently, a triple-metal 0.35µm CMOS process is readily accessible from TSMC¹ and

the availability of a 5-metal 0.25µm CMOS process from the same foundry is

imminent. Initial studies indicate that (i) photosensor elements based on vertical BJTs

and its modifications are realizable in an N-well 0.35µm CMOS process², and (ii) with a

full-custom layout, a programmable neural-based photosensor chip with an integrated

photosensor array as large as 12 x 12 in 0.35µm and 16 x 16 in 0.25µm, along with a

corresponding size multilayer neural network classifier should be feasible.

A neural-based photosensor implementation in a 0.25µm CMOS process is highly

recommended, as the availability of five metal layers for interconnections in this

technology would be an asset for a fully-connected neural network integrated circuit.

In addition, a 2.5V supply operation in this process can be utilized towards a low-power

neural network implementation.

1. Taiwan Semiconductor Manufacturing Company (fabrication services provided through the Canadian Microelectronics Corporation - CMC).

2. A photoBJT cell has been tested in 0.35µm CMOS. The results cannot be reported here due to non-disclosure agreements.

References

P.E. Allen and D.R. Holberg, CMOS Analog Circuit Design. New York: Holt Rinehart and Winston, 1987.

A. Aslam-Siddiqi, W. Brockherde and B.J. Hosticka, "A 16 x 16 Nonvolatile Programmable Analog Vector-Matrix Multiplier," IEEE Journal of Solid-State Circuits, Vol. 33, No. 10, pp. 1502-1509, October 1998.

L.E. Atlas and Y. Susuki, "Digital Systems for Artificial Neural Networks," IEEE Circuits and Devices Magazine, Vol. 5, No. 6, pp. 20-24, September 1989.

G. Bloch, F. Sirou, V. Eustache and P. Fatrez, "Neural Intelligent Control for a Steel Plant," IEEE Transactions on Neural Networks, Vol. 8, No. 4, pp. 910-918, July 1997.

B.E. Boser, E. Sackinger, S. Bromley, Y. LeCun, R.E. Howard and L.D. Jackel, "An Analog Neural Network Processor and Its Application to High-speed Character Recognition," Proceedings of International Joint Conference on Neural Networks (IJCNN), Vol. 1, pp. 415-420, Seattle, July 1991.

J. Cao, M. Shridhar, M. Ahmadi and G.A. Jullien, "Recognition of Handwritten Numerals with Multiple Feature and Multistage Classifier," Journal of Pattern Recognition, Vol. 28, No. 2, pp. 153-163, 1995.

J. Cao, M. Shridhar, M. Ahmadi and G.A. Jullien, "VLSI Implementation for Real-Time Extraction of Direction Vectors from Binary Images," Proceedings of 36th Midwest Symposium on Circuits and Systems, Vol. 2, pp. 963-966, Detroit, MI, August 1993.

G.A. Carpenter and S. Grossberg, "Neural Networks: Introduction to the 1 December 1987 Issue of Applied Optics," Special Issue of Applied Optics, Vol. 26, pp. 4909, 1987.

A. Chandna, G.A. Jullien and W.C. Miller, "Opto-Programmable Neural Networks: An Initial Study," Proceedings of Canadian Conference on VLSI (CCVLSI), pp. 41-48, Vancouver, BC, October 1989.

C.P. Chew, R.W. Newcomb and J.D. Yuh, "VLSI Circuits for Optoelectronic Neural Network Weight Setting," Proceedings of 36th Midwest Symposium on Circuits and Systems, pp. 751-754, Detroit, MI, August 1993.

L.I. Davis, Jr., G.V. Puskorius, F. Yuan and L.A. Feldkamp, "Neural Network Modeling and Control of an Anti-lock Brake System," Proceedings of Intelligent Vehicle'92 Symposium, pp. 179-184, Detroit, MI, 1992.

L. Del Pup, N. Bewtra, R. Grondin, G.A. Jullien and W.C. Miller, "An Optically Coupled Neural Network for Process Control," Proceedings of Canadian Conference on VLSI (CCVLSI), pp. 4.2.1-4.2.7, Ottawa, ON, October 1990.

J. Denker, D. Schwartz, B. Wittner, S. Solla, R. Howard, L. Jackel and J. Hopfield, "Large Automatic Learning, Rule Extraction, and Generalization," Complex Systems, Vol. 1, pp. 877-922, 1987.

B.K. Dolenko and H.C. Card, "Tolerance to Analog Hardware of On-Chip Learning in Backpropagation Networks," IEEE Transactions on Neural Networks, Vol. 6, No. 5, pp. 1045-1052, September 1995.

H. Djahanshahi, M. Ahmadi, G.A. Jullien and W.C. Miller, "A Low-Variation Nonlinear Neuron Circuit," (accepted in) Journal of Circuits, Systems and Computers, 1999.

H. Djahanshahi, M. Ahmadi, G.A. Jullien and W.C. Miller, "A Robust Hybrid Neural Architecture for an Industrial Sensor Application," Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), Vol. III, pp. 41-45, Monterey, CA, May 31-June 3, 1998.

H. Djahanshahi, M. Ahmadi, G.A. Jullien and W.C. Miller, "Neural Network Integrated Circuits with Single-block Mixed-signal Arrays," Proceedings of 31st Asilomar Conference on Signals, Systems & Computers, Vol. 2, pp. 1130-1135, Pacific Grove, CA, November 1997.

H. Djahanshahi, M. Ahmadi, G.A. Jullien and W.C. Miller, "Neural Network Integrated Circuits with Single-block Mixed-signal Arrays," (submitted to special issue of) Journal of Circuits, Systems and Computers, June 1999.

H. Djahanshahi, M. Ahmadi, G.A. Jullien and W.C. Miller, "A Self-scaling Neural Hardware Structure That Reduces the Effect of Some Implementation Errors," Neural Networks for Signal Processing VII - Proceedings of the 1997 IEEE Workshop (NNSP'97), pp. 588-597, Amelia Island, Florida, September 1997.

H. Djahanshahi, M. Ahmadi, G.A. Jullien and W.C. Miller, "Quantization Noise Improvement in a Distributed-neuron Architecture," Proceedings of 40th Midwest Symposium on Circuits and Systems, Vol. 2, pp. 1282-1285, Sacramento, CA, August 1997.

H. Djahanshahi, M. Ahmadi, G.A. Jullien and W.C. Miller, "A Modular Architecture for Hybrid VLSI Neural Networks and its Application in a Smart Photosensor," Proceedings of IEEE International Conference on Neural Networks (ICNN), Vol. 2, pp. 868-873, Washington, D.C., June 1996.

H. Djahanshahi, M. Ahmadi, G.A. Jullien and W.C. Miller, "A Unified Synapse-Neuron Building Block for Hybrid VLSI Neural Networks," Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), Vol. 3, pp. 483-486, Atlanta, GA, May 1996.

H. Djahanshahi, G.A. Jullien, W.C. Miller and M. Ahmadi, "Neural-based Smart CMOS Sensors for On-Line Pattern Classification Applications," (invited), Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), Vol. 4, pp. 384-387, Atlanta, GA, May 1996.

H. Djahanshahi, M. Ahmadi, G.A. Jullien and W.C. Miller, "Design and VLSI Implementation of a Unified Synapse-Neuron Architecture," Proceedings of Sixth Great Lakes Symposium on VLSI (GLSVLSI), pp. 228-233, Iowa State University, Ames, Iowa, March 1996.

Hormoz Djahanshahi and Bart MacLean, "Mixed-Signal VLSI Neural Networks with Self-Scaling Neurons," Hardware Demonstration at TEXPO'97, Symposium on Microelectronics Research & Development in Canada (MR&DCAN), Ottawa, ON, June 1997.

Hormoz Djahanshahi, "Neural-based Smart Photosensors," CAD Demonstration at TEXPO'96, Symposium on Microelectronics Research & Development in Canada (MR&DCAN), Ottawa, ON, June 1996.

R. Dominguez-Castro, S. Espejo, A. Rodriguez-Vázquez, R. A. Carmona, P. Foldesy, A. Zarandy, P. Szolgay, T. Szirányi and T. Roska, "A 0.8-μm CMOS Two-Dimensional Programmable Mixed-Signal Focal-Plane Array Processor with On-Chip Binary Imaging and Instruction Storage," IEEE Journal of Solid-State Circuits, Vol. 32, No. 7, pp. 1013-1025, July 1997.

J.R. Dorronsoro, F. Ginel, C. Sanchez and C. Santa Cruz, "Neural Fraud Detection in Credit Card Operations," IEEE Transactions on Neural Networks, Vol. 8, No. 4, pp. 827-834, July 1997.

E.I. El-Masry, H.K. Yang and M.A. Yakout, "Implementations of Artificial Neural Networks Using Current-Mode Pulse Width Modulation Technique," IEEE Transactions on Neural Networks, Vol. 8, No. 3, pp. 532-548, May 1997.

S. M. Fakhraie and K. C. Smith, VLSI-Compatible Implementations for Artificial Neural Networks, Boston: Kluwer Academic Publishers, 1997.

N. Farhat, "Optoelectronic Neural Networks and Learning Machines," IEEE Circuits and Devices Magazine, Vol. 5, No. 5, pp. 32-41, September 1989.

H.P. Graf et al., "VLSI Implementation of Neural Network Memory with Several Hundreds of Neurons," AIP Conference Proceedings, Snowbird, Utah, J.S. Denker, Ed., American Institute of Physics, New York, NY, pp. 182-187, 1986.

H.P. Graf and L.D. Jackel, "Analog Electronic Neural Network Circuits," IEEE Circuits and Devices Magazine, Vol. 5, No. 4, pp. 44-49, July 1989.

D. Hammerstrom, "A VLSI Architecture for High-performance Low-cost On-chip Learning," Proceedings of International Joint Conference on Neural Networks (IJCNN), Vol. II, pp. 537-544, San Diego, CA, June 1990.

J.A. Hegt, "Hardware Implementations of Neural Networks," Proceedings of Measurement and Artificial Neural Networks, 'Themadag van de Werkgemeenschap Meten', Utrecht, November 1993.

W.D. Hillis, The Connection Machine, MIT Press, Cambridge, MA, 1985.

M. Holler, S. Tam, H. Castro, R. Benson, "An Electrically Trainable Analog Neural Network (ETANN) with 10240 'Floating Gate' Synapses," Proceedings of International Joint Conference on Neural Networks (IJCNN), pp. 191-196, Washington, D.C., June 1989.

J.J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proceedings of the National Academy of Sciences, Vol. 79, pp. 2554-2558, 1982.

Y. Iida, E. Oba, K. Mabuchi, N. Nakamura and H. Miura, "A 1/4-Inch 330k Square Pixel Progressive Scan CMOS Active Pixel Image Sensor," IEEE Journal of Solid-State Circuits, Vol. 32, No. 12, pp. 2042-2047, December 1997.

F.J. Kub, K.K. Moon, I.A. Mack and F.M. Long, "Programmable Analog Vector-Matrix Multipliers," IEEE Journal of Solid-State Circuits, Vol. 25, pp. 207-214, February 1990.

B. Lam, Design and Training of a Programmable Fault-tolerant Neural Network, M.A.Sc. Thesis, University of Windsor, Canada, 1995.

[42] B. Lam, W.C. Miller and G.A. Jullien, "An Intelligent Optical Sensor," Proceedings of International Conference on Applications of Photonic Technology, Sensing, Signal Processing and Communications (ICAPT), Toronto, Canada, June 1994, in Applications of Photonic Technology, Ed., G. A. Lampropoulos, J. Chrostowski, R. M. Measures, Plenum Press, New York and London, pp. 141-144, 1995.

[43] K.W. Lei, "A 1.2μ Neural Network Design," M.A.Sc. Thesis, University of Windsor, Canada, 1994.

[44] K.W. Lei, G.A. Jullien, W.C. Miller, "A Programmable Intelligent Optical Sensor Realization," Proceedings of 37th Midwest Symposium on Circuits and Systems, Vol. 1, pp. 465-468, Lafayette, LA, August 1994.

[45] K.W. Lei, G.A. Jullien and W.C. Miller, "An Intelligent Optical Sensor Realization," Proceedings of 36th Midwest Symposium on Circuits and Systems, Vol. 2, pp. 1284-1287, Detroit, MI, 1993.

[46] G. Liang, CMOS Opto-Electronics Implementation and Application, M.A.Sc. Thesis, University of Windsor, Canada, 1993.

[47] G. Liang and W.C. Miller, "A Novel Photo BJT Array for Intelligent Imaging," Proceedings of 36th Midwest Symposium on Circuits and Systems, pp. 1056-1059, Detroit, MI, August 1993.

[48] B. Linares-Barranco, E. Sánchez-Sinencio, A. Rodriguez-Vázquez and J.L. Huertas, "A Modular T-Mode Design Approach for Analog Neural Network Hardware Implementations," IEEE Journal of Solid-State Circuits, Vol. 27, No. 5, pp. 701-712, May 1992.

[49] R.P. Lippmann, "An Introduction to Computing with Neural Nets," IEEE ASSP Magazine, Vol. 4, No. 2, pp. 4-22, April 1987.

[50] R.P. Lippmann, "Pattern Classification Using Neural Networks," IEEE Communications Magazine, Vol. 27, No. 11, pp. 47-64, November 1989.

[51] B. MacLean, H. Djahanshahi, M. Ahmadi, G.A. Jullien and W.C. Miller, "A BiCMOS VLSI Implementation of an Intelligent Sensor," Proceedings of 40th Midwest Symposium on Circuits and Systems, Vol. 2, pp. 1065-1068, Sacramento, CA, August 1997.

[52] N. Mauduit, M. Duranton, J. Gobert, J.A. Sirat, "Lneuro 1.0: A Piece of Hardware LEGO for Building Neural Network Systems," IEEE Transactions on Neural Networks, Vol. 3, pp. 414-421, May 1992.

[53] C.A. Mead, Analog VLSI and Neural Systems, MA: Addison-Wesley, 1989.

[54] S. Mendis, S. E. Kemeny and E. Fossum, "CMOS Active Pixel Image Sensor," IEEE Transactions on Electron Devices, Vol. 41, No. 3, pp. 452-453, 1994.

[55] Meta-Software, HSPICE User's Manual: Elements and Device Models (Volume II), Version 96.1 for HSPICE Release 96.1, February 1996.

[56] G. Moon, M.E. Zaghloul and R.W. Newcomb, "VLSI Implementation of Synaptic Weighting and Summing in Pulse Coded Neural-Type Cells," IEEE Transactions on Neural Networks, Vol. 3, No. 3, pp. 394-403, May 1992.

[57] A. Moopen, T. Duong and A.P. Takoor, "Digital-Analog Hybrid Synapse Chips for Electronic Neural Networks," in Advances in Neural Information Processing Systems, Vol. 2, pp. 769-776, 1990.

[58] J.M. Moreno, F. Castillo, J. Cabestany, J. Madrenas and A. Napieralski, "An Analog Systolic Neural Processing Architecture," IEEE Micro Magazine, pp. 51-59, June 1994.

[59] A.F. Murray and A.V.W. Smith, "Asynchronous Arithmetic for VLSI Neural Systems," Electronics Letters, Vol. 23, No. 12, pp. 642-643, June 1987.

[60] A.F. Murray, D. Del Corso and L. Tarassenko, "Pulse-stream VLSI Neural Networks Mixing Analog and Digital Techniques," IEEE Transactions on Neural Networks, Vol. 2, pp. 193-203, March 1991.

[61] A.F. Murray and L. Tarassenko, Analog Neural VLSI: Pulse Stream Approach, Chapman and Hall, London, U.K., 1994.

[62] R.A. Nordstrom, J.D. Meindl, "The Field-Effect Modified Transistor: A High-Responsivity Phototransistor," IEEE Transactions on Electron Devices, Vol. SC-7, No. 5, pp. 411-416, October 1972.

[63] A. Nosratinia, M. Ahmadi, M. Shridhar and G.A. Jullien, "A Hybrid Architecture for Feed-forward Multi-layer Neural Networks," Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), Vol. 3, pp. 1541-1544, San Diego, CA, May 1992.

[64] A. Nosratinia, N. Yazdi, M. Ahmadi and M. Shridhar, "A Family of Hybrid Neural Networks," Proceedings of Midwest Symposium on Circuits and Systems, Vol. 1, pp. 469-472, Lafayette, Louisiana, August 1994.

[65] T. Ong, P.K. Ko and C. Hu, "The EEPROM as an Analog Memory Device," IEEE Transactions on Electron Devices, Vol. 36, pp. 1840-1841, September 1989.

[66] I.M.C. Oosse, H.C.A.M. Withagen, J.A. Hegt, "Analog VLSI implementation of a feed-forward neural network," Proceedings of IEEE International Conference on Electronics, Circuits and Systems (ICECS), Cairo, Egypt, December 1994.

[67] M.L. Padgett, O. Erten, F.M. Salam, "Neural Networks and Computing: Practical Applications," Proceedings of IEEE International Conference on Neural Networks (ICNN), Plenary, Panel and Special Sessions, pp. 23-27, Washington, D.C., June 1996.

[68] S. W. Piché, "The Selection of Weight Accuracies for Madalines," IEEE Transactions on Neural Networks, Vol. 6, No. 2, pp. 432-445, March 1995.

[69] Stephen Piché, Selection of Weight Accuracies for Neural Networks, Ph.D. dissertation, Stanford University, 1992.

[70] G.V. Puskorius, L.A. Feldkamp and L.I. Davis, Jr., "Dynamic Neural Network Methods Applied to On-Vehicle Idle Speed Control," Proceedings of IEEE International Conference on Neural Networks (ICNN), Plenary, Panel and Special Sessions, pp. 238-243, Washington, D.C., June 1996.

[71] R.D. Reed and R.L. Geiger, "A Multiple-Input OTA Circuit for Neural Networks," IEEE Transactions on Circuits and Systems, Vol. 36, No. 5, pp. 767-769, May 1989.

[72] E.A. Rietman, R.C. Frye, C.C. Wong and C.D. Kornfeld, "Amorphous Silicon Photoconductive Arrays for Artificial Neural Networks," Applied Optics, Vol. 28, No. 15, pp. 3474-3478, August 1989.

[73] N. Rochester et al., "Tests on a cell assembly theory of the action of the brain, using a large digital computer," IRE Transactions on Information Theory, IT-2, pp. 80-93, 1956.

[74] F. Rosenblatt, "The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain," Psych. Rev. 65, pp. 386-408, 1958.

[75] V. Ruiz de Angulo and C. Torras, "Self-Calibration of a Space Robot," IEEE Transactions on Neural Networks, Vol. 8, No. 4, pp. 951-963, July 1997.

[76] E. Sackinger, B.E. Boser, J. Bromley, Y. LeCun and L.D. Jackel, "Application of the ANNA Neural Network Chip to High-Speed Character Recognition," IEEE Transactions on Neural Networks, Vol. 3, No. 3, pp. 498-505, May 1992.

[77] F. M. Salam, "A Neuro-Chip for Real-time Learning, Processing and Control," Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), Vol. III, pp. 54-57, Monterey, CA, May 31-June 3, 1998.

[78] F. M. Salam, H. J. Oh, "Real-time Tracking Control using Modular Neural Chips with On-chip Learning," Proceedings of IEEE International Conference on Neural Networks (ICNN), Vol. 2, pp. 914-919, Washington, D.C., June 1996.

R.W. Sandage and J.A. Connelly, "Producing Photo-transistors in a Standard Digital CMOS Technology," Proceedings of International Symposium on Circuits and Systems (ISCAS), pp. 369-372, May 1996.

S. Satyanarayana, Analog VLSI Implementation of Reconfigurable Neural Networks, Ph.D. dissertation, Columbia University, 1991.

S. Satyanarayana, Y. Tsividis and H.P. Graf, "A Reconfigurable VLSI Neural Network," IEEE Journal of Solid-State Circuits, Vol. 27, No. 1, pp. 67-81, January 1992.

C.-K. Sin, A. Kramer, V. Hu, R.R. Chu and P.K. Ko, "EEPROM as an Analog Storage Device, with Particular Applications in Neural Networks," IEEE Transactions on Electron Devices, Vol. 39, pp. 1410-1419, June 1992.

M.A. Sivilotti, M.R. Emerling and C.A. Mead, "VLSI Architectures for Implementation of Neural Networks," AIP Conference Proceedings, Snowbird, Utah, J.S. Denker, Ed., American Institute of Physics, New York, NY, pp. 408-413, 1986.

R.G. Stearns, "Trainable Optically-programmed Neural Network," Applied Optics, Vol. 31, No. 29, pp. 6230-6239, October 1992.

A.P. Takoor, A. Moopen, H. Langenbacher and S.K. Khana, "Programmable Synaptic Chip for Electronic Neural Networks," Neural Information Processing Systems, D.Z. Anderson, Ed., Denver, CO, American Institute of Physics, pp. 564-572, 1988.

E. van Keulen, S. Colak, H. Withagen and H. Hegt, "Neural Network Hardware Performance Criteria," Proceedings of IEEE International Conference on Neural Networks (ICNN), pp. 1885-1888, June 28-July 2, 1994.

J. von Neumann, The Computer and the Brain, New Haven: Yale University Press, 1958.

Eric Vittoz, "Analog VLSI for Collective Computation," Proceedings of IEEE International Conference on Electronics, Circuits and Systems (ICECS), Vol. 2, pp. 3-6, Lisbon, Portugal, September 1998.

Eric Vittoz, "Analog VLSI Implementation of Neural Networks," in Handbook of Neural Computation, Institute of Physics Publishing and Oxford University Press, USA, 1996.

Eric Vittoz, "Micropower Techniques," in Design of VLSI Circuits for Telecommunications and Signal Processing, Editors J. Franca and Y. Tsividis, Prentice Hall, Englewood Cliffs, 1994.

[91] K. Wagner and D. Psaltis, "Optical Neural Networks: An Introduction by the Feature Editors," Special Issue of Applied Optics, Vol. 32, No. 8, pp. 1261-1263, March 1993.

[92] H.C.A.M. Withagen, "Reducing the Effect of Quantization by Weight Scaling," Proceedings of IEEE International Conference on Neural Networks (ICNN), pp. 2128-2130, June 28-July 2, 1994.

[93] Y. Xie and M.A. Jabri, "Analysis of the effects of quantization in multilayer neural networks using a statistical model," IEEE Transactions on Neural Networks, Vol. 3, No. 2, pp. 334-338, March 1992.

[94] O. Yadid-Pecht, et al., "A Random Access Photo-diode Array for Intelligent Image Capture," IEEE Transactions on Electron Devices, Vol. 38, No. 8, pp. 1772-1780, August 1991.

[95] A.K. Yamamura, Neural Network Control and an Optoelectronic Implementation of a Multilayer Feedforward Neural Network, Ph.D. Dissertation, California Institute of Technology, 1992.

[96] N. Yazdi, M. Ahmadi, G.A. Jullien and M. Shridhar, "Pipelined Analog Multilayer Feedforward Neural Networks," Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), Vol. 4, pp. 2768-2771, Chicago, IL, May 1993.

[97] N. Yazdi, M. Ahmadi, G.A. Jullien and M. Shridhar, "A High-Dynamic Range CMOS Buffer Amplifier with High-Drive Capability," Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), Vol. 5, pp. 2332-2335, San Diego, CA, May 1992.

[98] J.M. Zurada, Introduction to Artificial Neural Systems, West Publishing Company, 1992.

Appendix: VLSI Layouts and Fabrications

Figure A.1 Layout of a sub-neuron circuit

Figure A.2 A group of five sub-neurons with a common bias circuit

Figure A.3 Two layouts for a 5-bit MDAC synapse with cascode transistors

Figure A.4 Layout of a 5-bit non-cascode MDAC synapse for 3.3V operation

Figure A.5 Layout of a Unified Synapse Neuron (USN): a) with cascode MDAC, b) with non-cascode MDAC for 3.3V operation

Figure A.6 Layout of a 5-bit parallel-in parallel-out (PIPO) weight register

Figure A.7 WRRNR: a test chip containing distributed neurons, MDACs and USNs (see page 40 for a microphotograph)

Figure A.8 WRNBS: 4-input template matching NNIC with optical/electronic inputs (see page 67 for a microphotograph)

Figure A.11 WRNSS: I (see page 1

Figure A.12 Neural-based ph

Vita Auctoris

Hormoz Djahanshahi was born in 1964 in Tehran, Iran, where he obtained his high school diploma at the age of 16. He received B.Sc. (Hons.) and M.Sc. (Hons.) degrees from Tehran Polytechnic (Amir Kabir) University, both in Electrical Engineering with a major in Electronics. His Master's project, a Patient Monitoring System, was applied research in biomedical instrumentation. He pursued his Master's work at Fajr Microelectronics Co. when the system evolved from an engineering prototype to a commercial product. His Ph.D. thesis at the VLSI Research Group, University of Windsor, Canada, was in the area of VLSI implementation of hybrid (analog-digital) neural networks and smart optical sensors. The research led to several conference and journal articles. Towards the end of his thesis, he has been working as a Post Doctoral Fellow at the VLSI Research Group, University of Toronto, where he has designed and published in the area of high-speed (622 MHz) Clock & Data Recovery and gigabit-per-second I/O interface circuits, both in low-voltage submicron CMOS.
