A Robust Hybrid VLSI Neural Network Architecture for a Smart Optical Sensor
by
Hormoz Djahanshahi
A Thesis
Submitted to the College of Graduate Studies and Research
in Partial Fulfillment of the Requirements for
the Degree of Doctor of Philosophy
Electrical and Computer Engineering
University of Windsor
Windsor, Ontario, Canada
1998
© 1998 Hormoz Djahanshahi
National Library of Canada / Bibliothèque nationale du Canada
Acquisitions and Bibliographic Services / Acquisitions et services bibliographiques
395 Wellington Street / 395, rue Wellington
Ottawa ON K1A 0N4, Canada
The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.
The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Dedicated to the memory of my grandmother
who was my first teacher
Abstract
This thesis introduces a novel approach to the design of circuits found in a very large scale
integration (VLSI) implementation of an artificial neural network. A robust hybrid architecture
with analog and digital elements has been developed for a fully-parallel single-chip realization of
multilayer neural networks. The proposed architecture is highly modular and creates regular
silicon structures that well suit a VLSI realization. The architecture employs an innovative
universal building block consisting of an improved digital-analog multiplier, a new analog active
nonlinear resistor and a digital weight register. The key circuit called a unified synapse-neuron
allows one to realize a self-scaling sigmoidal neuron characteristic that does not have to be
constantly redesigned to accommodate a varying dynamic input range that is dependent upon the
number of synaptic weights connected to the input of the neuron. The effects of synaptic weight
quantization noise are also shown to be reduced using a stochastic model developed in the thesis.
A new resistive-type neuron circuit is presented that exhibits inherently low characteristic
variations based on analyses, simulations and fabrication measurements. Moreover, as each
neuron is realized by a number of compact sub-neurons that are distributed over the die area, the
effects of process variations on the neuron's characteristics are minimized due to the distributed
averaging effect that takes place. Increased robustness is achieved as there is a simultaneous
reduction of both digital quantization effects and analog variation effects. The distributed nature
of the analog neuron also has the potential to contribute to increased fault tolerance for certain
types of neuron circuit failure. Circuit design, implementation and characterization are performed
in a standard CMOS process at 5V and 3.3V supply voltages so as to lead to an optimized design.
The purpose of this research was to develop a smart non-contact optical sensor based on a
programmable neural network with an integrated photosensitive array. The theoretical and
experimental work has led to the design and realization of a highly modular and robust neural-
based smart CMOS sensor with reduced interconnection areas and increased synaptic density.
As a result, a larger photosensor array and a larger neural network classifier are implemented on a
restricted die area. Both theoretical and experimental results are presented in the thesis.
Acknowledgements
I would like to thank my thesis advisor, Dr. William C. Miller, for his encouragement and
support throughout this thesis. I also wish to express my gratitude to Dr. Majid Ahmadi
and Dr. Graham A. Jullien for their help and support in various aspects of this research.
I am grateful to the members of my thesis committee Dr. Fathi M. Salam (external
examiner) and Dr. Subir Bandyopadhyay (external reader) for their insightful comments
and suggestions.
Special thanks go to Roberto Muscedere for his help with computer and software
problems during the last year of my work at the VLSI Research Group in Windsor.
He made a difference in our lab. To my friends Hossain Hajimowlana, Saeid Sadeghi,
Ramin Safari, Marjan Shahkariuni, Jinming Yang and many others, thanks for your
invaluable friendship and for the enjoyable research environment you have contributed to.
Last but not least, I would like to thank my wife, Taban, for all her love, support and
understanding. She is a hero! And to my daughter, Kimia, thanks for bringing joy to our
life. You are my greatest achievement.
Table of Contents
List of Figures
Chapter 1 Introduction
1.1 Overview
1.1.1 Conventional vs. Neural Computation
1.2 Neural Network Implementations
1.2.1 Software vs. Hardware
1.2.2 Optical and Optoelectronic Implementations
1.2.3 Digital Implementations
1.2.4 Analog Implementations
1.2.5 Hybrid (Mixed-signal) Implementations
1.3 Objectives
1.4 Thesis Organization
Chapter 2 A New Hybrid VLSI Architecture
2.1 Introduction
2.2 Some Implementation Problems
2.3 A New Hybrid Distributed-Neuron Architecture
2.3.1 Creating a Unified Synapse-Neuron (USN)
2.3.2 A Modular Neural Network Implementation
2.3.3 Properties of the Architecture
2.4 Conclusion
Chapter 3 Distributed Neuron and its Properties
3.1 Introduction
3.2 Nonlinear Resistive-type Neuron
3.2.1 Circuit Description
3.2.2 Analysis of I-V Characteristics
3.2.3 A Sensitivity Study
3.2.4 Fabrications and Measurements
3.3 Implementation and Properties of a Distributed Neuron
3.3.1 An Averaging Effect
3.3.2 A Self-scaling Property
3.3.3 An Increased Fault Tolerance
3.4 Conclusion
Chapter 4
4.1 Introduction
4.2 A Programmable Universal Hybrid Building Block
4.2.1 Multiplying D-to-A Converter (MDAC)
4.2.2 Weight Register with Double-Phase Clock
4.2.3 Characteristics of the Unified Synapse-Neuron Circuit
4.3 Applications
4.3.1 An Optical Template Matching Network
4.3.2 General Purpose Programmable Neural Network Classifier
4.3.3 Other NNIC Fabrications
4.4 Conclusion
Chapter 5 Quantization Noise Improvement
5.1 Introduction
5.2 Modeling a Distributed Neuron
5.2.1 Increase in the Number of Adaline Inputs
5.2.2 Self-scaling Formulation
5.3 Stochastic Model
5.3.1 Sigmoidal Adaline with Lumped Neuron
5.3.2 Sigmoidal Adaline with Distributed Neuron
5.4 A Case Study
5.5 Discussion and Conclusion
Chapter 6 Neural-based Smart Photosensor
6.1 Introduction
6.2 Objectives and Issues
6.3 Photosensor Array
6.3.1 CMOS-compatible Photosensitive Device
6.3.2 CMOS Photoreceptor Cell Circuit
6.4 A Review of Conventional Designs
6.4.1 Partially-connected Pre-programmed Neural-based Sensor
6.4.2 Fully-connected Programmable Neural-based Sensor
6.5 Distributed Neural-based Sensor Architecture
6.5.1 2-D Distributed-neuron Architecture
6.5.2 Distributed Array of Smart Pixels
6.5.3 Characteristics of the Neural-based Sensor Chip
6.5.4 Synaptic Density and Interconnections
6.5.5 Robustness of Neurons
6.5.6 BiCMOS vs. CMOS Implementation
6.6 Training
6.7 Conclusion
Chapter 7 Conclusions
7.1 Summary
7.2 Contributions
7.3 Suggested Future Research
List of Tables
Table 3.1. Device sizes for the two neuron circuits shown in Figure 3.1 and simulated in Figure 3.2
Table 3.2. Regions of operation for the neuron circuit shown in Figure 3.1(b)
Table 3.3. A summary of experimental results on (lumped) neuron circuit
Table 3.4. A summary of comparative measurements on lumped and distributed neurons
Table 4.1. Device sizes (W/L) in µm for MDAC circuit shown in Figure 4.2(c)
Table 4.2. A data summary about template-matching NNIC
Table 4.3. A data summary about programmable NNIC classifier
Table 6.1. Comparison between a conventional and a distributed neural-based photosensor design
Table 6.2. Training pattern set for the 8 x 8 smart photosensor
List of Figures
Figure 1.1 Taxonomy of Artificial Neural Networks
Figure 2.1 a) Interconnections in a fully-connected multilayer neural network, b) limitations on VLSI interconnections
Figure 2.2 Evolutionary steps to create a unified synapse-neuron in a hybrid architecture
Figure 2.3 Modular neural network implementations: a) a 4-3 neural network, b) a multilayer 4-3-2 neural network
Figure 3.1 Circuit diagram of nonlinear I-to-V neurons: a) original circuit [81], b) the modified circuit
Figure 3.2 I-V characteristics of the circuits in Figure 3.1 (a) and (b)
Figure 3.3 a) Regions of operation on S-shaped V-I characteristic, b) current distribution in four MOS transistors
Figure 3.4 Simulations with NI% variations on threshold voltage
Figure 3.5 Microphotograph of a fabricated CMOS chip that includes lumped neurons, unified synapse-neurons and other test circuits
Figure 3.6 Measured neuron characteristics from 10 fabricated chips: a) overlaid results, b) a close-up view of maximum chip-to-chip variations
Figure 3.7 Implementation of a distributed neuron
Figure 3.8 Neuron cells in a gradient of doping: a) lumped realization, b) distributed realization
Figure 3.9 On-chip variations of characteristics (worst case among 5 fabrications): a) lumped neuron implementation, b) distributed neuron implementation
Figure 3.10 Self-scaling property of the distributed neuron circuit: a) simulated characteristics of a sub-neuron and a 5-input neuron, b) experimental results comparing a 2-input and a 5-input neuron
Figure 4.1 Sub-blocks of the universal hybrid building block
Figure 4.2 MDAC-type synapse, evolutionary steps: a) a conceptual block diagram, b) a 5-bit version based on [57], c) modification in sign-bit circuit, d) modification in current mirrors, e) modification in V-to-I converter
Figure 4.3 MDAC output current (at Vin.-) vs. binary weights: a) simulation waveforms, b) fabrication measurements
Figure 4.4 Improving the dynamic range of Iout vs. Vin in MDAC: a) threshold reduction (circuit improvement from Figure 4.2(c) to Figure 4.2(d)), b) linearization (circuit improvement from Figure 4.2(d) to Figure 4.2(e))
Figure 4.5 Schematic of one bit of a 5-bit weight storage cell with double-phase clock
Figure 4.6 Unified Synapse-Neuron (USN) circuit
Figure 4.7 Output current of the modified MDAC (Figure 4.6 or Figure 4.2(e)) measured through a linear load
Figure 4.8 Overall characteristics of USN: a) simulations for two parametric values of Vin, b) experimental measurements
Figure 4.9 A 4-3-2 VLSI neural network based on arrays of a universal hybrid building block
Figure 4.10 Template matching: a) optical inputs, b) equivalent electronic inputs, c) chip outputs in recall
Figure 4.11 a) Layout and b) core microphotograph of template matching NNIC
Figure 4.12 a) Border feature extraction and b) directional templates for handwritten numeral recognition
Figure 4.13 Microphotograph of a general purpose 16-4-3 programmable NNIC classifier
Figure 5.1 An Adaline implemented with lumped resistive-type neuron
Figure 5.2 An Adaline with a distributed-neuron architecture
Figure 5.3 a) An N-input neuron characteristic over the original range of inputs, b) the same neuron when inputs are increased to N.S, c) a properly-scaled neuron with N.S inputs
Figure 5.4 Neuron input increase for a distributed neuron
Figure 5.5 Stochastic model for an Adaline with distributed neuron
Figure 5.6 Signal-to-Noise Ratio improvement vs. input increase factor
Figure 6.1 Top view, cross section and device equivalent model of: a) vertical photoBJT, b) Field-Effect Modified (FEM) vertical photoBJT
Figure 6.2 Photosensor cell circuit
Figure 6.3 Neural-based photosensor for focal-plane pattern classification
Figure 6.4 Hybrid distributed-neuron architecture on a 2-D array
Figure 6.5 Neural-based smart pixel: a) a schematic diagram, b) die microphotograph of two adjacent pixels
Figure 6.6 Floorplan of the neural-based photosensor chip
Figure 6.7 Microphotograph of the neural-based photosensor chip
Figure 6.8 Various pop-up windows in Training/Recall simulator: a) main window, b) about features, c) define structure (graphic window not shown), d) define input/output patterns, e) define and run training, f) simulated recall
Figure A.1 Layout of a sub-neuron circuit
Figure A.2 A group of five sub-neurons with a common bias circuit
Figure A.3 Two layouts for a 5-bit MDAC synapse with cascode transistors
Figure A.4 Layout of a 5-bit non-cascode MDAC synapse for 3.3V operation
Figure A.5 Layout of a Unified Synapse Neuron (USN): a) with cascode MDAC, b) with non-cascode MDAC for 3.3V operation
Figure A.6 Layout of a 5-bit parallel-in parallel-out (PPO) weight register
Figure A.7 WRNHT: a test chip containing distributed neurons, MDACs and USNs (see page 40 for a microphotograph)
Figure A.8 WRNBS: 8-input template matching NNIC with optical/electronic inputs (see page 67 for a microphotograph)
Figure A.9 WRPNN: a general purpose programmable vector classifier NNIC (see page 70 for a microphotograph)
Figure A.10 Layout of a neural-based smart pixel (in WRNSS) consisting of a photosensor cell, 8 USN blocks, 8 x 5-bit weight registers, clock driver and bias (see page 97 for a microphotograph)
Figure A.11 WRNSS: neural-based smart photosensor chip
Figure A.12 Neural-based photosensor chip in a 68 Pin Grid Array package
Chapter 1 Introduction
Human beings have long been fascinated by the operation of
biological neural systems. Intelligent behaviors such as
understanding, reasoning, vocal communications, vision, decision
making and locomotion are all attributed to the nervous system and
its complexity and fineness. Besides basic interest in the
understanding of the physiological phenomena, it is highly
desirable for many applications to realize features similar or close
to those performed by biological neural systems.
In recent years, much research has been put into deriving ideas
from biological paradigms in order to make intelligent systems.
The objective of modern neural network research has been to
understand different aspects of the biological counterpart, as well as
to realize useful artificial neural systems. An artificial neural
network (ANN), or simply neural net (NN), is a processor whose
design is motivated by biological neural systems. Neural networks
are characterized by a massive number of interconnections and
nodes that collectively perform a parallel distributed processing
task based on a host of simple computational elements.
Key properties of neural networks are their fault tolerance due to interconnection
redundancy, and more importantly their ability to learn from examples and later recall and
generalize even in noisy conditions. Applications of neural networks are generally in
areas where the human brain may outperform conventional computers. Examples are vision,
speech recognition and synthesis, pattern classification, handwritten character recognition,
medical diagnosis and expert systems, trend prediction, intelligent control and robotics.
Neural network research is an interdisciplinary field that has evolved from the interaction
of several disciplines including neurobiology, physics, mathematics, psychology,
computer science and electrical engineering. In the 1950's, Rosenblatt introduced his
Perceptron model [74]; however, it was shown that the perceptron was incapable of
solving the parity (EXOR) problem. Presenting new understandings of existing ideas in
neural networks, J. Hopfield revived interest in this field in the early 1980's. Hopfield
showed that several interesting properties emerge from the collective behavior of neurons
and synapses [38]. The new interest in neural networks in the past ten years or so has been
sparked in part by new models and training algorithms, as well as advances in
microelectronics that allow hardware or software implementation of fairly large
networks. Carver Mead [83], [53] and others [3], [31], [32], [33], [85] demonstrated
that ANNs of reasonable size can be implemented on VLSI chips that show the
predicted collective behavior of, and mimic some features of, biological neural networks.
Some ANN implementations have shed new light on understanding the operation of the
biological counterpart [53].
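The EXOR limitation mentioned above can be checked numerically. The following is a minimal sketch for illustration only, not taken from the thesis: it assumes a single hard-threshold unit with two inputs and a bias, searches a small grid of integer weights (the grid bounds and all function names are hypothetical choices), and then shows a hand-picked two-layer arrangement of the same units that does reproduce EXOR.

```python
from itertools import product

# Truth table of the two-input EXOR (parity) function.
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def step(x):
    """Hard-threshold activation: 1 if x > 0, else 0 (an assumed neuron model)."""
    return 1 if x > 0 else 0


def unit(w1, w2, b, x1, x2):
    """A single threshold unit: two synaptic weights, one bias, one neuron."""
    return step(w1 * x1 + w2 * x2 + b)


def single_unit_solutions(lo=-5, hi=5):
    """Return every integer (w1, w2, b) in [lo, hi] that realizes EXOR."""
    grid = range(lo, hi + 1)
    return [
        (w1, w2, b)
        for w1, w2, b in product(grid, repeat=3)
        if all(unit(w1, w2, b, *x) == y for x, y in XOR.items())
    ]


def two_layer_xor(x1, x2):
    """Hand-picked weights: hidden OR and AND units, output = OR and not AND."""
    h_or = unit(1, 1, 0, x1, x2)    # fires if at least one input is 1
    h_and = unit(1, 1, -1, x1, x2)  # fires only if both inputs are 1
    return unit(1, -1, 0, h_or, h_and)
```

The search comes back empty because the four EXOR constraints on a single unit contradict each other, whereas adding one hidden layer removes the contradiction.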
1.1.1 Conventional vs. Neural Computation
Most of the computers that are part of our everyday lives fall into a category of digital
machines with the von Neumann architecture. They represent information as 1s and 0s
and perform computations using a central processing unit (CPU) connected to a memory
bank. In a von Neumann machine a CPU based on a predefined algorithm accesses data in
memory, performs some operations on the data and stores the results back into memory.
With the ever increasing complexity of the tasks that computers are required to perform,
numerous techniques have been pursued to enhance their computational power. The major
methods of improvement are summarized below.
Technological Advances: There has been a continuous trend in incremental
development of both faster and more complex processors, and faster and larger
memories. The minimum feature size on a chip has been shrinking while the number of
transistors per chip and the clock speed have been increasing steadily. However, we
will inevitably be approaching a point where physical limits on feature sizes, etc. will
be reached.
Miscellaneous Organizational Improvements: Some examples of organizational
improvements to speed up computation flow include pipelining architectures, Reduced
Instruction Set Computers (RISC machines), and hierarchical memories e.g. a pyramid
of built-in CPU registers, cache memory, dynamic memory bank and mass storage.
Parallel Processing: A major gain in computational power is achieved by adopting
multiprocessor architectures. In practice, an increase in the number of processors does
not necessarily result in a proportional increase in computing power. One issue in
parallel processing is how to divide a task into pieces that can be performed in parallel,
rather than serially. Other issues are how to organize memory resources and how to
communicate among an increased number of processors. A fully-interconnected
communication scheme would result in overly complicated hardware. Instead, in a
'hypercube' architecture, each processor placed at the corner of a hypercube is allowed
to communicate directly only along the cube edges, while in a 'systolic array', the
processors placed on a grid communicate only with the nearest neighbors on the array.
Commercially available multiprocessor/multicomputing systems are application
dependent and vary from those like the Cray Y-MP with eight powerful processors to
the Connection Machine with 65K single-bit SIMD (Single Instruction, Multiple Data) processors [36].
A neural network is yet another alternative on the spectrum of computational
architectures, one that looks closer to the Connection Machine, although it is different in
nature [95], [49].
Programming Algorithm: Proper programming is an important part of a problem-
solving computer system. As long as a problem and its solution are well understood
and modeled, it is always possible to develop an algorithm and a program to solve the
problem, no matter how complicated. However, there are tasks that have not yet been
well understood and implemented on computers; among those are problems for which
good algorithmic solutions are not known. Pattern recognition and machine vision are
examples of such sophisticated problems. As such problems are routinely solved in
nature by some simple life forms, let alone human beings, many researchers seek a
solution in neurobiology. They hope to model their understanding of biological neural
computation toward a non-algorithmic solution for the mentioned types of problems.
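The interconnection schemes named under parallel processing above can be made concrete with a short sketch. It is an illustration under stated assumptions, not something from the thesis: processors are labelled with integers, a hypercube neighbor differs in exactly one address bit, and the systolic grid is assumed to use four-neighbor (up, down, left, right) links. All function names are hypothetical.

```python
def hypercube_neighbors(node, dim):
    """Processors directly linked to `node` in a dim-dimensional hypercube.

    Nodes are numbered 0 .. 2**dim - 1; flipping one address bit moves
    along one cube edge.
    """
    return [node ^ (1 << k) for k in range(dim)]


def grid_neighbors(row, col, rows, cols):
    """Nearest neighbors of a cell on a rows x cols grid (assumed 4-connected)."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


def link_counts(dim):
    """Number of links for 2**dim processors: (fully interconnected, hypercube)."""
    n = 2 ** dim
    fully_connected = n * (n - 1) // 2  # one link per pair of processors
    hypercube = n * dim // 2            # dim links per node, each shared by two
    return fully_connected, hypercube
```

For 16 processors, `link_counts(4)` gives 120 links for the fully interconnected scheme against 32 for the hypercube, which is the hardware saving the text alludes to.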
Neural networks represent an alternative paradigm that lies at one end of a computational
spectrum with von Neumann machines being at the other end. Multiprocessor
architectures and systolic arrays lie in between. Processing in von Neumann computers is
centralized (in the CPU) and performed serially. The power of these computers comes from
the superb accuracy and speed of their computational elements that operate based on
predefined algorithms. On the other hand, neural processing is highly distributed
(decentralized) and is performed by many simple computational elements. The power of
neural computations comes from the massive number of elements concurrently performing a
collective task. Moreover, neural computation is non-algorithmic and model-free.
In terms of a realistic future outlook, one should not expect that neural networks will
replace conventional digital computers. The basic reasons are that conventional
computers are nowadays very inexpensive to make and extremely accurate and fast in
executing numerical calculations, text and data processing, computer aided design, etc.
Perhaps the most important applications of neural networks are those involving
classification, association [98] and computationally intensive yet seemingly non-algorithmic
problems, sometimes termed perceptive computations [88], that are not
successfully attacked by conventional computers.
Neural networks have already been known as good pattern classifiers [50]. They have also
played an important role in the field of document image analysis, especially in commercial
optical character recognition (OCR) systems [67], [76], [6]. Recently neural networks
have been successfully used in applications such as intelligent control of industrial plants
[4], self-calibration of commercial space robots [75], on-line fraud detection of credit card
operations handling 1 million transactions per month [28] and automotive control [11],
[70]. Prototype neural chips with temporal learning capability have been presented
recently for real-time tracking and control applications [77], [78]. Other potential
applications are in the areas requiring human-like inference and perception of speech and
vision, especially in real-time systems [98]. To our civilization that is already heavily
dependent on conventional digital computers, future artificial neural networks will offer a
key complementary technology at their best.
1.2 Neural Network Implementations
A taxonomy of artificial neural network implementations is shown in Figure 1.1.
The main categories shown in this figure are described below.
1.2.1 Software vs. Hardware
Neural networks have been implemented widely on software platforms. Software
implementations are neurosimulators run on conventional computers. The first computer
simulation of a neural network was performed by Rochester in 1956 on a Hebbian
learning network [73]. Ironically, the first estimates for the capacity of the brain and the
notion of imprecise neural computation were suggested by von Neumann [87]. Nowadays,
complicated neurosimulators are widely available on personal computers and
workstations. Flexibility is a main advantage of these simulators as they can be used for
various neural network topologies and training algorithms. Parameters such as the number
of layers, the number of neurons per layer and the type of neuron nonlinearity can be
easily changed by the user and explored based on application demand. Neurosimulators
have various user-friendly graphical and text interfaces that give the user a quick insight into
the performance of his/her simulated network model or training scheme. In fact, a great
deal of the recent public interest in neural networks is indebted to advances in computer
technology that made it possible to simulate large networks in a flexible, interactive and
inexpensive manner.
Figure 1.1 Taxonomy of Artificial Neural Networks
On the other hand, the use of a serial computer for implementing a neural network seems
to be somewhat paradoxical. The nature of computation in a digital serial computer is
very different from that of a neural network, as explained earlier in Section 1.1.1.
Although software simulators are of great significance, it is in fact a hardware
implementation based on many parallel computational elements that can truly exploit the
inherent speed and properties associated with parallel distributed processing of neural
networks. Simulations of neural networks in real-world applications (as opposed to
simple problems like EXOR) are computationally intensive to the degree that the
processing bottleneck on conventional computers limits practical explorations of large
networks. Many such applications require architectures composed of several dozens or
hundreds of neurons (summation and nonlinearities) connected via thousands of synapses
(multiplications). The training of these networks, such that they really generalize and not
just memorize the training data set, requires large data sets, while training is usually a slow
and iterative process. Moreover, many of the real-world applications are time-critical, in
which the neural network has to be used 'on-line' [35]. For such applications, where huge
amounts of calculations are to be performed in a very limited time, software simulations
are inadequate and thus hardware implementations of neural networks become inevitable.
1.2.2 Optical and Optoelectronic Implementations
In hardware, neural networks have been implemented by a variety of optical or electronic
techniques or a combination of both. The main advantages offered by spatial optics are
high-speed processing and the possibility of fully parallel synaptic connections in three
dimensions. As we know, the tremendously complex biological synaptic connections are
made in three dimensions, while microelectronics virtually offers a 2.5-dimensional
implementation, i.e. a limited 2-D interconnection possibility as well as a stack of just a
few conductive layers in the third dimension. Despite the possibility of 3-D interconnections,
pure optical techniques tend to be more bulky and expensive, and less flexible than their
microelectronic counterparts. Some modern optoelectronic neural networks take
advantage of a combination of both techniques.
An optoelectronic learning chip from Mitsubishi uses variable sensitivity photodiodes
(VSPDs) with a metal-semiconductor-metal structure. The photosensitivity of this device is
a function of an applied bipolar bias voltage. In this way, a two-quadrant multiplier was
obtained to implement an electrically programmable synapse. A linear array of 8 LEDs,
stacked on a 2-D 8 x 8 VSPD array, was used for the input lines. Based on this
architecture, an 8-8-3 multilayer perceptron with 640M updates per second was realized.
The use of 2-D optical arrays for photo-activated synaptic multipliers had been studied in
the VLSI Research Group at Windsor in the late 1980's [9]. A three-layer perceptron
network based on the combined use of LCD devices and an amorphous-silicon
photoconductor array was implemented at the Xerox research center [84]. Numerous neural
network architectures based on free-space optics and optoelectronics have been reported
in [8] and [91]. Most of the optoelectronic neural network implementations are based on
devices with special fabrication processing [72], [84] that makes them more expensive
compared to the fewer implementations based on standard CMOS or BiCMOS technologies
IW
Integration of a photosensitive array acting as optical input nodes to a neural network
allows a fast fully-parallel application of large input vectors without pin limitations or
multiplexing delay. This approach has been investigated in standard CMOS by our group
[12], [42] and is explored further in Chapter 6 of this thesis.
Electronic neural networks can be divided into three groups: digital implementations,
analog implementations and hybrid (mixed analog-digital) implementations.
1.2.3 Digital Implementations
Digital implementations of neural networks can further be subdivided into two groups,
namely, neuroprocessors and dedicated chips:
Neuroprocessors: Neuroprocessors (or neuro-accelerators) are special purpose
co-processors for neuro-simulators. Aimed at accelerating the performance of a
neuro-simulator program, a neuroprocessor generally comes on a special board that fits in a
slot of a PC or workstation. This approach combines an accelerated speed with the
benefits of a flexible user-friendly software environment. Examples of commercially
available products are Mark II and Mark IV from TRW and Synapse I from Siemens
for VME-based workstations.
Dedicated chips: Dedicated digital neural network chips exploit more parallelism and
achieve speeds that are typically one or two orders of magnitude higher than
computer-based neuroprocessor boards. As a matter of fact, a large portion of commercially
available dedicated neural network chips are realized as digital circuits. The main
reasons are high precision and flexibility combined with the availability of mature
digital design tools. A major drawback of digital implementations is the large area and
power consumption per functional unit (multiplication, addition and nonlinear
functions). The growing complexity practically limits the size of a fully parallel digital
implementation. In moderate to large designs, computational units, especially synaptic
multipliers, are shared in a time-multiplexing scheme. An obvious outcome of a
multiplexed solution is a slowdown in overall speed proportional to the multiplexing
factor. The reductions in area and power do not usually match the multiplexing factor
due to the overhead created by the multiplexing/control circuit. Therefore, the
power-delay product becomes even higher in a multiplexed scheme.
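The trade-off described for time-multiplexed digital designs can be illustrated with a toy model. The sketch below is not taken from the thesis: it assumes a fully parallel reference design with normalized power 1 and delay 1, a multiplexing factor `m` that multiplies the delay, and a hypothetical fixed `overhead` fraction of power for the multiplexing/control circuit that does not shrink with `m`.

```python
def power_delay_product(m: int, overhead: float = 0.1) -> float:
    """Normalized power-delay product of a design multiplexed by factor m.

    Toy model (assumed, for illustration only):
      delay = m                 one shared unit serves m operations in turn
      power = 1/m + overhead    ideal saving plus a fixed control cost
    m = 1 is the fully parallel reference and carries no overhead.
    """
    if m < 1:
        raise ValueError("multiplexing factor must be >= 1")
    delay = float(m)
    power = 1.0 / m + (overhead if m > 1 else 0.0)
    return power * delay


if __name__ == "__main__":
    for m in (1, 4, 16):
        print(f"m = {m:2d}: power-delay product = {power_delay_product(m):.2f}")
```

Under these assumptions the product grows with the multiplexing factor, because the delay scales with `m` while the power never drops below the control overhead.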
Ford Motor Company successfully developed a prototype dynamic controller based on
recurrent neural networks for on-vehicle idle speed control [70]. Training and recall in the
prototype are based on an external digital computer, but the recall may easily be executed
directly in the vehicle's powertrain control module (PCM). Neural network control has
also been applied to anti-lock brake systems (ABS) [11].
Lneuro 1.0 from Philips is a general purpose building block processor which has 16
Processing Elements (PEs) with 16-bit resolution each [52]. On-chip weight memory is
1KB, which is enough for 512 weights of 16-bit, or 1024 weights of 8-bit resolution. The
sigmoid function is not implemented on chip. The speed per chip is 100M connections per
second, but in each cycle only one neuron output is available.
The Connected Network of Adaptive Processors (CNAPS) chip from Adaptive Solutions
consists of 64 parallel Processor Nodes (PNs) each containing an adder, a multiplier, 32 x
8-bit registers, 4KB weight memory and bus drivers [34]. Sixty-four neurons and a total
of 128K 16-bit weights can be implemented on chip for a total speed of 800M connections
per second. Several CNAPS chips can be cascaded on a common bus controlled by a bus
arbiter chip. However, this approach is not suitable for large networks, in which case PNs
would spend more time waiting for bus availability than computing.
Hitachi realized a 5-inch Wafer Scale Integration (WSI) neural network consisting of 48
chips, with 12 nine-bit neurons per chip and 64 eight-bit weights per neuron. Each of the
576 neurons has only one multiplier, which performs one 8 x 9-bit multiplication in about
0.5 µs. The whole wafer performs 1240M connections per second. The physical size, as
well as the proportional power, of such an implementation by itself should be an indication of
the overwhelming complexity of a digital neural network realization.
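The figures quoted for these digital chips can be cross-checked with simple arithmetic. The short sketch below only re-derives numbers from the values stated in the text; the variable names are illustrative, and the Hitachi estimate assumes that every neuron's single multiplier is busy all the time with a multiplication time of 0.5 µs.

```python
# Lneuro 1.0: 512 weights of 16 bits each, expressed in bytes.
lneuro_bytes = 512 * 16 // 8                 # 1024 bytes, i.e. 1 KB

# CNAPS: 64 processor nodes, 4 KB of weight memory each, 16-bit weights.
cnaps_weights = 64 * 4 * 1024 * 8 // 16      # 131072 weights, i.e. 128K

# Hitachi WSI: 48 chips x 12 neurons, one multiplier per neuron.
hitachi_neurons = 48 * 12                    # 576 neurons
mult_time_s = 0.5e-6                         # "about 0.5 us" per multiplication
hitachi_cps = hitachi_neurons / mult_time_s  # connections per second (estimate)

if __name__ == "__main__":
    print("Lneuro weight memory (bytes):", lneuro_bytes)
    print("CNAPS 16-bit weights:", cnaps_weights)
    print("Hitachi neurons:", hitachi_neurons)
    print("Hitachi estimate (M connections/s):", hitachi_cps / 1e6)
```

The Hitachi estimate comes out at roughly 1150M connections per second, the same order as the quoted 1240M; the difference is consistent with the multiplication time being given only approximately.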
1.2.4 Analog Implementations
Analog implementation seems to be a natural choice when neurobiology is considered as a
model for artificial neural networks. A biological neural network consists of numerous
imprecise elements performing a collective task in parallel. Fundamental considerations
show that analog processing is more efficient (than digital) with respect to power and chip
area when low precision is acceptable [55]. This is the case for perceptive processing, in
which the need for precise individual cells is replaced by that for collective computation in
a massively parallel architecture. Vision is an example of a domain to which analog
collective computation is immediately applicable, since the information to be processed is
inherently massively parallel and can be acquired easily by an array of on-chip light
sensors [88], [89].
Basic operations required in a neural network can be performed by simple analog circuits,
resulting in a dense asynchronous realization. The advantages of analog circuits in the
context of neural network implementation are small area, high speed and low power
consumption. Analog MOS circuits in the subthreshold (weak inversion) region especially
offer ultra low power and thus a possibility for denser and larger neural network
realizations [53], [90]. The main problem of analog implementations is their inaccuracy,
as analog components are subject to mismatch, offsets and gain errors due to fabrication
process variations. Analog circuits are also more susceptible to noise, temperature effects
and power supply variations.
The effect of inaccurate analog components in a neural network chip can be compensated
to some extent during a learning process that takes into account the actual characteristics
of the hardware (e.g. in a chip-in-loop or on-chip training scheme). Nonetheless, according
to a stochastic study [68], the effects of implementation errors and inaccuracies become
more apparent at the output of larger neural networks. In addition, it should be noted that a
high precision is required in the training phase and it is only in the recall phase that a
lower precision can be tolerated. For example, the popular back-propagation algorithm
relies on relatively high resolution implementations (at least 9-10 bits in the digital domain
[93], or equivalent analog accuracy) and appears to be especially sensitive to offsets
present in an analog implementation [14].
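To give a feel for what a given resolution means for a stored weight, the sketch below quantizes a weight with a uniform signed quantizer at two word lengths. It is an illustration only: the normalization of weights to the range [-1, 1], the symmetric code range and the function name `quantize` are assumptions, not details from the thesis.

```python
def quantize(w: float, bits: int, w_max: float = 1.0) -> float:
    """Round w to the nearest level of a uniform signed quantizer.

    Assumed model: codes run from -(2**(bits-1) - 1) to +(2**(bits-1) - 1),
    so the step (LSB) is w_max / (2**(bits-1) - 1). Out-of-range inputs
    are clipped to the largest code.
    """
    levels = 2 ** (bits - 1) - 1
    step = w_max / levels
    code = max(-levels, min(levels, round(w / step)))
    return code * step


if __name__ == "__main__":
    w = 0.3
    for bits in (6, 10):
        wq = quantize(w, bits)
        print(f"{bits:2d} bits: stored {wq:.6f}, error {abs(wq - w):.6f}")
```

For in-range weights the rounding error is at most half a step, so each additional bit roughly halves the worst-case weight error.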
Another problem toward a fully-analog neural network implementation is that of analog
storage of synaptic weights. A possible solution is to use on-chip capacitors to store the
analog quantity as a charge deposit [40]. This is a volatile storage scheme that requires
initialization and periodic refreshing through D/A converters connected to digital
memories on or off chip. Therefore, it eventually relies on a digital storage mechanism. A
non-volatile solution, but one that usually requires a special fabrication technology, is to
store charge on floating-gate devices and use them in Programmable Read-Only Memories
(EEPROM, EPROM, ...) in an analog manner [65], [82], [2]. Programming and update
cycles of analog EEPROM are relatively slow and typically involve high voltage pulses.
Moreover, EEPROM suffers from aging problems: (i) it can be reprogrammed only a
limited number of times before starting to degenerate and (ii) during long term storage,
there is a small charge leakage in the order of a few percent per several months. Finally,
analog EEPROMs have a limited accuracy that cannot be increased arbitrarily, partly
because they are based on structures optimized for commercial digital technologies.
Perhaps the most well-known commercial analog neural network chip is the Electrically
Trainable Analog Neural Network (ETANN) 80170 from Intel [37]. The ETANN consists
of 64 neurons and 10,240 synapses based on fully parallel Gilbert multipliers. The
neurons can be time-multiplexed in order to implement two layers of size 64 x 64.
Synaptic weights are programmed and stored on analog EEPROM cells. The training of
the chip is supported by the Intel Neural Network Training System (iNNTS), a board that
includes D/A and A/D converters and PC interfacing capability. The combined accuracy
of analog neurons and synapses is equivalent to 6-bit resolution [86]. The overall recall
speed is about 2G connections per second. Despite high-speed recall, the training process
is slow due to the long programming cycle of analog EEPROM as well as interfacing
between the PC host computer, the iNNTS board and the ETANN chip. Training is usually
performed in two phases. First, a set of approximate weights from an off-line training is
downloaded into ETANN. In the second phase, weights are fine-tuned in a chip-in-loop
training scheme to compensate for the limited accuracy and nonidealities of analog
hardware. Due to the elaborate training process and the aging problem of EEPROMs
mentioned earlier, ETANN is not suitable for applications in which frequent
reprogramming is required.
Another analog product is the Artificial Neural Network ALU (ANNA) from AT&T [5].
ANNA has 8 neurons, 4096 weights and 512 multipliers organized as 8 groups of 64
synapses working fully in parallel. A group-multiplexing scheme can be used to create
256-input neurons, one neuron at a time. Synaptic weights are stored as charge deposits
on capacitors refreshed by on-chip D/A converters. At maximum speed, ANNA performs
10G connections per second. The training process, however, should be performed off-chip,
as is the case for most of the available neural network integrated circuits.
1.2.5 Hybrid (Mixed-signal) Implementations
Both analog and digital implementations of neural networks have their own advantages
and disadvantages, as explained before. For this reason, some researchers have been
seeking a compromise in a hybrid (mixed-signal) implementation by exploiting the merits
of both analog and digital worlds. For example, it is beneficial to use the free addition of
analog currents as well as compact nonlinear amplifiers to implement the neuron function in
the analog domain, while weights are stored conveniently in digital memories with arbitrary
precision. A multiplying D/A converter, in this case, can serve as a synaptic unit.
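The division of labour in such a hybrid scheme can be sketched behaviourally. The model below is a minimal idealization and not the thesis's circuit: the signed-integer weight code, the 5-bit default word length, the per-LSB transconductance `g_lsb`, the resistance `r` and the use of `tanh` as the saturating characteristic are all hypothetical choices made for illustration.

```python
import math


def mdac_synapse(v_in: float, weight_code: int, bits: int = 5,
                 g_lsb: float = 1e-6) -> float:
    """Idealized multiplying D/A converter used as a synapse.

    The digital weight code scales the analog input voltage into an
    output current: i = code * g_lsb * v_in (g_lsb in A/V, assumed).
    """
    limit = 2 ** (bits - 1) - 1
    if not -limit <= weight_code <= limit:
        raise ValueError("weight code out of range for the given word length")
    return weight_code * g_lsb * v_in


def analog_neuron(i_total: float, r: float = 1e5, v_sat: float = 1.0) -> float:
    """Saturating current-to-voltage stage; tanh is only a stand-in shape."""
    return v_sat * math.tanh(r * i_total / v_sat)


inputs = [0.5, -0.2, 1.0]      # analog input voltages (V)
codes = [7, -3, 2]             # digitally stored weights

# Currents add on a shared node at no extra hardware cost (KCL).
i_sum = sum(mdac_synapse(v, c) for v, c in zip(inputs, codes))
v_out = analog_neuron(i_sum)

if __name__ == "__main__":
    print("summed synaptic current (A):", i_sum)
    print("neuron output voltage (V):", v_out)
```

The point of the sketch is that the only digital quantity is the stored weight code; multiplication, summation and the nonlinearity all happen on analog voltages and currents.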
Pulse stream encoding, first reported in the context of VLSI neural networks in 1987 [59],
is another hybrid approach that attempts to blend the advantages of analog and digital
technologies. Inspired by biological communications in nerves, the states of neurons in
this technique are represented by a sequence of pulses with variable rates. Practical
techniques include pulse width modulation (PWM), pulse code modulation (PCM) and
sigma-delta modulation [56], [29], [61]. Multiplication is performed in the analog domain
under digital control, while the final output is generated as a digital pulse stream which is
more robust against noise and easier to transmit [60]. Switched capacitors and analog
systolic arrays with digital processing capabilities are among other hybrid neural network
implementations [58], [80].
A hybrid architecture based on time-multiplexing of a synaptic unit was developed in the
VLSI Research Group at Windsor [63]. This architecture reduces the complexity of
synapses and interconnections from $O(N^2)$ to $O(N)$, where N is a measure of the number of
nodes (inputs and neurons) in a network. The penalty is a reduction in the speed of
network operation in each layer by the factor of multiplexing. The overall speed reduction
factor in a K-layer network is $\sum_{i=1}^{K} n_i$, where $n_i$ is the number of neurons in layer $i$.
Another problem in a multiplexed-synapse architecture, not often discussed in the open
literature, is an unwanted reduction in hardware redundancy and hence in the inherent
fault tolerance of the neural network chip. If the multiplexed synapse fails to operate, e.g.
due to a VLSI defect or a burst of noise, all of the neurons in the next layer receiving an
input from that synapse will be affected. In other words, a fault will be propagated to all
nodes in the next layer via the multiplexer.
To improve the speed performance of the hybrid multiplexed neural network, a pipelined
architecture based on neurons with embedded analog latches was suggested later [9q,
[64]. The pipelined architecture increases speed performance by a factor of two.
However, the improvement only compensates for a small fraction of the speed loss originally
incurred by the time-multiplexing scheme. For example, a multi-layer 36-20-15-10 network
implemented based on the multiplexed architecture [63] is (20 + 15 + 10) = 45 times
slower, while a pipelined multiplexed architecture is still 45/2 = 22.5 times slower, than
a parallel implementation. Another practical problem in implementing the pipelined
neural architecture is its dependency on 'analog' latches. Moreover, fault tolerance still
remains an issue because of the multiplexing scheme.
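The slowdown figures in this example follow directly from the layer sizes. The sketch below reproduces them; the function names are illustrative, and the layer list is assumed to start with the number of inputs followed by the number of neurons in each layer.

```python
def multiplexed_slowdown(layers):
    """Speed reduction factor of the multiplexed-synapse architecture.

    layers = [inputs, n_1, ..., n_K]; the factor is the sum of the
    neuron counts n_1 + ... + n_K (the input count does not contribute).
    """
    return sum(layers[1:])


def pipelined_slowdown(layers):
    """Pipelining recovers a factor of two, as stated in the text."""
    return multiplexed_slowdown(layers) / 2


if __name__ == "__main__":
    net = [36, 20, 15, 10]
    print("multiplexed:", multiplexed_slowdown(net))
    print("pipelined:  ", pipelined_slowdown(net))
```

For the 36-20-15-10 network this gives 45 and 22.5, matching the numbers quoted above.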
1.3 Objectives
The VLSI Research Group at the University of Windsor has been pursuing a focused
activity towards the design and synthesis of massively parallel processors for very high
speed computations. Among other computational paradigms, VLSI neural networks have
been a distinct class in this line of research. On the other hand, there have been industrial
collaborative research projects at the university demanding the design of smart
photosensors for intelligent manufacturing control and machine vision applications.
This thesis is motivated by both the academic and industrial sides. It addresses several
basic problems in the area of VLSI implementation of neural networks and presents
proof-of-concept designs, from architectural level to circuit implementation and experimental
verification, for the applications of interest. While the potential benefits of a hybrid
analog-digital implementation are acknowledged, we would like to avoid the
disadvantages of a multiplexed architecture as described in Section 1.2.5 in an attempt to
realize a fully-parallel neural network architecture.
The objectives of this research based on the above background are:
To develop a robust, highly-modular and fully-parallel hybrid VLSI architecture for the
implementation of multilayer neural network classifiers,
To present novel building blocks for the proposed architecture through circuit design at
the transistor level, fabrication and experimental characterization,
To implement in a standard VLSI technology:
(i) programmable neural network ICs for general pattern classification applications
(e.g. in numeral recognition),
(ii) a programmable neural-based smart photosensor for on-line classification of
optical input patterns in manufacturing process control.
This thesis is organized in seven chapters. The present chapter provides an overview of
artificial neural networks and their applications. It compares the notion of collective
neural computation with that in conventional digital computers and emphasizes neural
networks as a complementary tool for those areas weakly handled by conventional
computers. Various neural network implementation techniques are investigated, including
analog, digital, hybrid and optoelectronic methods, and several research / commercial
products are surveyed.
A modular hybrid VLSI architecture for the implementation of neural networks is
presented in Chapter 2. It is the first 'hybrid' architecture that implements multilayer
neural networks with a single universal block based on a distributed neuron and a digital
programmable synapse. Several properties of this architecture are highlighted and the
main problems tackled are described. Most of the properties briefed in Chapter 2 will be
discussed in detail through analyses, simulations and experimental work in Chapters 3
to 6.
In Chapter 3, a compact yet robust neuron circuit is presented that combines nonlinear I-V
characteristics of NMOS and PMOS transistors to synthesize a sigmoid-like saturating
function. The saturating function is studied analytically, followed by a sensitivity
analysis. Circuit analysis and simulation both indicate an interestingly low characteristic
variation despite considerable process parameter variations. Chip fabrications and
measurements, both on-chip and chip-to-chip, verify the robustness of the circuit
characteristics. Following the study of a lumped implementation, a distributed
implementation of the neuron circuit is presented. It is shown that an averaging property
in a distributed implementation further reduces the variations within one chip and creates
uniform neurons ideal for a large neural network implementation. Moreover, a
self-scaling property of the distributed neuron is demonstrated and verified experimentally.
Chapter 4 details the design and characterization of a hybrid analog-digital circuit
presented as a universal building block for the implementation of multilayer feedforward
neural networks. The universal building block is based on a programmable unified
synapse-neuron. It consists of a multiplying D/A converter (MDAC), an analog
sub-neuron and a compact digital weight register. Design improvements are presented,
especially for the MDAC circuit, along with simulations and fabrication test results. The
application of the proposed building block to the implementation of two neural network
ICs is described in this chapter.
A stochastic model for an Adaline with an analog distributed neuron and digital synaptic
weights is presented in Chapter 5. In this case, it is shown that a useful self-scaling
property automatically adjusts the dynamic range of the sigmoidal function and hence
controls a stochastic gain by which quantization noise is amplified. In a conventional
lumped-neuron Madaline network, when the number of neurons per layer or the number of
neuron inputs increases (i.e. the network becomes larger), the effect of weight quantization
becomes more noticeable at the output. However, modeling and simulation in this chapter
indicate a considerable improvement in the ratio of desired signal to quantization noise
when programmable neural network hardware is based on a distributed rather than a
lumped neuron.
In Chapter 6 the design of a CMOS-compatible neural-based smart photosensor for
focal-plane pattern classification is described. The design is based on the neural network
architecture and building blocks presented earlier in this thesis. Several conventional
neural network implementations are first examined for this problem. The area and
complexity of interconnections are found to be the major limiting factors on the size of the
neural classifier and the integrated photosensing array. A neural-based smart pixel is
presented as a building block for the sensor design. Photosensitive elements are based on a
Field-Effect-Modified vertical photoBJT created in a standard CMOS technology.
A programmable smart photosensor comprised of a 2-D array of 64 photosensors and a
fully-connected multilayer neural classifier is implemented. Compared to a conventional
implementation, the proposed design has greatly reduced the interconnection areas and
hence increased the synaptic density as well as the dimensions of the integrated
photosensor array.
Finally, Chapter 7 presents the conclusions of this thesis. It contains a summary of the
contributions discussed in previous chapters, ranging from architectural level to novel
circuit designs to the properties shown through theoretical analyses and experimental
implementations. Suggestions are also made for future research work.
Chapter 2 A New Hybrid VLSI Architecture
2.1 Introduction
Tremendous advances in microelectronics have made VLSI neural
networks popular within the past ten years, as neural networks can
provide real-time solutions to some of the real-world problems
traditionally difficult to handle by conventional digital computers
(cf. Chapter 1, Section 1.1). However, neural network VLSI
designers face many challenges, for instance, in implementing
massively interconnected networks, producing fully parallel
inputs/outputs and developing modular and scalable architectures
adaptable for different applications. Area and power efficiency,
speed, storage and accuracy are some other main issues.
Analog implementations of neural networks are compact, low-
power and high speed. Analog implementations however are
inaccurate and susceptible to various process variations. Analog
storage also remains a problem. On the other hand, digital
techniques offer higher precision and a multitude of flexible
architectures and design tools. However, a fully-parallel digital
neural network implementation comes at the cost of a complex
design with a large silicon area and power consumption. An
attempt to multiplex computational resources on chip reduces the
complexity, but significantly downgrades the speed performance and fault tolerance of the
neural chip (see Chapter 1, Section 1.2.3 and Section 1.2.5 for more details).
Compromise solutions, such as the one presented in this thesis, lead to hybrid
implementations: mixed analog-digital circuits, and possibly optoelectronics. A robust
hybrid architecture should be able to address simultaneously as many implementation
problems as possible.
2.2 Some Implementation Problems
Some of the main issues in VLSI implementation of a neural network IC, as far as this
thesis tries to address them, are as follows:
Application dependency: The topology and size of a neural network are highly
application dependent. The number of inputs, layers, and neurons per layer can vary
based on input/output requirements and even the nature of the learning patterns in each
application. Therefore, a highly modular architecture is required to facilitate and speed
up the process of designing a custom VLSI neural network chip.
Interconnection problems: In a multilayer neural network as shown in Figure 2.1(a),
a fully-connected implementation between m input nodes and n nodes (neurons) in the
next layer requires m x n synapses and 2 x m x n physical interconnections (one input
and one output interconnect per synapse). Therefore, with similar situations in other
layers, the number of synaptic interconnections grows with $O(N^2)$, where N is a
measure of the number of nodes (inputs, and hidden and output neurons) in the neural
network. With a limited number of metal layers (e.g. 2 to 4) available for
interconnections in each technology and a minimum metal pitch for each layer
according to particular design rules (see Figure 2.1(b)), one can notice that routing
complexity and interconnection areas roughly grow with the square of the number
of nodes in a neural network chip. Given a limited die area, this situation eventually
creates a bottleneck and puts a restriction on the size of a neural network
implementation in terms of the number of nodes and layers.
Figure 2.1 a) Interconnections in a fully-connected multilayer neural network, b) limitations on VLSI interconnections
Various implementation errors: Inevitable errors are introduced when an ideal
function or quantity is implemented by physical (nonideal) hardware. In analog
circuits, inherent inaccuracies and characteristic variations create discrepancies from
ideal values. In digital implementations, quantization effects are the main source of
error, where an ideal value is realized by a finite word length or limited precision. The
effect of implementation errors becomes more important in larger neural networks.
Properly-scaled sigmoidal function: Multilayer feedforward neural networks, or
Madalines, rely on neurons with a saturating function, the most popular of which is a
sigmoidal¹ function. This function should be properly scaled over its input dynamic
range, or else it effectively becomes either a hard-limiting or a low-gain linear function.
A hard-limiting neuron (threshold Adaline), for instance, is more sensitive to the effects
of weight quantization [68]. Improper scaling may often occur when a neuron
realization receives different numbers of inputs in different applications. Redesigning
the neuron is a possible but obviously inconvenient solution (cf. Chapter 5 for details).
1. The terms 'sigmoidal' and 'sigmoid-like' are used hereinafter in the sense of any monotonic "S"-shaped curve and do not necessarily refer to a specific mathematical function.
Pin limitations: Real-world applications require sizable neural networks that, among
other things, require a large number of input/output (I/O) pins. This situation creates an
'I/O bounded' layout, meaning that the required silicon die becomes much larger than the
actual core (neural network) area. In addition, an excessive number of I/Os may exceed
the number of package pins, or at least increases interface complexity and cost. Often
inputs outnumber the outputs due to a funnel-shaped topology. In such cases,
opto-electronic input coupling offers an attractive solution to remove the pin limitation problem.
Programmability: Digital programmability of the synaptic weights that define the
mapping function of a neural network is an asset that makes an implementation more
flexible and general purpose.
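The quadratic growth described under the interconnection item above can be made concrete with a short count. The sketch below applies the m x n synapse and 2 x m x n interconnection rule layer by layer; the function names are illustrative, and the 36-20-15-10 topology is borrowed from the example in Chapter 1.

```python
def synapse_count(layers):
    """Synapses in a fully connected feedforward network.

    layers = [inputs, n_1, ..., n_K]; each pair of adjacent layers of
    sizes m and n contributes m * n synapses.
    """
    return sum(m * n for m, n in zip(layers, layers[1:]))


def interconnect_count(layers):
    """One input and one output interconnect per synapse."""
    return 2 * synapse_count(layers)


if __name__ == "__main__":
    net = [36, 20, 15, 10]
    doubled = [2 * size for size in net]
    print("synapses:", synapse_count(net))
    print("interconnections:", interconnect_count(net))
    print("synapses with every layer doubled:", synapse_count(doubled))
```

Doubling every layer quadruples the synapse and interconnection counts, which is the quadratic dependence on the number of nodes noted in the text.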
2.3 A New Hybrid Distributed-Neuron Architecture
2.3.1 Creating a Unified Synapse-Neuron (USN)
A conventional neural network implementation consists of two types of building block:
synapse and neuron. Figure 2.2(a) shows a conventional structure of a lumped neuron
with N input synapses. A linear¹ synapse multiplies its input $x_i$ by a synaptic weight $w_i$.
A neuron performs a summation (Σ) on synaptic inputs, a bias or threshold adjustment
(denoted as $-\theta$ in Figure 2.2(a)), and a sigmoidal (S-shaped) saturating function $f(\cdot)$:
$$y = f\left(\sum_{i=1}^{N} w_i x_i - \theta\right)$$
For better modularity, the threshold function can be implemented as an additional synapse
as shown in Figure 2.2(b) with a constant input (e.g. $x_0 = 1$), while the threshold value
is programmed in the synaptic weight register, i.e. $w_0 = -\theta$. Other synaptic units in
Figure 2.2(b) are similarly considered to be digitally programmable.
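The equivalence between an explicit threshold and an extra synapse with a constant input can be checked numerically. In the sketch below, `tanh` is used only as a stand-in for the sigmoidal function, and the function names and sample values are illustrative assumptions.

```python
import math


def lumped_neuron(x, w, theta):
    """Weighted sum, explicit threshold subtraction, then a saturating function."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)) - theta)


def neuron_with_bias_synapse(x, w, theta):
    """Same neuron with the threshold folded into an extra synapse.

    The extra synapse has constant input x0 = 1 and weight w0 = -theta,
    so no separate threshold circuit is needed.
    """
    x_ext = [1.0] + list(x)
    w_ext = [-theta] + list(w)
    return math.tanh(sum(wi * xi for wi, xi in zip(w_ext, x_ext)))


if __name__ == "__main__":
    x = [0.4, -0.7, 0.1]
    w = [1.5, 0.3, -2.0]
    theta = 0.25
    print(lumped_neuron(x, w, theta))
    print(neuron_with_bias_synapse(x, w, theta))
```

Both forms give the same output up to floating-point rounding, which is why the threshold can be treated as just another programmable synaptic weight.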
1. Nonlinear types of synapses, e.g. quadratic and Gaussian, have also been reported in the literature [30].
Figure 2.2 Evolutionary steps to create a unified synapse-neuron in a hybrid architecture
The remaining parts of Figure 2.2 show the evolutionary steps to create a new hybrid
building block for the architecture presented in this thesis, as explained next.
If the output quantity of a synapse is a current, summation can be simply implemented in
the analog domain by connecting the outputs of synapses together (i.e. KCL) as shown in
Figure 2.2(c). The neuron's output and hence the synapses' input quantities are chosen to
be voltages. Therefore, in this hybrid analog-digital architecture synapses are
digitally-programmable V-to-I multipliers. A common choice for the neuron could be a transresistance
(I-to-V) amplifier as shown in Figure 2.2(c).
For the reasons briefly mentioned next and clarified later, especially in Chapter 3, the
neuron function in our architecture is realized by a nonlinear load that receives the
summation of synaptic currents and delivers a voltage on the same summation node as
shown in Figure 2.2(d). The current to voltage transfer function has a sigmoid-like shape.
The configuration in Figure 2.2(d) shows a lumped resistive-type (I-to-V) neuron that in
practice is implemented with active MOS transistors.
A resistive-type neuron implementation can be distributed into parallel elements, known
in this thesis as sub-neurons, such that their overall nonlinear characteristic remains the
same. This temporary step is depicted in Figure 2.2(e). Compared to the original lumped
neuron, each sub-neuron has a larger equivalent nonlinear resistance and a proportionally
smaller area, as it is implemented with active devices. The neural output voltage is built on
a supernode denoted as $V_{out}$ in Figure 2.2(e). Although the synaptic currents ($I_i$'s) can be
all different, the currents going through the sub-neurons are the same since they are all created
by the same voltage $V_{out}$ applied to a similar nonlinear impedance.¹ The output of the
distributed neuron in this case is,
1. We neglect characteristic variations between sub-neurons for now.
where f_s(.) is the nonlinear resistance function of a sub-neuron, as opposed to F(.), which is
the collective neural function. A more detailed analysis of a distributed neuron model is
presented in Chapter 5.
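As a sketch of the relation described above, assuming N identical sub-neurons and neglecting the threshold current, the supernode voltage can be written in terms of the sub-neuron function. The symbols I_avg and I_sum and the explicit form below are reconstructed from the surrounding description and are assumptions, not a quotation of the original equation:

$$V_{out} = f_s(I_{avg}), \qquad I_{avg} = \frac{1}{N}\sum_{i=1}^{N} I_i$$

$$F(I_{sum}) = f_s\!\left(\frac{I_{sum}}{N}\right), \qquad I_{sum} = \sum_{i=1}^{N} I_i$$

Under these assumptions the collective function F(.) is the sub-neuron function f_s(.) with its current axis stretched by a factor of N.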
Each analog sub-neuron in a distributed implementation can be integrated with a digitally
programmable synapse as shown in Figure 2.2(f) to create a hybrid block known in this
thesis as a unified synapse-neuron (USN). Note that for better modularity, the threshold
block (V_0, I_0) also includes a sub-neuron which is nonetheless deactivated. A fully-analog
distributed neuron-synapse was presented in [81] for the purpose of implementing
a reconfigurable neural network. This thesis presents a new alternative hybrid architecture
with several distinct features and contributions. Mixed-signal circuit blocks are presented
and characterized for this hybrid architecture with innovative designs in both synaptic and
distributed neuron realizations. New properties of a distributed neuron implementation
that were not addressed in [81] are explored in the next chapters. Moreover, an emerging
property unique to a 'hybrid' distributed implementation is found that mitigates the effect
of weight quantization. This property cannot arise in a fully analog domain. In
addition, in the context of a neural-based smart photosensor realization, the proposed
architecture provided the foundations of a robust programmable sensor design and,
specifically, reduced the synaptic interconnection areas and routing problems significantly.
The properties of the proposed architecture are highlighted in Section 2.3.3 and will be
studied through analyses, simulations and experimental implementations presented in the
forthcoming chapters.
2.3.2 A Modular Neural Network Implementation
A unified synapse-neuron block presents a highly modular approach to a hybrid
implementation of multilayer neural networks. We assume a digital weight register is built
into the synaptic multiplier sub-block. In this way, the USN is upgraded to a universal
programmable building block. The universal block, a circuit realization of which is
presented in Chapter 4, is the only block required to implement a complete multilayer
VLSI neural network. The parallel output connection of N such building blocks, for
instance, makes a neuron with N digitally-programmable input synapses, i.e. an N-input
Adaline.
Figure 2.3(a) shows three such Adalines (in three columns) with four common inputs that
altogether form a single-layer 4-3 neural network built with 12 building blocks. A
multilayer fully-connected m-n-p feedforward network¹ (where m is the number of input
nodes, n is the number of neurons in the hidden layer and p is the number of neurons in the
output layer) can be implemented by interconnecting regular (m × n) and (n × p) arrays of the
universal hybrid building block. Figure 2.3(b) shows a 4-3-2 neural network built with 18
blocks. Additional blocks of a very similar nature can be used when neuron threshold
adjustment is required. Threshold blocks receive a constant non-zero input voltage, store
their threshold value in the built-in weight register, and have their nonlinear sub-neuron
deactivated. Details of a circuit implementation of this network can be found in
Chapter 4, Section 4.3.1.
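The block counts quoted above follow directly from the array sizes. The short sketch below reproduces them; the function name and the option of one threshold block per neuron are illustrative assumptions, not part of the thesis.

```python
def usn_block_count(layer_sizes, with_threshold=False):
    """Count universal building blocks in a fully-connected feedforward network.

    layer_sizes: sequence such as (4, 3) or (4, 3, 2); the first entry is the
        number of input nodes, the remaining entries are neurons per layer.
    with_threshold: if True, add one threshold block per neuron
        (an assumption made here for illustration).
    """
    if len(layer_sizes) < 2:
        raise ValueError("need at least an input layer and one neuron layer")
    # One block per synapse: an (m x n) array between consecutive layers.
    synapse_blocks = sum(
        fan_in * fan_out
        for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:])
    )
    threshold_blocks = sum(layer_sizes[1:]) if with_threshold else 0
    return synapse_blocks + threshold_blocks


print(usn_block_count((4, 3)))     # single-layer 4-3 network
print(usn_block_count((4, 3, 2)))  # multilayer 4-3-2 network
```

The two calls correspond to the networks of Figure 2.3(a) and Figure 2.3(b).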
2.3.3 Properties of the Architecture
The salient features of the proposed hybrid distributed-neuron architecture are as follows:
Modularity: From the previous descriptions and examples shown in Figure 2.3, it can
be perceived that the proposed architecture is highly modular. Modularity simplifies
and speeds up the process of designing a custom integrated circuit. Multilayer neural
networks can be conveniently built using regular arrays of a universal building block.
Examples of neural network chips built based on this array architecture can be found in
Chapter 4 and Chapter 6.
1. Sometimes denoted as an m × n × p network.
Figure 2.3 Modular neural network implementations: a) a 4-3 neural network, b) a multilayer 4-3-2 neural network
Silicon area efficiency: An area-efficient silicon realization is achieved for two
reasons:
(i) Reduced interconnection areas and problems: Instead of global interconnections
from synapses to neurons, there is a local connection inside each USN block where
a sub-neuron is densely packed with a synapse. The remaining global connections
are made on regular buses laid on vertical or horizontal channels between the
blocks. VLSI routing problems are also reduced for the same reason.
(ii) Inter-block area saving: In any VLSI implementation where we deal with various
types of custom blocks with different cell sizes, there are naturally some unused
areas confined among unfit adjacent blocks. On the other hand, in an architecture
that deals with just one type of building block, there is little potential for creating
unused inter-block areas.
A quantitative case study on interconnection area reduction is given in Chapter 6.
Self-scaling sigmoidal characteristics: A distributed-neuron architecture exhibits an
interesting self-scaling property. As the number of inputs to a neuron increases, e.g. in
a more demanding application, each new input synapse brings a corresponding
nonlinear sub-neuron and incrementally adjusts the overall sigmoidal neuron
characteristic. This analog phenomenon prevents an improper neuron scaling (e.g. a
hard-limiting or a low-gain linear characteristic) that could disturb effective
functionality. It thus circumvents the need for a redesign. Circuit simulation and
experimental verification of this property are given in Chapter 3 and a theoretical
model can be found in Chapter 5.
Quantization noise improvement: Quantization noise improvement is a direct
consequence of the self-scaling property; however, it only emerges in the context of a
hybrid architecture that relies on quantized weight values. Analyses and simulations
supporting this theory on the basis of a new stochastic model for a distributed-neuron
Adaline are presented in Chapter 5.
Averaging property in neurons: Sub-neuron elements on a physically distributed
array take an average of various process parameters, e.g. threshold voltage gradient
over a silicon die. This property significantly reduces analog process variations among
sigmoidal neurons and makes their characteristics virtually uniform. A discussion of
the subject along with fabrication measurement results is presented in Chapter 3.
Increased fault tolerance: Neural networks possess an inherent fault tolerance mainly
due to the redundancy in their synaptic connections. In the proposed architecture, there
is an additional element of redundancy in 'neurons'. As each neuron implementation is
distributed among N sub-blocks, a fault, e.g. a VLSI defect, would affect only 1/N th of
a neuron while the other parts remain intact. A discussion is provided in Chapter 3.
Automatic fan-out increase: A neuron circuit has to be able to drive all of its outgoing
synapses. Total load may include a considerable number of interconnection lines and
input impedances in the next layer. High-drive buffers have been presented for large
networks [97]. In the proposed architecture, a USN is a "more-fan-in more-fan-out"
entity: a configuration with a higher number of inputs results in a higher number of
output transistors in parallel, which immediately translates into a lower output impedance
and a higher drive. Therefore, in a sizable network a large neuron supernode is
potentially capable of driving many blocks in the next layer, if required. Note that this
property is related to output impedance and driving capability, and should not be
mistaken for the self-scaling property of the neuron's saturating function.
Programmability: Digital programmability of the universal hybrid block in a Read/
Write weight register allows the mapping function of the neural network to be redefined
for different applications. This flexibility is used in the design of a general-purpose
neural network classifier presented in Chapter 4 and a programmable smart photosensor
explained in Chapter 6. A training/recall simulator is especially developed for the USN
architecture. Following an off-line training session and a simulated recall, digital
weights are programmed in a chip using a host computer or a software-controlled
digital tester (cf. Chapter 6).
Optoelectronic integration: Although the architecture described in this chapter does
not include an optoelectronic component, the regular nature of its silicon
implementation lends itself to a possible photosensor array integration. Such a sensor
fusion technique on input nodes removes pin limitation problems and allows a fully
parallel input operation. Details of an optoelectronic architecture especially developed
for a neural-based smart photosensor are left for discussion in Chapter 6.
2.4 Conclusion
In this chapter, a new hybrid distributed-neuron architecture was presented for parallel
fully-connected implementation of multilayer neural networks. Salient features of this
architecture are as follows:
Modularity based on a universal building block
Fully-parallel single-chip implementation
Silicon area efficiency and reduced interconnection problems
Self-scaling sigmoidal characteristics of neurons
Weight quantization noise improvement
Averaging property resulting in nearly-uniform neuron functions
Increased fault tolerance
Automatic fan-out increase for USN blocks
Digital programmability
A brief intuitive discussion of each property was included in this chapter while references
were made to the forthcoming chapters for detailed analyses, simulations or experimental
results. A special optoelectronic neural network architecture based on the proposed USN
approach and the notion of a neural-based smart pixel will be presented in Chapter 6 along
with the required background.
Chapter 3 Distributed Neuron and its Properties
3.1 Introduction
Analog VLSI circuits provide compact, high-speed and power-efficient
realizations of artificial neural networks. Analog
implementations are, however, inaccurate and prone to process
variations and mismatch. CMOS circuits, for instance, are
especially sensitive to variations in threshold voltage (V_t) of
transistors. Therefore, considering the complexity of real world
neural networks, an analog circuit which is both simple and
accurate would be an attractive choice for VLSI neural networks.
In this chapter, a nonlinear resistive-type neuron is presented that
implements a saturating function by combining nonlinear
characteristics of MOS transistors. Characteristic variations are
found to be inherently small in analysis, simulations and
measurements. Variations are measured both across a chip and
among several fabricated chips. On-chip variations are reduced
further by using a distributed-neuron implementation which utilizes
a parallel configuration of identical compact sub-neurons. Another
property, namely the self-scaling of the distributed-neuron circuit, is
demonstrated through simulations and experimental measurements.
3.2 Nonlinear Resistive-type Neuron
Amplifier-type neuron circuits use MOS transistors in the exponential (sub-threshold) or
square-law (above-threshold saturation) region of operation. In simple non-feedback
configurations, these two-port circuits generally magnify the effect of device parameter
variations, e.g. V_t, along with the desired (neural) signal. Generally, the idea behind an
amplifier is to 'linearize' the characteristics of the active device(s) using various circuit
techniques. In an amplifier-type neuron, one makes a further attempt to make the
linearized characteristics 'nonlinear' again!
On the other hand, a nonlinear resistive-type neuron such as the one presented in this
chapter relies on the basic nonlinearity in the V-I characteristics of MOS transistors to
approximate a sigmoid-like saturating function. It thus avoids unwanted enhancement of
parameter variations through an exponential or square law. Moreover, the resulting circuit
has a lower complexity compared to an amplifier counterpart. Finally, a resistive-type
(I-to-V) neuron receives the total sum of synaptic currents and converts it to a voltage on
the same node. It thus provides a one-port implementation which is compact by nature.
3.2.1 Circuit Description
A resistive-type neuron based on a nonlinear load is reported in [81] and shown in Figure
3.1(a). The saturating I-to-V function of this neuron circuit relies on the characteristics of
two transistors M1-M2 and a linear transition region corresponding to a resistor R.
The resistor may be implemented by additional MOS transistor(s) or resistive layers, or
may rely on parasitic/leakage impedances in the system [48].
A modified circuit is presented here based on four transistors (and no resistor) that
approximates an S-shaped neural function by combining quadratic characteristics of the
MOS transistors. A circuit diagram is shown in Figure 3.1(b). In fact, the two additional
devices M3-M4 are replacing R with a lightly S-shaped characteristic in the region where
M1 and M2 are both OFF. However, M3 and M4 do not implement a simple resistor as
they are not operating in their triode region. Note that a real sigmoid function does not
contain a region with constant derivative (slope) even though it might be approximated
that way. Therefore, the presented circuit provides a more realistic approximation to a
sigmoidal function, one which is based on four nonlinear function segments, rather than
two nonlinear and one linear segment.
Figure 3.2 shows the simulated characteristics of the two circuits in Figure 3.1(a) and (b)
based on the device sizes tabulated in Table 3.1. Both circuits are designed to reach 0 or
5 V at extreme synaptic currents of ±100 µA. With a large value chosen for R, the original
circuit of Figure 3.1(a) shows a stepwise transition, and in any case it has a
constant-derivative (linear) region ended by abrupt changes in the derivative at the two points where
the line intersects the nonlinear regions. On the other hand, the modified circuit of Figure
3.1(b) creates a smoother transition in the function and its derivative, a behavior closer to
that of a sigmoid function. A differentiable sigmoid-like neuron function is especially
desirable for in-loop training using popular gradient-based algorithms.
The modified neuron circuit of Figure 3.1(b) has a compact cell layout (36.4 µm × 19.4 µm)
in a 1.2 µm CMOS process (cf. Figure A.1). There is no noticeable layout overhead for this
design compared to the original circuit because (i) each transistor now has a smaller
width (see Table 3.1) so that the combination of two NMOS (or PMOS) transistors
handles the same amount of current; (ii) resistor R has been removed. The shapes of the
nonlinear function segments can be varied by adjusting the aspect ratio (W/L) of the
transistors and the bias voltages V_B1 and V_B2. An analytic study is presented next.
3.2.2 Analysis of I-V Characteristics
The saturating function of the neuron circuit in Figure 3.1(b) is an outcome of the nonlinear
I-V characteristics of four MOS transistors in their saturation region. In general, four dc
voltages could be used to bias the gates of the four MOS transistors, resulting in five
regions of operation between 0 and V_DD. With a careful design, however, we are able to
simplify the biasing scheme to use only two common bias voltages, namely V_B1 and V_B2.
Figure 3.1 Circuit diagram of nonlinear I-to-V neurons: a) original circuit [81], b) the modified circuit
Table 3.1. Device sizes for the two neuron circuits shown in Figure 3.1 and simulated in Figure 3.2
M1: 10.0/1.6    M1: 4.W/1.6
M3: 2.0/2.0
Figure 3.2 I-V characteristics of the circuits in Figure 3.1(a) and (b)
Moreover, we can eliminate an intermediate region in which all transistors would be off.
The regions of operation and their boundaries are marked on a simulated I-V curve in
Figure 3.3(a). The distribution of input current among the four transistors is shown in
Figure 3.3(b). Simulations are performed using level 3 Hspice models for the target
1.2 µm CMOS process.
Figure 3.3 a) Regions of operation on S-shaped V-I characteristic, b) current distribution in four MOS transistors
To choose proper bias voltages, the following design criterion is applied in order to merge
two of the intermediate boundaries in Figure 3.3(a), eliminating a region in which all four
transistors would be OFF:
On the other hand, to have a center line around V_DD/2, we choose:
Table 3.2 specifies the four regions of operation for the neuron circuit. For positive input
currents, both NMOS transistors are OFF and one or both of the PMOS transistors
conduct. The amount of voltage built up on the V_out node determines if one or both of the
PMOS transistors should conduct. On the other hand, for negative input currents both
PMOS transistors are OFF and one or both of the NMOS transistors conduct.
The slope of the I-V curve at I_in = 0 would be limited in practice by the equivalent output
impedance of synaptic current sources connected in parallel to the neuron input. The
slope of the curve at the two saturation ends does not reach zero; an important
consideration for many learning algorithms.
Table 3.2. Regions of operation for the neuron circuit shown in Figure 3.1(b)
It can be further shown for this circuit that when a transistor conducts, it operates in the
saturation (non-triode) region. The saturation condition for an NMOS is defined as:
For instance, in Region I (0 < V_out < V_B1 - V_tn), for the conducting transistor M4 we have
(V_DS)_4 = V_DD - V_out and (V_GS)_4 = V_B2 - V_out. Therefore, the saturation condition
defined in Eqn. (3.4) always holds:
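The check for M4 can be sketched as follows, assuming the usual long-channel saturation condition for an NMOS device and a bias voltage that does not exceed the supply by more than one threshold. Both are assumptions made here for illustration rather than a quotation of the original equations:

$$V_{DS} \ge V_{GS} - V_{tn}$$

$$V_{DD} - V_{out} \ge V_{B2} - V_{out} - V_{tn} \iff V_{DD} \ge V_{B2} - V_{tn}$$

The output voltage cancels from both sides, so the condition is independent of V_out and holds whenever V_B2 ≤ V_DD + V_tn.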
Assuming an ideal square law in the saturation region, the output voltage can be obtained as a
function of input current in each region by writing the KCL relation on the output node and
solving the corresponding quadratic equation. For example, in Region I:
where µ_n is electron mobility and C_ox is the MOS oxide capacitance per unit area (ε/d).
Eqn. (3.6), after some manipulations, results in the following quadratic equation:
in which K_n = µ_n C_ox. One of the two roots of the above quadratic equation is the
physical solution for V_out expressed as a function of I_in and circuit parameters:
Region I:
$$V_{out} = \frac{V_{B1}+V_{B2}}{2} - V_{tn} - \sqrt{\frac{-I_{in}}{K_n\,(W_2/L_2)} - \left(\frac{V_{B2}-V_{B1}}{2}\right)^{2}} \qquad (3.8)$$
in which we have assumed W_2/L_2 = W_4/L_4.
Similarly, the results for the other three regions are found and summarized below:
Region II:
Region III:
Region IV: assuming W_1/L_1 = W_3/L_3.
The assumptions made for regions I and IV only lead to simpler formulations for
V_out = f(I_in) and are not circuit design criteria. In summary, the neural saturating
function can be expressed as a four-piece nonlinear function exhibiting in all regions an
inverse quadratic relation that can be defined in a general form as follows:
where V_os, C and I_os are positive coefficients that are different for different regions.
V_os and I_os are offset voltage and offset current coefficients respectively, while C is a
coefficient in ohms (Ω) that determines the shape of each nonlinear function segment.
V_os, in particular, is a function of bias and threshold voltages in all regions.
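A numerical cross-check of the Region I solution may help here. The sketch below assumes the ideal square law with K_n = µ_n C_ox, equal aspect ratios for M2 and M4, and gate connections of V_B1 for M2 and V_B2 for M4 (inferred from the region boundaries quoted above). All parameter values are hypothetical and are not the fabricated device values.

```python
import math


def nmos_sat_current(v_gate, v_source, kn, w_over_l, vtn):
    """Ideal square-law current of a saturated NMOS; zero when cut off."""
    v_ov = v_gate - v_source - vtn  # overdrive voltage V_GS - V_tn
    return 0.5 * kn * w_over_l * v_ov ** 2 if v_ov > 0 else 0.0


def region1_vout(i_in, kn, w_over_l, vb1, vb2, vtn):
    """Closed-form V_out for Region I (I_in < 0, M2 and M4 both conducting)."""
    mid = 0.5 * (vb1 + vb2)
    half_gap = 0.5 * (vb2 - vb1)
    radicand = -i_in / (kn * w_over_l) - half_gap ** 2
    # Region I requires V_out < V_B1 - V_tn, i.e. sqrt(radicand) > half_gap.
    if radicand <= half_gap ** 2:
        raise ValueError("input current magnitude is too small for Region I")
    return mid - vtn - math.sqrt(radicand)


# Hypothetical illustration values (not the fabricated device parameters).
KN = 60e-6         # K_n = mu_n * C_ox, in A/V^2
W_OVER_L = 0.5     # common aspect ratio assumed for M2 and M4
VB1, VB2 = 1.75, 3.25
VTN = 0.75
I_IN = -80e-6      # negative input current: only the NMOS devices conduct

v_out = region1_vout(I_IN, KN, W_OVER_L, VB1, VB2, VTN)
i_m2 = nmos_sat_current(VB1, v_out, KN, W_OVER_L, VTN)
i_m4 = nmos_sat_current(VB2, v_out, KN, W_OVER_L, VTN)
kcl_residual = I_IN + i_m2 + i_m4  # should vanish at the output node
print(f"V_out = {v_out:.4f} V, KCL residual = {kcl_residual:.2e} A")
```

If the closed form is a correct root of the quadratic, the two device currents evaluated at the computed V_out balance the input current and the residual is zero to within rounding.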
3.2.3 A Sensitivity Study
Parameters of interest in a sensitivity study of MOS transistors are threshold voltages
(V_tn, V_tp) and mobility factors (µ_n, µ_p; or equivalently gain factors K_n, K_p). The
presented circuit is less sensitive to mobility or gain factors than to threshold voltages, as
in all four regions K_n or K_p appear under the root sign. Assuming |V_tp| = V_tn = V_t and
neglecting body effect, we calculate the sensitivity of the output voltage with respect to
threshold voltage. With a relative change |ΔV_t|/V_t in threshold voltage due to process
variations, the relative change in output voltage w.r.t. full-scale output (V_out)_FS is:
From Eqn. (3.8) and Eqn. (3.9) we have ∂V_out/∂V_tn = -1. Also from Eqn. (3.10) and Eqn.
(3.11) we have ∂V_out/∂V_tp = -1 (note: V_tp < 0). Therefore, for all regions |∂V_out/∂V_t| = 1.
Assuming V_t = 0.75 V, a full-scale range of (V_out)_FS = 5 V and a worst-case process
variation of |ΔV_t|/V_t × 100 = 20%, from Eqn. (3.13) we can find,
Despite the simplicity of the circuit and a considerable variation in circuit parameter V_t,
the output voltage variation is quite low based on the above analysis. Circuit simulations,
with a threshold voltage variation applied through Hspice model parameter DELVTO [51],
confirmed the above analysis. A typical simulation plot is shown in Figure 3.4 from
which a low variation on the output characteristics can be observed.
For ±10% variations applied to the threshold voltage parameter, maximum variations
observed in the output voltage w.r.t. full scale were about ±1.55%. Maximum variations
occurred at V_out = V_DD and V_out = 0.
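The arithmetic behind these figures can be sketched in a few lines. The code assumes a first-order relation in which the relative output change equals |∂V_out/∂V_t| · (V_t / full scale) · (relative V_t change); this form is inferred from the numbers quoted in this section, and the function name is illustrative.

```python
def output_variation_pct(vt_variation_pct, vt=0.75, v_full_scale=5.0,
                         dvout_dvt=1.0):
    """Output change in percent of full scale for a V_t change in percent.

    Assumed first-order model: |dVout/dVt| * (Vt / V_FS) * (dVt / Vt).
    """
    return abs(dvout_dvt) * (vt / v_full_scale) * vt_variation_pct


# Worst-case process variation assumed in the analysis (20 %).
print(f"{output_variation_pct(20.0):.2f} % of full scale")
# Variation applied in the Hspice runs (10 %); the simulated value
# quoted in the text is about 1.55 %.
print(f"{output_variation_pct(10.0):.2f} % of full scale")
```

With V_t = 0.75 V and a 5 V range, the threshold variation is attenuated by a factor of 0.15 on its way to the output, which is why a 20% change in V_t appears as only a few percent of full scale.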
Figure 3.4 Simulations with ±10% variations on threshold voltage
3.2.4 Fabrications and Measurements
The neuron circuit of Figure 3.1(b) was implemented in a standard 1.2 µm CMOS
technology. The cell had a compact layout of 36.4 µm × 19.4 µm. Bias voltages V_B1 and
V_B2 were generated on-chip using an NMOS voltage divider. Ten chips were fabricated
and tested.¹ Figure 3.5 shows a microphotograph of the fabricated test chip that includes
lumped neuron circuits, unified synapse-neurons and several other neural network test
structures. Transfer characteristics were measured using a Mixed-signal Test Head
(TH-1000 from CMC) controlled by a test program developed in HP VEE² and run on an
HP 700i series workstation.
1. Fabrication has been done through the Canadian Microelectronics Corporation (CMC) under the design name MHNT.
2. HP VEE: Visual Engineering Environment from Hewlett-Packard.
Figure 3.5 Microphotograph of a fabricated CMOS chip that includes lumped neurons, unified synapse-neurons and other test circuits
Figure 3.6 Measured neuron characteristics from 10 fabricated chips: a) overlaid results, b) a close-up view of maximum chip-to-chip variations
The overlaid results of measurements from 10 chips are shown in Figure 3.6(a).
Experimental characteristics were in close agreement with the simulations shown earlier
(e.g. in Figure 3.3, or Figure 3.4). Moreover, the maximum measured chip-to-chip
variation was 110 mV, or 2.2% in 5-V range as shown in a close-up view in Figure 3.6(b).
The measured value translates back to a maximum threshold voltage variation of about
15% based on Eqn. (3.13), which is smaller than the worst case assumed earlier in the
simulations. The accuracy of the measurements was ±10 mV.
The dispersion of the curves in Figure 3.10(a) more resembles an offset rather than a gain
error. This reflects the dominance of threshold-type variations as described by V_os in Eqn.
(3.12) compared to other parameter variations. Maximum chip-to-chip variation of 2.2%
occurred around V_out = V_DD, i.e. in region IV. On the other hand, maximum measured
variations around V_out = 0 (in region I) were only 60 mV, or 1.3% in 5-V range. The
explanation here is that in region IV where the highest variation was observed, PMOS
transistors M1-M3 conduct while in region I two NMOS transistors M2-M4 conduct.
Process variations on PMOS transistors are seen to be larger. The reason is that PMOS
transistors are created in an N-well that involves extra masks and processing steps
compared to NMOS transistors built directly on the P-substrate.
Besides the experimental study of chip-to-chip variations, characteristics were measured
'within' each fabricated chip to determine the amount of 'on-chip' variations. This study
has a greater importance as it indicates the amount of discrepancy among neurons
operating in one network. Neuron cells were laid out at various locations, including
corner positions, on test chips.¹ Measurements were performed on different cells within a
chip and repeated for different fabricated chips. The worst-case characteristic variation
within one chip was 67 mV in 5-V range, or 1.3%, measured between two corner cells.
This number is smaller than the maximum 'chip-to-chip' variation, as reasonably expected.
Maximum on-chip variations of 1.3% occurred around V_out = V_DD where PMOS
1. Some cells were located on the corners of a second generation fabricated chip, WRNBS. The core design in WRNBS is an optical neural network described in Chapter 4.
transistors conduct. On the other hand, around V_out = 0 where NMOS transistors
conduct, the maximum measured variation was only 35 mV, or 0.7% in 5-V range. Table
3.3 summarizes the experimental results presented in this section. The results, in general,
suggest a low variation for the presented neuron circuit.
Table 3.3. A summary of experimental results on (lumped) neuron circuit
Maximum variations            Chip-to-chip      On-chip
V_out ≈ V_DD (PMOS region)    110 mV (2.2%)     67 mV (1.3%)
V_out ≈ 0 (NMOS region)       60 mV (1.3%)      35 mV (0.7%)
In summary, circuit analysis, simulations and fabrication measurements presented in this
section all suggested low characteristic variations for the proposed neuron circuit despite
its simple topology and compact layout.
A majority of neuron implementations presented in the literature do not report on
measured characteristic variations and when they do, often create concern about the
accuracy of some analog implementations, especially amplifier-type neurons [66]. An
operational transconductance amplifier (OTA) is presented in [71] for implementing
sigmoid-like neurons. The output quantity (current) is proportional to V_t² and K (or
equivalently µ), which is typical of a MOS amplifier. Therefore, due to larger exponents
the sensitivity to both major mismatch modeling factors (V_t and K) is theoretically larger
than that of the resistive-type neuron presented in this chapter. Measured variations were
not reported in [71].
3.3 Implementation and Properties of a Distributed Neuron
A useful property of a resistive-type neuron is that it can be implemented as a parallel
combination of similar elements known in this work as sub-neurons. Each sub-neuron has
a larger nonlinear resistance such that the parallel combination of N sub-neurons creates
the characteristics required for an N-input neuron. When implemented with active
devices, each sub-neuron has a smaller transistor width to implement a larger resistance.
In general, a MOS transistor with a width of N·W in a lumped neuron implementation is
replaced by transistors of width W in each of N constituent sub-neurons.
Figure 3.7 shows the transistor-level diagram of a distributed neuron implemented based
on the circuit presented in Section 3.2. In this diagram I_1, I_2, ..., I_N are analog currents
received from input synapses. An analog summation of synaptic currents is automatically
performed on a supernode. Total synaptic current (I_sum = I_1 + I_2 + ... + I_N) then divides
equally among sub-neurons as they are similar resistive blocks connected to the same
voltage.¹ As a result, each sub-neuron receives an average of input synaptic currents, I_avg,
and performs a saturating I-to-V function f_s(.) by combining nonlinear characteristics of
four MOS transistors as described in Section 3.2.
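The current division described above can be illustrated with a small behavioral model. This is a sketch only: the tanh curve is a stand-in for the sub-neuron characteristic (not the circuit equations of Section 3.2), the current scale I_SCALE is hypothetical, and identical sub-neurons with zero threshold current are assumed.

```python
import math

V_DD = 5.0        # supply voltage in volts
I_SCALE = 40e-6   # hypothetical current scale of one sub-neuron, in amperes


def sub_neuron(i_avg):
    """Stand-in saturating I-to-V curve of a single sub-neuron."""
    return 0.5 * V_DD * (1.0 + math.tanh(i_avg / I_SCALE))


def distributed_neuron(synaptic_currents):
    """Supernode voltage of a neuron built from one sub-neuron per synapse."""
    n = len(synaptic_currents)
    if n == 0:
        raise ValueError("a neuron needs at least one input synapse")
    i_sum = sum(synaptic_currents)  # KCL summation on the supernode
    i_avg = i_sum / n               # equal split among N identical sub-neurons
    return sub_neuron(i_avg)


# Same per-synapse current, different fan-in: the output does not change.
print(distributed_neuron([20e-6] * 3))
print(distributed_neuron([20e-6] * 30))
```

Because every added synapse also adds a sub-neuron, the operating point depends on the average rather than the total input current, which gives an intuition for the self-scaling behavior discussed in Chapter 2.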
In practice, each sub-neuron is densely integrated with a corresponding synapse creating a
unified synapse-neuron. However, sub-neurons can be physically far apart on the silicon die.
They are only connected via an analog bus on which current summation is performed and
V_out is created.
1. Here, we neglect neuron threshold (bias) current, i.e. assume I_0 = 0. We also assume in this circuit derivation that sub-neurons have similar characteristics, i.e. neglect characteristic variations.
Figure 3.7 Implementation of a distributed neuron
3.3.1 An Averaging Effect
Analog circuits, for instance lumped sigmoidal neurons, implemented at different
locations across a sizable chip are subject to noticeable variations in their expected
characteristics. This is due to process variations, especially the gradient of doping on the die
surface that in turn causes a gradient on MOS transistor parameters such as threshold
voltage. In Section 3.2 it has been shown that the presented resistive-type neuron circuit
inherently has a low sensitivity to process-dependent variations. In this section it is shown
that a distributed neuron implementation further reduces the existing variations on a chip,
thus creating very similar neuron characteristics.
Assuming a two-dimensional gradient, the difference between the threshold voltages of
two transistors located at relative distance (Δx, Δy) on a die is:
In Figure 3.8(a) lumped neurons are located at a maximum horizontal distance of D on a
die subject to a V_t gradient. In most practical cases we can assume a constant gradient in
each direction, i.e. ∂V_t/∂x = k_x and ∂V_t/∂y = k_y. Hence, the worst-case threshold
discrepancy among the lumped cells in Figure 3.8(a) is:
Figure 3.8 Neuron cells in a gradient of doping: a) lumped realization, b) distributed realization
Now, we consider a distributed implementation such as the one shown in Figure 3.8(b).
In this example, each cell is implemented with five sub-neurons indicating a 5-input
neuron. Implementation is confined to the same area as that of the lumped neurons. In this
case, the maximum distance between the centroids of distributed neurons is effectively
reduced to d, where d ≪ D. Therefore, the worst-case threshold discrepancy among the
three distributed cells shown in Figure 3.8(b) is:
The threshold voltage variation, in an ideal case, is reduced by a factor of (d/D) ≪ 1
which depends on layout. The improvement is more significant for a network with a large
number of neurons and neuron inputs. Moreover, this property can be generalized and
utilized more effectively on a two-dimensional array such as the one in a neural-based
photosensor described in Chapter 6.
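The centroid argument can be illustrated numerically. The sketch below assumes a constant horizontal gradient k_x, takes the effective threshold of a distributed neuron as the mean over its sub-neurons (a first-order assumption), and uses a hypothetical interleaved layout of three 5-input neurons; the die width and gradient values are illustrative only.

```python
def threshold_at(x_um, vt0=0.75, kx=1e-5):
    """Threshold voltage at position x (in µm) under a constant gradient.

    kx is a hypothetical gradient in V/µm.
    """
    return vt0 + kx * x_um


def neuron_threshold(sub_neuron_positions_um):
    """Effective threshold of a neuron: mean over its sub-neuron positions."""
    values = [threshold_at(x) for x in sub_neuron_positions_um]
    return sum(values) / len(values)


def worst_case_spread(neurons):
    """Largest threshold difference between any two neurons."""
    thresholds = [neuron_threshold(positions) for positions in neurons]
    return max(thresholds) - min(thresholds)


D_UM = 2800.0  # hypothetical maximum horizontal distance D, in µm

# Lumped realization: three cells at the left edge, center and right edge.
lumped = [[0.0], [D_UM / 2], [D_UM]]

# Distributed realization: 15 sub-neurons on an even pitch, interleaved so
# that neuron j owns slots j, j + 3, j + 6, j + 9 and j + 12.
pitch = D_UM / 14
distributed = [[(3 * i + j) * pitch for i in range(5)] for j in range(3)]

lumped_spread = worst_case_spread(lumped)            # equals k_x * D
distributed_spread = worst_case_spread(distributed)  # equals k_x * d
print(lumped_spread, distributed_spread, distributed_spread / lumped_spread)
```

In this layout the centroids of the first and third neurons are two pitches apart, so d = D/7 and the worst-case threshold discrepancy shrinks by the same factor.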
In a similar manner, it can be argued that in distributed neurons, variations on other
process-dependent parameters such as µ_n or C_ox would also be averaged out, thus
resulting in uniform characteristics for all neurons in a layer of a VLSI neural network.
Extra interconnects in Figure 3.8(b) do not create an overhead as such interconnections in
fact do exist in a lumped implementation at the output of synapse cells to form the
summation of synaptic currents. In a distributed-neuron implementation each synaptic
cell incorporates a densely packed nonlinear sub-neuron, and the outputs of unified
synapse-neuron cells are interconnected in a very similar manner.
The averaging property of distributed neurons was experimentally verified through
fabrication and testing. To demonstrate a worst-case scenario, five-input neurons with the
same circuits as explained in Section 3.2 were first laid out as lumped cells at various
distances on a test chip. Measurements were performed on different cells and repeated
over five fabricated chips. The worst case on-chip variation of the characteristics was
found between two cells at the greatest distance. The variation was 65 mV in a 5-V
University of Windsor
range,' i.e. an analog accuracy of 1.3% approximately equivalent to 6 bits resolution. The
distance between the two cells was 2500p. fn Figure 3.9(a) a typical measured
characteristic is shown on the left and a close-up of the worst case curves around SV is
shown on the right.
The advantage of a truly distributed neuron can be observed when the building elements
are distributed in one or two dimensions across the chip. In this manner, an average of
various characteristics is obtained which corresponds to average process parameters.
The characteristic variation between two averaged neurons built in this manner is reduced
to the small variability of neighboring sub-neurons. In the case of our test chips, 5-input
'distributed' neurons were implemented with five sub-neurons laid out on a linear array.
The maximum measured characteristic variation was only 25 mV, or 0.5% in 5 V (roughly
equivalent to 7 bits of resolution), as opposed to 1.3% (roughly 6 bits) for the case of
'lumped' cells. The remaining discrepancy is mainly related to non-gradient type
variations as well as some tolerances on transistor sizes.
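The percentages and bit equivalents quoted for the two layouts can be checked with a few lines of arithmetic. The sketch assumes that "equivalent bits" means log2(full scale / variation), which is one common convention and is an assumption here rather than a definition taken from the text.

```python
import math

def accuracy(variation_v, full_scale_v=5.0):
    """Return (fractional variation, equivalent bits) for a measured spread.

    Equivalent bits is taken as log2(full_scale / variation), an assumed convention.
    """
    frac = variation_v / full_scale_v
    bits = math.log2(1.0 / frac)
    return frac, bits

lumped_frac, lumped_bits = accuracy(0.065)   # 65 mV measured on lumped cells
dist_frac, dist_bits = accuracy(0.025)       # 25 mV measured on distributed cells
```

Under that convention the lumped cells fall between 6 and 7 bits and the distributed cells between 7 and 8 bits, in line with the rounded figures quoted above.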
Table 3.4 summarizes the experimental results comparing lumped and distributed neuron
circuits.
Table 3.4. A summary of comparative measurements on lumped and distributed neurons
Maximum on-chip variation    Lumped    Distributed
Volts in 5 V                 65 mV     25 mV
Percent                      1.3%      0.5%
1. This measurement result is close to the one presented in Section 3.2.4 under slightly different conditions.
Figure 3.9 On-chip variations of characteristics (worst case among 5 fabrications):
a) lumped neuron implementation,
b) distributed neuron implementation
In this section an interesting property of a distributed neuron, namely self-scaling, is
introduced through circuit simulations and experimental verification. A brief study
follows for an intuitive understanding of this property. A more generalized study,
independent of particular circuit implementations, will be presented in Chapter 5.
Different neural network applications require different numbers of neurons and neuron
inputs. When the number of inputs to a lumped neuron circuit increases, e.g. in a
programmable network, large saturation areas are created, resulting in a hard-limiting,
rather than a sigmoidal, behavior. On the other hand, a dramatic decrease in the number of
inputs to a lumped neuron effectively results in a low-gain linear neuron function.
Therefore, each neuron should be (re)designed based on the number of its inputs such that
the saturating function is properly scaled over the dynamic range of neuron inputs.
A distributed neuron implementation presents a 'self-scaling' property in this regard.
When the number of synaptic inputs to a neuron (i.e. the number of neurons in the
previous layer) increases or decreases, the overall nonlinear characteristic is scaled by itself.
The reason is that each synaptic input brings a corresponding sub-neuron that incrementally
adjusts the overall nonlinear function of the distributed neuron. By properly stretching the
dynamic range of the saturating function, this property restores information received from
new inputs that otherwise would have been lost in the large saturation areas of a fixed lumped
neuron.
Referring to Figure 3.7, in a distributed neuron with N sub-neuron blocks, as the current
divides equally among N similar blocks (each block receiving the same average current), the
output voltage can be calculated in two alternative ways. If we consider the overall
nonlinear function f(·) we have:
On the other hand, regarding each individual sub-neuron block with its own nonlinear
function, from Eqn. (3.15) and Eqn. (3.16) we have:
Since the two calculated voltages must be the same, we conclude:
Eqn. (3.22) mathematically defines the function f(·) as a scaled version of the original
sub-neuron function.
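The argument can be sketched as follows, writing f_s(·) for the sub-neuron function and I_in for the net input current. Both symbols are assumed here for illustration, and the form below follows only from the equal-current-division assumption stated in the text, so it should be read as a sketch of the reasoning rather than as the exact typeset equations.

```latex
% Overall view of the distributed neuron:
V_{out} = f(I_{in})
% Per-block view, each of the N identical blocks carrying I_{in}/N:
V_{out} = f_s\!\left(\frac{I_{in}}{N}\right)
% Equating the two expressions for the same node voltage:
f(I_{in}) = f_s\!\left(\frac{I_{in}}{N}\right)
```

In this form, adding inputs stretches the current axis of the overall characteristic in proportion to N while leaving the output voltage range unchanged.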
Simulations and experimental measurements were carried out based on a sub-neuron
circuit block similar to the one in Figure 3.2. Figure 3.10(a) shows the simulation results
that confirm Eqn. (3.22) by comparing the characteristics of one sub-neuron block (N=1)
and a five-input distributed neuron (N=5). Figure 3.10(b) shows the measured
characteristics of a two-input and a five-input distributed neuron fabricated in 1.2 μm
CMOS. The self-scaling property can be verified from these measurement diagrams.
The cursor points on Figure 3.10(b) show that a similar output voltage of V_out = 4.5 V
was obtained with net input currents of I_in = 100 μA for the 2-input neuron and
I_in = 250 μA for the 5-input neuron. The ratio of the two currents is 2:5, i.e. the ratio of
the number of inputs to the neurons. The same scaling ratio is verified for other
measurement points.
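The 2:5 observation can be reproduced with a toy model. The sketch below assumes a tanh-shaped sub-neuron characteristic with hypothetical scale parameters; it is not the fabricated circuit's transfer curve, and only the equal division of current among sub-neurons is taken from the text.

```python
import math

def sub_neuron(i_ua, v_max=5.0, i_scale_ua=30.0):
    """Assumed saturating I-to-V curve of one sub-neuron (illustrative only)."""
    return v_max * math.tanh(i_ua / i_scale_ua)

def distributed_neuron(i_in_ua, n):
    """N identical parallel sub-neurons share the net input current equally."""
    return sub_neuron(i_in_ua / n)

# Same per-block current (50 uA) in both cases, hence the same output voltage.
v_two_input = distributed_neuron(100.0, 2)
v_five_input = distributed_neuron(250.0, 5)
```

Whatever saturating shape is chosen for the sub-neuron, the two outputs coincide because each block sees the same current.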
The self-scaling property will be studied in Chapter 5 from a broader view to establish a
stochastic model for distributed neurons. It will be shown that the self-scaling property of
distributed neurons improves the ratio of signal to quantization noise at the outputs of a
programmable hybrid neural network with analog neurons and digitized weights. The
results will confirm the intuition obtained in this chapter.
Figure 3.10 Self-scaling property of the distributed neuron circuit:
a) simulated characteristics of a sub-neuron and a 5-input neuron,
b) experimental results comparing a 2-input and a 5-input neuron
3.3.3 An Increased Fault Tolerance
Neural networks are traditionally known for their interconnection redundancy and thus
fault tolerance. In a neural network with distributed neurons there is a potential increase
in robustness and fault tolerance. As each neuron is distributed among N sub-blocks, a
VLSI defect would affect only (1/N)-th of a neuron instead of the whole. An open-circuit
fault, for example a broken line, disables one sub-neuron from a parallel combination,
leaving the remaining (N-1) sub-neurons intact. The resultant neuron would still be
operative with a saturating function scaled by (N-1)/N. For a moderate or large value of N,
the characteristic would be close to the original. On the other hand, the functionality of a
lumped neuron would be totally destroyed by an open-circuit fault.
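The effect of a single open-circuit fault can be illustrated with the same kind of toy model. The tanh characteristic and its parameters are hypothetical; only the idea that the surviving (N-1) blocks share the input current comes from the text.

```python
import math

def sub_neuron(i_ua, v_max=5.0, i_scale_ua=30.0):
    """Assumed saturating I-to-V curve of one sub-neuron (illustrative only)."""
    return v_max * math.tanh(i_ua / i_scale_ua)

def neuron(i_in_ua, working_blocks):
    """Output of a neuron whose input current is shared by the working blocks."""
    if working_blocks == 0:
        raise ValueError("no working sub-neuron left: neuron is inoperative")
    return sub_neuron(i_in_ua / working_blocks)

def input_for_output(v_target, working_blocks, v_max=5.0, i_scale_ua=30.0):
    """Input current needed to reach v_target (inverse of the assumed model)."""
    return working_blocks * i_scale_ua * math.atanh(v_target / v_max)

n = 20
# Current needed for mid-scale output, with one block lost versus all intact.
scale = input_for_output(2.5, n - 1) / input_for_output(2.5, n)
```

With N = 20 the current axis shrinks by 5%, whereas a lumped neuron (a single block) is left with no working element at all.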
In case of a short circuit, both distributed and lumped neurons would be disrupted
similarly. Moreover, the probability of a VLSI short circuit is essentially the same in both
cases. This probability depends on the number of layout contacts and the active area of cells.
Since a lumped transistor with size N·W is broken into N transistors with size W for a
distributed implementation, the number of contacts and the active area remain the same.
3.4 Conclusion
In this chapter, a simple I-to-V neuron circuit was presented that relies on the inherent
nonlinearity in the I-V characteristics of NMOS and PMOS transistors to approximate a
saturating function. The saturating function was analytically studied and a sensitivity
analysis was carried out. Circuit analysis and simulations both suggested an interestingly
low 3% characteristic variation despite 20% variations in the threshold voltage parameter.
Experimental measurements proved to be even more promising. The maximum chip-to-chip
variation from 10 fabrications was 2.2%, while the worst-case variation between neurons
within one chip was 1.3%. Moreover, a distributed-neuron implementation further
reduced the variations within one chip, creating virtually uniform neurons with measured
variations at or below 0.5%.
Other properties of a distributed neuron circuit, namely a self-scaling characteristic and
improved fault tolerance, were described. In particular, the self-scaling property was
explored through circuit simulations and fabrication measurements. A generalized
discussion on this subject is presented in Chapter 5.
Chapter 4 A Universal Hybrid Block for NNICs
4.1 Introduction
This chapter describes the design and characterization of a hybrid
analog-digital¹ circuit presented as a universal building block for
the implementation of multilayer feedforward neural networks.
The universal block is based on a mixed-signal multiplier and a
distributed neuron design, the latter described earlier in
Chapter 3. A special property emerging from a hybrid distributed-neuron
realization will be studied in depth in Chapter 5.
Circuit simulations, fabrication test results and design
improvements, especially on an MDAC-type synapse, are presented
first. The application of the proposed building block in the design
of a few Neural Network Integrated Circuits (NNICs) will be
described afterwards. One of these NNICs, namely a neural-based
smart photosensor, will be discussed in more detail in Chapter 6.
1. The terms "hybrid", "hybrid analog-digital" and "mixed-signal" are used interchangeably.
4.2 A Programmable Universal Hybrid Building Block
The architecture presented in this thesis implements neural networks with regular arrays of
a programmable universal hybrid building block. This block, in essence, is a
"nonlinearly-loaded mixed-signal multiplier" consisting of the following sub-blocks
shown in Figure 4.1: a) a Multiplying Digital-to-Analog Converter (MDAC) synapse;
b) a digitally-programmable weight register; c) a nonlinear load or sub-neuron. In each
universal block, a synaptic weight can be stored digitally in a 5-bit Read/Write (R/W)
register. Synaptic multiplication is performed by an MDAC circuit. A nonlinear resistive
sub-neuron loading the output of each MDAC converts the synaptic output current to a
voltage. Several distributed sub-neurons from different blocks collectively perform the
function of a sigmoidal neuron (cf. Chapter 3, Section 3.3).
Figure 4.1 Sub-blocks of the universal hybrid building block
(Block labels: V_in, Weight Register, Nonlinear Sub-neuron)
4.2.1 Multiplying D-to-A Converter (MDAC)
The MDAC is a mixed-signal block that produces an output current proportional to the
multiplication of an analog input voltage by a signed digital synaptic weight:
The analog voltage V_in is received from a neuron in the previous layer, and the output current I_out
represents the synaptic activity of the MDAC-type synapse. The MDAC consists of three
sub-circuits, as shown in a conceptual block diagram in Figure 4.2(a). These sub-circuits are:
1) a voltage-to-current converter (V-to-I); 2) a set of binary-weighted programmable
current mirrors; 3) a sign-bit circuit. The output current of the MDAC, referring to Figure
4.2(a), can be expressed as:
D4 is the sign bit of the digital weight that sets the direction of the output current: D4=0
creates a positive (excitatory) synaptic current while D4=1 sets a negative (inhibitory)
current. The synaptic current magnitude is determined by the binary-coded weight bits D3 to
D0 multiplied by V_in. Coefficient K in Eqn. (4.2) is a constant in Ohms (Ω) mainly
determined by the V-to-I converter. Figure 4.2(b) shows the circuit diagram of the initial
MDAC, a five-bit version of [57]. All three sub-circuits are modified as will be explained
next and shown in evolutionary steps in Figure 4.2 parts (b) to (e).
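A behavioral sketch of the description above is given below. It assumes the simplest form consistent with the text: the magnitude bits D3 to D0 form a binary number that multiplies V_in/K, and D4 flips the sign. The value of K is hypothetical, back-calculated so that the full-scale weight at V_in = 5 V gives the nominal 100 μA, and input dead zones and nonlinearity are ignored.

```python
def mdac_current(v_in, bits, k_ohms=750e3):
    """Idealized MDAC output current in amperes.

    bits is (D4, D3, D2, D1, D0); D4 is the sign bit (1 = negative weight).
    k_ohms is an assumed constant, not a value quoted in the text.
    """
    d4, d3, d2, d1, d0 = bits
    magnitude = 8 * d3 + 4 * d2 + 2 * d1 + d0   # binary-weighted mirror sum
    sign = -1 if d4 else 1                       # sign-bit circuit
    return sign * magnitude * v_in / k_ohms

i_max = mdac_current(5.0, (0, 1, 1, 1, 1))   # weight +15
i_min = mdac_current(5.0, (1, 1, 1, 1, 1))   # weight -15
i_zero = mdac_current(5.0, (0, 0, 0, 0, 0))  # weight 0
```

Sweeping the weight from -15 to +15 at fixed V_in in this model produces a uniform staircase of currents between the two extremes.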
In Figure 4.2(c), the sign-bit circuit is greatly simplified such that it only requires the D4
input, instead of both D4 and its complement, saving an inverter or an extra interconnection line per
synapse. Each saving related to synaptic circuits is important due to the great number of
synaptic blocks and interconnects that can eventually occupy a considerable die area. The
modified sign-bit circuit consists of only four transistors that produce a bi-directional
output current. When D4 is High (i.e. a negative weight) the three PMOS transistors in the
sign-bit circuit are OFF and the NMOS is ON, sinking the total binary-weighted current
from the output terminal. When D4 is Low, the NMOS is OFF and the three PMOS devices
are ON, acting as a current mirror that sources the binary-weighted current to the
output. Table 4.1 summarizes the device widths and lengths (W/L) in the MDAC circuit of
Figure 4.2(c) implemented in 1.2 μm CMOS. A layout technique known as "ΔW
correction" is used in the binary-weighted current mirror for better device matching [1].
Figure 4.2 MDAC-type synapse, evolutionary steps:
a) conceptual block diagram,
b) a 5-bit version based on [57],
c) modification in sign-bit circuit,
d) modification in current mirrors,
e) modification in V-to-I converter
Figure 4.3(a) shows the output current of the MDAC with modified sign-bit (the circuit in
Figure 4.2(c)) at maximum input (V_in = 5 V) as the binary weight increases successively from
-15 to 15. Figure 4.3(b) shows fabrication measurement results that are in close
agreement with simulations. The MDAC operates, within a 3% linearity margin, as a
weight-dependent current source with a nominal output current of -100 μA to +100 μA.
Figure 4.3 MDAC output current (at maximum V_in) vs. binary weights:
a) simulation waveforms, b) fabrication measurements
Table 4.1. Device sizes (W/L) in μm for MDAC circuit shown in Figure 4.2(c)
1 Ml-Mla: 3.2 n.0 1 M2: 2.4 f24. 1
The next modification is shown in Figure 4.2(d), in which a row of transistors is removed
from the binary-weighted current mirror and V-to-I circuit. This modification: a) makes
headroom for a lower supply voltage (3.3 V instead of 5 V) as we avoid stacking up
transistors; b) reduces the input dead zone of each synapse from 2V_T to V_T (see Figure
4.4(a)), where V_T is the threshold voltage of a diode-connected NMOS transistor in the V-to-I
circuit. The modification is crucial, especially in the design of a low-voltage low-power
cell. The penalty is a reduction in the output impedance of the current mirrors. This is
compensated, to some extent, by transistor resizing (e.g. by using current mirror
transistors with longer channels), and the rest only drops the output current of the MDAC
slightly under nonideal load conditions. The modification is well justified at system level,
where we enjoy low-voltage low-power cells with increased dynamic range.
Figure 4.4(a) shows the transfer characteristics of I_out vs. V_in (at maximum weight) for the
circuits shown in Figure 4.2(c) and Figure 4.2(d). The dynamic range of operation has
apparently been increased in the latter circuit compared to the former, due to the
removal of one threshold voltage element in the V-to-I circuit.
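To see why removing one stacked threshold matters more at a low supply, a back-of-the-envelope estimate of the usable input span can be written down. The threshold value V_T = 0.8 V is an assumed, typical-looking number and is not taken from the text; the model simply subtracts the dead zone from the maximum input voltage.

```python
def usable_input_span(v_in_max, v_t, stacked_thresholds):
    """Input voltage span (V) left above the dead zone of the V-to-I circuit."""
    dead_zone = stacked_thresholds * v_t
    return max(0.0, v_in_max - dead_zone)

V_T = 0.8  # assumed NMOS threshold voltage in volts (hypothetical)

span_c_5v = usable_input_span(5.0, V_T, 2)    # Figure 4.2(c) style, 2*V_T dead zone
span_d_5v = usable_input_span(5.0, V_T, 1)    # Figure 4.2(d) style, V_T dead zone
span_c_3v3 = usable_input_span(3.3, V_T, 2)
span_d_3v3 = usable_input_span(3.3, V_T, 1)
```

Under these assumptions the relative gain in span is noticeably larger at 3.3 V than at 5 V, which matches the emphasis the text places on low-voltage operation.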
In its linear region, the characteristic of I_out vs. V_in relies on a long-channel NMOS
transistor (M2) of the V-to-I converter in the triode region. On the upper end of the characteristic,
when the input voltage becomes too large, M2 enters the saturation region and the output current
goes nonlinear or eventually saturates. This is evident for I_out2 in Figure 4.4(b). To
linearize the characteristics at the high end, a long-channel PMOS is added to the V-to-I circuit in
parallel with the NMOS transistor already in place. An alternative solution, seemingly
impractical, would be to tie the gate of M2 to a dc voltage V_GG higher than V_DD. The
final MDAC circuit after the modification of the V-to-I converter is shown in Figure
4.2(e). The improved transfer characteristics are shown for I_out1 in Figure 4.4(b)
compared to I_out2.
Figure 4.4 Improving the dynamic range of I_out vs. V_in in MDAC:
a) threshold reduction (circuit improvement from Figure 4.2(c) to Figure 4.2(d))
b) linearization (circuit improvement from Figure 4.2(d) to Figure 4.2(e))
4.2.2 Weight Register with Double-Phase Clock
The synaptic weight is stored in digital form in a static Read/Write register integrated with
each universal block. The universal block and a neural network built with this block are
thus programmable after different training sessions. Digital weight storage generally
occupies a large silicon area in a neural network chip. For this reason, an area-efficient
memory with a double-phase clock is custom designed, instead of using standard library
cells (latch or memory).
Figure 4.5 shows the schematic of one bit of a 5-bit weight register. Each single-bit cell
consists of MOS switches and three inverters, one in a switched feedback configuration.
After a reset pulse (Φ_Reset), a double-phase non-overlapping clock (Φ1-Φ2) drives internal
MOS switches and stores input data in the cell. A 5-bit parallel-in parallel-out register of
this type has been used for the programming and storage of a sign-magnitude synaptic
weight. Similar units are used for the storage of the threshold (bias) value of neurons.
The nominal programming clock speed is 3.5 MHz; however, the cells can operate at
speeds up to an order of magnitude higher. An area saving of over 30% has been achieved for
each 5-bit storage unit compared to the most compact cell library option available in the
target CMOS technology.
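The sign-magnitude format held in the register can be illustrated with a small encoder and decoder. The mapping from a real-valued trained weight to the 4-bit magnitude (a symmetric range of ±w_max with uniform rounding) is an assumption made for this sketch; the text specifies only that the stored word is a 5-bit sign-magnitude value.

```python
def quantize_weight(w, w_max=1.0, magnitude_bits=4):
    """Round a real weight to a sign-magnitude code (D4, D3, D2, D1, D0).

    w_max and the uniform rounding rule are illustrative assumptions.
    """
    levels = (1 << magnitude_bits) - 1               # 15 magnitude steps
    clipped = max(-w_max, min(w_max, w))
    mag = int(round(abs(clipped) / w_max * levels))
    sign = 1 if (clipped < 0 and mag > 0) else 0     # avoid a "negative zero"
    mag_bits = [(mag >> b) & 1 for b in range(magnitude_bits - 1, -1, -1)]
    return tuple([sign] + mag_bits)

def decode_weight(bits, w_max=1.0):
    """Recover the real value represented by a sign-magnitude code."""
    sign, *mag_bits = bits
    mag = 0
    for b in mag_bits:
        mag = (mag << 1) | b
    levels = (1 << len(mag_bits)) - 1
    return (-1 if sign else 1) * mag / levels * w_max

code = quantize_weight(-0.4)       # magnitude 6 with the sign bit set
restored = decode_weight(code)
```

With four magnitude bits, the rounding error of any in-range weight stays within half a step, i.e. w_max/30 in this sketch.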
Figure 4.5 Schematic of one bit of a 5-bit weight storage cell with double-phase clock
4.2.3 Characteristics of the Unified Synapse-Neuron Circuit
The schematic diagram of the unified synapse-neuron (USN) circuit is shown in Figure
4.6. The USN consists of an improved MDAC (as explained before and shown in Figure
4.2(e)), loaded by a nonlinear sub-neuron. The sub-neuron is an element of a distributed
sigmoidal neuron circuit discussed earlier in Chapter 3. When a weight register is
integrated with a USN, they create a programmable universal building block for NNICs.
The experimental output current of the fabricated MDAC before the introduction of the nonlinear
load (i.e., measured through a linear resistive load tied to VDDR) is shown in Figure 4.7.
In this experiment, the digital weight was successively increased from -15 to 15, while the
input voltage was set at V_in,max = 5 V. The measured stair-case current in a ±94 μA range
characterizes the modified MDAC of Figure 4.2(e) without the effect of a nonlinear
sub-neuron. When a sub-neuron is introduced at the output, it converts the stair-case current
into a discrete sigmoid-like voltage. Simulation results in Figure 4.8(a) show the overall
characteristics of the USN for two parameter values of V_in = 2 V and 5 V. Figure 4.8(b)
shows the measured output voltage characteristic of the USN at V_in,max = 5 V. Fabrication
measurements were in close agreement with the simulations.
Figure 4.6 Unified Synapse-Neuron (USN) circuit
Figure 4.7 Output current of the modified MDAC (Figure 4.6 or Figure 4.2(e))
(horizontal axis: synaptic weight)
Figure 4.8 Overall characteristics of USN:
a) simulations for two parametric values of V_in, b) experimental measurements at V_in,max = 5 V
(horizontal axis: weight, from -15 to +15)
4.3 Applications
4.3.1 An Optical Template Matching Network
A block diagram of a 4-3-2 hybrid VLSI neural network based on a hybrid distributed-neuron
architecture was shown earlier in Figure 2.3(b) in Chapter 2. As a proof-of-concept
design, a complete circuit for this network has been implemented based on arrays
of the universal hybrid building block presented in this chapter. A schematic diagram is
shown in Figure 4.9. The network is built with 23 blocks. Eighteen of these blocks are
the universal nonlinearly-loaded multiplier blocks. The five remaining blocks are used for
neuron threshold adjustment. These units are in fact the same universal blocks on silicon
with their nonlinear loads deactivated.
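The block counts follow directly from the layer sizes: one multiplier block per synapse and one threshold block per neuron. The helper below is a hypothetical convenience function written for this check; it also reproduces the 83 blocks quoted for the 16-4-3 classifier of Section 4.3.2.

```python
def block_count(layers):
    """Count universal blocks for a fully connected feedforward network.

    layers is a tuple such as (4, 3, 2): inputs, hidden neurons, outputs.
    Returns (synapse_blocks, threshold_blocks).
    """
    synapses = sum(a * b for a, b in zip(layers, layers[1:]))
    thresholds = sum(layers[1:])        # one bias block per neuron
    return synapses, thresholds

syn_432, thr_432 = block_count((4, 3, 2))
syn_1643, thr_1643 = block_count((16, 4, 3))
```

For the 4-3-2 network this gives 4×3 + 3×2 = 18 synapse blocks and 3 + 2 = 5 threshold blocks, 23 in total.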
Figure 4.9 A 4-3-2 VLSI neural network based on arrays of a universal hybrid building block
The network is trained for a 4-input template matching problem. An interactive
Back-Propagation simulator is developed by which the network and the patterns are defined for
off-line training. The simulator will be explained in Chapter 6. While training is
performed with high-precision weight values defined in software, the resulting weights are
rounded off to the resolution of the hardware (5 bits) and a simulated recall follows.
When this final phase is passed, the weights are programmed on chip. Figure 4.10(a)
shows four input templates learned earlier by the network during training. The inputs are
originally optical, but can also be applied to the chip as electronic signals as shown in
Figure 4.10(b). Templates 1 and 2 are to be detected and flagged on two outputs while the
other templates are to be rejected. The corresponding trained weights (w's) and bias
values (b's), referring to the circuit in Figure 4.9, are:¹
The circuit is designed, laid out and simulated using Cadence tools and HSpice. A chip is
fabricated in 1.2 μm CMOS (see the layout in Figure 4.11(a)) that contains two versions of
this network: a) a weight-programmable network with electronic inputs; b) a network with
photosensitive inputs and pre-programmed weights for an optical template matching
application. The complete circuit layout consists of about 700 MOS transistors. Figure
4.11(b) shows a microphotograph of the core area of the chip. Four photosensor cells
forming a square are used as optical inputs to the neural network. A fifth photosensor in
the middle, directly connected to an output pad, is used as a reference to adjust the
sensitivity of the cells to background illumination. The physical dimensions of the chip are
(3030 × 2816.8) μm², or 8.5 mm², including bonding pads, ESD and some test structures.²
1. Superscript l in b^(l) and w^(l) refers to the layer number (e.g. l = 1, l = 2).
2. This chip is fabricated through the Canadian Microelectronics Corporation (CMC) under the design name W W. In CMC's fabrication record, chip dimensions are in design scale microns (DSM). One DSM in 1.2 μm CMOS4S technology is equal to 0.8 physical microns.
During hardware recall, electronic input vectors were applied to the network as shown in
Figure 4.10(b) at a typical rate of 1 M.Vectors/sec. Outputs 1 and 2 became active in
response to templates 1 and 2, respectively, as shown in Figure 4.10(c). Both outputs
remained zero for templates 0 and 3. Current and power consumption on V_DD = 5 V
were I_ave = 730 μA and P_ave = 3.65 mW. The circuit can operate at higher speeds or on
lower supply voltages. For instance, with V_DD = 3.3 V, current and power consumption
were I_ave = 155 μA and P_ave = 510 μW, which is, on average, less than 0.75 μW per
transistor. This result indicates an 86% power reduction compared to the consumption on
a 5-Volt supply.
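The quoted power figures can be cross-checked from the supply voltages and average currents. The sketch uses the round figure of 700 transistors for the per-transistor estimate, which the text gives only as an approximate count.

```python
def supply_power(v_dd, i_ave):
    """Average supply power in watts from supply voltage and average current."""
    return v_dd * i_ave

p_5v = supply_power(5.0, 730e-6)       # 5 V supply, 730 uA
p_3v3 = supply_power(3.3, 155e-6)      # 3.3 V supply, 155 uA

reduction = 1.0 - p_3v3 / p_5v         # fractional power saving at 3.3 V
per_transistor = p_3v3 / 700           # assumes about 700 MOS transistors
```

The products come out at 3.65 mW and roughly 0.51 mW, a saving of about 86%, and about 0.73 μW per transistor at 3.3 V.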
Figure 4.10 Template matching: a) optical inputs, b) equivalent electronic inputs, c) chip outputs in recall
Figure 4.11 a) Layout and b) core microphotograph of template matching NNIC
Table 4.2. A data summary about template-matching NNIC
Function                      Programmable template matching
Inputs                        4 Optical + 4 Electronic
Typical rate                  1 - 2.5 M.Vectors/sec
Dimensions                    (3030 × 2816.8) μm²
Die area                      8.5 mm²
Package                       68 Pin Grid Array
Power consumption on 5 V      3.65 mW
Power consumption on 3.3 V    510 μW
Low-voltage operation on 3.3 V was possible due to the modified MDAC circuit shown
earlier in Figure 4.2(e). Simulations indicate that a similar neural network architecture
based on the original (cascode) MDAC circuit would fail to recall the stored patterns when
operating from a 3.3 V supply. This is caused by information loss in the large threshold
zones of MDAC-type synapses.
The chip was also tested successfully with optical input patterns similar to those in Figure
4.11(a).¹ In long-term testing, the functionality of the chip was confirmed over a
48-hour period of continuous work. A data summary about the template-matching chip is
given in Table 4.2.
Programmable synaptic weights allow the NNIC to recognize different templates to be
detected in different applications. A particular application of interest was feature
extraction in a handwritten numeral recognition system. The system consists of three
stages: preprocessing, feature extraction and classification [6]. As each numeral has a
unique directional histogram, directional border templates become the features to be
extracted by such hardware [7]. The basic process performed on a typical handwritten
number and the 2 × 2 directional templates programmed on the NNIC for feature extraction
are illustrated in Figure 4.12. The use of a programmable neural network in the feature
extraction stage allows higher flexibility and the possibility of merging with the neural
network classifier stage.
1. A hardware demonstration of the functionality of this chip was presented at TEXPO'97 [25].
Figure 4.12 a) Border feature extraction and b) directional templates for handwritten numeral recognition
(panel b columns: Template, Direction, Code)
4.3.2 General Purpose Programmable Neural Network Classifier
Figure 4.13 shows the microphotograph of a 16-4-3 programmable neural network IC
designed for general purpose vector classification applications.¹ Fabricated in 1.2 μm
CMOS, the chip is built with mixed-signal arrays of 83 programmable universal
building blocks in total and is tested for the mapping of up to 16-bit input vectors to vectors of 3
analog components at the output. In order to avoid an I/O (pad) bounded layout and pin
limitations, a 16-bit serial-in parallel-out (SIPO) interface is integrated at the input.
Compact memory cells with a 2-phase clock such as those described in Section 4.2.2 are
used in the SIPO interface as well as for the programming and storage of synaptic weights.
Weight programming and test vector generation were performed using an HP75000-D20 Test
System. A data summary about this chip is given in Table 4.3.
Due to a modular architecture based on a universal building block: a) the time and effort
spent on a custom layout were greatly reduced, and b) a dense regular silicon
implementation with low interconnect area and complexity has been achieved. In fact,
synapse-to-neuron interconnections are made locally inside universal blocks and the
remaining (global) routing is performed on a highly regular structure. In the target
numeral recognition system [6], two such NNICs are to be used in parallel in the classification
stage to map the extracted features of directional border templates into expected classes.
1. The design is fabricated through the Canadian Microelectronics Corporation under the name WRPW.
Table 4.3. A data summary about programmable NNIC classifier
Function                      Programmable NN Classifier
Architecture                  16-4-3
No. of programmable units     83 × 5 bit
Dimensions                    (2872.8 × 2172.5) μm²
Die area
Package                       68 Pin Grid Array
Power on 5 V supply           10.8 mW
Power on 3.3 V supply         1.5 mW
Figure 4.13 Microphotograph of a general purpose 16-4-3 programmable NNIC classifier
4.3.3 Other NNIC Fabrications
Another NNIC which is designed and fabricated based on the presented architecture and
building blocks is a neural-based smart photosensor. In the context of this design, a
considerable reduction in interconnection area and a corresponding increase in synaptic
density are highlighted in comparison with a conventional implementation. Details will be
explained in Chapter 6. Moreover, a BiCMOS version of the proposed circuits and a
neural-based photosensor in that technology have been implemented [51].
4.4 Conclusion
A programmable nonlinearly-loaded mixed-signal multiplier was presented in this chapter
and in [17] as a universal building block for the implementation of NNICs. The block
consists of an MDAC-type synapse, an element of a distributed neuron and a compact
weight register. Circuit design and improvements were described, and simulation and
experimental results were presented in close agreement. Circuit techniques in the MDAC
especially increased the dynamic range of the synaptic function and made possible
operation on a 3.3-V, as well as a 5-V supply.
Two NNICs were fabricated and tested as a proof-of-concept for the architecture and
building block. The first test chip was a 4-input template matching neural network with
both optical and electronic inputs. The second chip was a 16-4-3 general purpose
programmable neural network classifier. Low-voltage operation on 3.3 V, rather than 5 V,
reduced the power consumption by 86%. The two NNICs can, respectively, be used in the
feature extraction and classification stages of a target handwritten numeral recognition
system.
Chapter 5 Quantization Noise Improvement
5.1 Introduction
The main purpose of implementing a neural network on hardware is
to realize a true Parallel Distributed Processor. Hardware
implementations, however, introduce various non-idealities such as
weight quantization effects and variations of characteristics.
Moreover, in order to realize dense and high-speed neural networks
with a large number of neurons for real world applications, the use
of simple synapses and neurons with low precision weights and
other non-idealities is unavoidable. The effect of weight
quantization especially becomes more apparent at the outputs when
the network becomes larger.
A statistical analysis is carried out in [93] on the effect of
quantization in multilayer neural networks. This analysis considers
relatively small networks in both learning and recall phases. The
number of quantization bits required is fairly high (8-10 bits)
because of the requirements of the learning phase. However, a different
approach with a lower bit resolution can be taken if we are only
concerned about quantization effects after learning, i.e. the
implementation of an ideally-trained network. Sensitivity to weight
errors of neural networks with an increasing number of neurons is
analyzed in [68]. A stochastic model is developed to study an ensemble of networks with
differing weights and the focus is on the implementation of the recall phase. In this chapter,
we build on this model and present a new model for an Adaline with distributed-neuron
structure. This model predicts a lower quantization noise in a hybrid distributed-neuron
architecture compared to a conventional lumped-neuron architecture.
5.2 Modeling a Distributed Neuron
In this section, the analytical models of a lumped and a distributed resistive neuron are
studied without reference to any specific circuit implementations. As a result of this study,
an important property of a distributed neuron, namely self-scaling, will be formulated.
Two basic tasks of a neuron are summation of synaptic inputs and a nonlinear saturating
function. A third task, i.e. neuron threshold adjustment, can be modeled by an extra input
synapse. Synapses in their simplest form are modeled as ideal multipliers. In a neural
network built with transconductance (V-to-I) synapses and resistive-type neurons,
summation is simply performed by hardwiring the output currents of synapses together.
An Adaline with a lumped resistive neuron can be realized as depicted in Figure 5.1. In this
case, a lumped resistive neuron receives the summation of currents and generates on the
same supernode an output voltage which is a nonlinear function of the total synaptic input.
Figure 5.1 An Adaline implemented with lumped resistive-type neuron
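The lumped model described above can be sketched behaviorally as follows. This is only an illustration, not the circuit of Figure 5.1: the tanh characteristic and the names `lumped_adaline`, `i_sat` and `v_max` are assumptions introduced for the example.

```python
import math


def lumped_adaline(inputs, weights, i_sat=1.0, v_max=1.0):
    """Output voltage of a lumped resistive-type neuron (illustrative model)."""
    # Transconductance synapses: each one sources a current w_k * x_k.
    synaptic_currents = [w * x for w, x in zip(weights, inputs)]
    # Hardwiring the synapse outputs together sums the currents on one node.
    net_current = sum(synaptic_currents)
    # The nonlinear resistive load turns the net current into a saturating voltage.
    # tanh is an assumed stand-in for the real neuron characteristic.
    return v_max * math.tanh(net_current / i_sat)


x = [0.5, -1.0, 0.25]
w = [1.0, 0.5, -2.0]
y = lumped_adaline(x, w)  # net current is 0.5 - 0.5 - 0.5 = -0.5
print(y)
```

A threshold can be added in this sketch the same way the text suggests, as one extra synapse with a constant input.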
A nonlinear resistive-type neuron can be distributed into parallel sub-neurons assuming
their collective response remains the same. If the number of sub-neurons is chosen to be
equal to the number of input synapses, each sub-neuron and a corresponding synapse can
create a unified synapse-neuron (USN). In this thesis an architecture based on a hybrid
analog-digital USN has been presented. The architecture is comprised of digitized
synaptic weights, analog distributed neurons and multiplying D/A synapses. An Adaline
based on this architecture is shown in Figure 5.2 and is modeled in the rest of this chapter.
We will formulate the self-scaling property of distributed-neuron Adaline and show that in
a 'hybrid' architecture that relies on quantized weights, this property reduces the effect of
quantization noise at the output.
Figure 5.2 An Adaline with a distributed-neuron architecture
5.2.1 Increase in the number of Adaline inputs
An increase or decrease in the number of neuron inputs is commonplace. For instance, a
change in the number of nodes (or neurons) in a layer brings a corresponding change in
the number of inputs to all neurons in the next layer. Such situations are inevitable, e.g.
when a programmable neural network chip is used in different applications, or when in
cascadable chip sets new synaptic modules are added in input or hidden layer. This may
also be the case in a learning process involving network topology modification [13].
The output of an N-input neuron with activation function f(.) is:
$$ y_n = f\left( \sum_{k=1}^{N} w_{nk}\, x_k \right) $$
Different fan-in conditions may occur for a neuron when a programmable network is used
in different applications. This is also the case when the same neuron circuit (cell) is used
in different layers of a neural network with different number of synaptic inputs.
Real world applications especially require neural networks that have neurons with a large
number of inputs, while in most cases it is difficult to determine the exact number of nodes
and hence the fan-in conditions of neurons beforehand. In general, if the number of inputs
to a lumped neuron is increased by a factor of S (not necessarily an integer number), then:
$$ \hat{y}_n = f\left( \sum_{k=1}^{N \cdot S} w_{nk}\, x_k \right) $$
A saturating function initially designed for an N-input neuron is shown in Figure 5.3(a).
The horizontal axis represents total synaptic input, also known as net input. The same
neuron function when the number of inputs is increased to N·S is illustrated in Figure
5.3(b) that involves a larger dynamic range of net input. For S >> 1, output function $\hat{y}_n$
will contain large saturation areas and a narrow transition region compared to the whole
input dynamic range. Quantization noise is more amplified in a seemingly sharper
transition region, while the information carrying signal soon saturates and, therefore,
signal-to-noise ratio deteriorates. In principle, the neuron should be re-designed such that
its saturating function properly spreads over the dynamic range of net input. A properly-scaled
neuron characteristic for new input conditions is illustrated in Figure 5.3(c).
Re-designing a neuron is, apparently, neither convenient nor in some cases possible.
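To make the shrinking transition region concrete, the following sketch computes what fraction of the net-input dynamic range falls in the transition region of a fixed neuron as the fan-in grows. It is a toy model under stated assumptions: each synapse contributes a current in [-1, 1], the neuron is a tanh with a fixed current scale `i_sat`, and "transition" means the output stays within 80% of its swing. None of these choices come from the thesis circuits.

```python
import math


def transition_fraction(n_inputs, i_sat, level=0.8):
    """Fraction of the net-input range [-n_inputs, n_inputs] where |tanh| < level."""
    half_width = i_sat * math.atanh(level)  # |net| below this is in transition
    return min(1.0, half_width / n_inputs)


N, S = 25, 4
# Assumed design choice: 40% of the original N-input range is in transition.
i_sat = 0.4 * N / math.atanh(0.8)

frac_original = transition_fraction(N, i_sat)          # fixed neuron, N inputs
frac_more_inputs = transition_fraction(N * S, i_sat)   # same neuron, N*S inputs
frac_rescaled = transition_fraction(N * S, S * i_sat)  # neuron re-scaled by S
print(frac_original, frac_more_inputs, frac_rescaled)
```

In this model the fixed neuron keeps only 1/S of its original transition share after the input increase, while the re-scaled neuron recovers it.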
Figure 5.3 a) An N-input neuron characteristic over the original range of inputs, b) the same neuron when inputs are increased to N·S,
c) a properly-scaled neuron with N·S inputs
[Plots of neuron output versus synaptic input over the range -S·I0 to S·I0: a) $y_n = f(\sum_{k=1}^{N} w_{nk} x_k)$, b) $\hat{y}_n = f(\sum_{k=1}^{N \cdot S} w_{nk} x_k)$, c) $y_n = F(\sum_{k=1}^{N \cdot S} w_{nk} x_k) = F(I_{net})$]
5.2.2 Self-scaling Formulation
When the number of inputs to a neuron increases, one method to avoid over-saturation,
besides re-designing, is to reduce synaptic activity by scaling down synaptic inputs or
weights. A weight scaling method such as the one proposed in [92] is only practical for
software implementations, and becomes too complex in hardware as it requires one
scaling module for each synaptic weight. If we choose S as the scaling factor on synaptic
weights, we have:
$$ y_n = f\left( \sum_{k=1}^{N \cdot S} \frac{w_{nk}}{S}\, x_k \right) = f\left( \frac{I_{net}}{S} \right) $$
where $I_{net} \triangleq \sum_{k=1}^{N \cdot S} w_{nk}\, x_k$ is net input before weight scaling.
Equivalently, we should be able to use the same net input combined with a properly scaled
neuron activation function F(.):
$$ y_n = F(I_{net}) = f\left( \frac{I_{net}}{S} \right) $$
A scaled sigmoidal function, for example, is defined as:
in which $\left. \frac{dF}{dI_{net}} \right|_{I_{net}=0} = \frac{1}{S}\, f'(0)$, i.e. slope gain at the origin is decreased
by a factor of S.
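The equivalence between scaling the weights and scaling the activation function can be checked numerically. The sketch below assumes a tanh for f(.) purely for illustration; the names `f`, `F`, `weights` and `inputs` are hypothetical, and the slope at the origin is estimated with a central difference.

```python
import math


def f(i_net):
    """Assumed original activation function (tanh as a stand-in sigmoid)."""
    return math.tanh(i_net)


def F(i_net, s):
    """Scaled activation: same net input, characteristic stretched by s."""
    return f(i_net / s)


weights = [0.8, -1.2, 0.4, 1.5]
inputs = [1.0, 0.5, -0.25, 0.75]
s = 4.0

net = sum(w * x for w, x in zip(weights, inputs))
scaled_net = sum((w / s) * x for w, x in zip(weights, inputs))

y_scaled_weights = f(scaled_net)   # weights divided by S, original f
y_scaled_activation = F(net, s)    # original weights, scaled F

h = 1e-6
slope_f = (f(h) - f(-h)) / (2 * h)        # slope of f at the origin
slope_F = (F(h, s) - F(-h, s)) / (2 * h)  # slope of F at the origin
print(y_scaled_weights, y_scaled_activation, slope_f, slope_F)
```

Both outputs agree to rounding error, and the slope of F at the origin comes out as the slope of f divided by S.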
Figure 5.4 Neuron input increase for a distributed neuron
Therefore, scaling all synaptic weights with 1 / S is equivalent to using the same set of
weights combined with a scaled activation function defined as above. The distributed-neuron
structure presents a scaling property similar to the above scheme, i.e. if the number
of input synapses to a distributed neuron is increased by a factor S, the neuron will consist
of S similar nonlinear blocks in parallel (each possibly consisting of N resistive sub-neurons).
Referring to Figure 5.4, as current equally divides among the similar blocks,
the output voltage can be obtained in two alternative ways:
represents the nonlinear function of the original N-input neuron block, and Ime. is the
current through each and every sub-neuron. Thus, from (I) and (II) we conclude:
The basic self-scaling property of a distributed neuron is described by Eqn. (5.5).
According to this equation, a distributed neuron exhibits a self-scaling property that is
equivalent to scaling down all the weights in proportion to the increase in the number of input
synapses. This property will be used in the next section to transform a statistical model of
a conventional (lumped) Adaline to a new model for a distributed-neuron Adaline.
Intuitively speaking, as the number of synaptic inputs (i.e. the number of neurons in
previous layer) increases, the overall nonlinear characteristic of a distributed neuron
automatically stretches and proportionally covers the entire dynamic range of inputs. This
property of a distributed neuron preserves the information received from extra synaptic
inputs that would have been lost otherwise in large saturation areas of a fixed lumped
neuron with increasing number of inputs.
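The self-scaling argument can be illustrated with a small numerical model. In the sketch below each block is treated as a nonlinear resistor with an assumed tanh voltage-current characteristic; S identical blocks share one output node, so their currents add at a common voltage. The constants `V_MAX`, `I_0` and the function names are hypothetical, and the node voltage is found by bisection rather than by any method taken from the thesis.

```python
import math

V_MAX = 1.0  # assumed saturation voltage of a block
I_0 = 1.0    # assumed current scale of one block


def block_voltage(i_block):
    """Characteristic f: voltage of one block carrying current i_block."""
    return V_MAX * math.tanh(i_block / I_0)


def block_current(v):
    """Inverse characteristic: current drawn by one block at node voltage v."""
    return I_0 * math.atanh(v / V_MAX)


def distributed_voltage(i_net, s_blocks):
    """Solve s_blocks * block_current(v) = i_net for the shared node voltage."""
    lo, hi = -0.999999999 * V_MAX, 0.999999999 * V_MAX
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # Total current of the parallel blocks grows monotonically with v.
        if s_blocks * block_current(mid) < i_net:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


S = 4
for i_net in (-6.0, -1.0, 0.5, 3.0, 8.0):
    print(i_net, distributed_voltage(i_net, S), block_voltage(i_net / S))
```

In this model the voltage of the S-block neuron driven by a net current equals the single-block characteristic evaluated at that current divided by S, which is the stretching behavior described above.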
5.3 Stochastic Model
5.3.1 Sigmoidal Adaline with Lumped Neuron
A stochastic model presented in [68] defines the ideal output of an Adaline (in an arbitrary
layer of a Madaline) and the corresponding output error as follows:
where $X^T$ stands for the transpose of matrix X, and W, X, $\Delta W$ and $\Delta X$ are
independent identically distributed (iid) random vectors representing weights, inputs,
weight errors and input errors, respectively.
The output Noise-to-Signal Ratio (NSR) of the Adaline is defined as the ratio of the
variance of the output error (due to quantization noise, etc.) to the variance of the ideal
output (e.g. due to differing weights corresponding to different training sets):
$$ NSR \triangleq \frac{\sigma^2_{\Delta y}}{\sigma^2_{y}} $$
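Based on this model, the output NSR of a sigmoidal Adaline can be expressed as a linear
combination of input NSR, $\sigma^2_{\Delta x} / \sigma^2_x$, and weight NSR, $\sigma^2_{\Delta w} / \sigma^2_w$, and is
amplified by a stochastic gain function g(.):
$$ NSR = g(\sqrt{N}\,\sigma_x \sigma_w) \times \left( \frac{\sigma^2_{\Delta x}}{\sigma^2_x} + \frac{\sigma^2_{\Delta w}}{\sigma^2_w} \right) $$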
The output NSR of an Adaline in an arbitrary layer of a Madaline can be computed
recursively starting from the input layer. Stochastic gain g(.) is always greater than 1 and
is an increasing function of its argument, $\sqrt{N}\,\sigma_x \sigma_w$, where N is the number of inputs to
Adaline, and $\sigma_x$ and $\sigma_w$ are standard deviations of inputs and weights, respectively.
Increasing the number of inputs to an Adaline increases its stochastic gain, g(.) [68].
Therefore, in a conventional neural network with lumped sigmoidal neurons, an increase
in the number of inputs to different layers causes an unwanted increase in the output NSR.
If the number of inputs to an Adaline increases by a factor S and input and weight
variances remain the same, in the absence of any scaling scheme the output Noise-to-Signal
Ratio of a lumped Adaline will increase to NSR1 given by the following equation:
$$ NSR_1 = g(\sqrt{N \cdot S}\,\sigma_x \sigma_w) \times \left( \frac{\sigma^2_{\Delta x}}{\sigma^2_x} + \frac{\sigma^2_{\Delta w}}{\sigma^2_w} \right) $$
In other words, the effect of weight quantization becomes more apparent at the output of a
larger network.
5.3.2 Sigmoidal Adaline with Distributed Neuron
If we manage to reshape the nonlinear characteristic of a neuron in an Adaline in response
to an increase in the number of inputs, we will be able to control the stochastic gain factor
g(.), and hence the Noise-to-Signal Ratio. In the case of a lumped neuron, this should be
done by re-designing the saturating function. On the other hand, the self-scaling property
inherent in a distributed-neuron structure presents a natural way of controlling g(.).
Formulated in Eqn. (5.5), this property in turn affects the NSR of a distributed-neuron
Adaline. The impact of self-scaling on NSR is explored here.
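In a distributed-neuron structure, according to Eqn. (5.5) every weight to Adaline is, in
effect, scaled by 1 / S as the number of inputs increases. If we define $w_S = w / S$ as a
scaled weight, then the statistical parameters for the scaled weight are:
$\sigma^2_{w_S} = \sigma^2_w / S^2$ and $\sigma^2_{\Delta w_S} = \sigma^2_{\Delta w} / S^2$. Therefore, NSR2 or the Noise-to-Signal
Ratio of a distributed Adaline with increased number of inputs will be:
$$ NSR_2 = g\left( \frac{\sqrt{N \cdot S}\,\sigma_x \sigma_w}{S} \right) \times \left( \frac{\sigma^2_{\Delta x}}{\sigma^2_x} + \frac{\sigma^2_{\Delta w}}{\sigma^2_w} \right) $$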
The terms in linear combination remain unchanged, while the stochastic gain is decreased
due to the scaling of its argument with 1 / S. This property will reduce the Noise-to-Signal
Ratio, NSR2, and improve the performance of recall hardware proportionally. The
resulting stochastic model is depicted in Figure 5.5. The improvement is demonstrated by
an example in Section 5.4.
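The trend predicted by the model can also be explored with a small Monte Carlo experiment. The sketch below is not the stochastic model of [68] and does not use its gain function g(.); it simply draws an ensemble of random weights and inputs, quantizes the weights, and estimates the ratio of output-error variance to ideal-output variance. The tanh neuron, the uniform ranges, the trial count and all names are assumptions. The distributed case is modeled only through its self-scaling effect, i.e. the net input is divided by S.

```python
import math
import random


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def estimate_nsr(n_inputs, weight_scale, trials=2000, q=1.0 / 64, seed=1):
    """Monte Carlo estimate of output NSR caused by weight quantization."""
    rng = random.Random(seed)
    ideal_outputs, output_errors = [], []
    for _ in range(trials):
        net_ideal = 0.0
        net_quantized = 0.0
        for _ in range(n_inputs):
            x = rng.uniform(-2.0, 2.0)
            w = rng.uniform(-2.0, 2.0)
            w_q = round(w / q) * q  # uniform quantizer with step q
            net_ideal += w * x
            net_quantized += w_q * x
        y_ideal = math.tanh(weight_scale * net_ideal)
        y_quantized = math.tanh(weight_scale * net_quantized)
        ideal_outputs.append(y_ideal)
        output_errors.append(y_quantized - y_ideal)
    return variance(output_errors) / variance(ideal_outputs)


S = 4
nsr_lumped = estimate_nsr(25 * S, weight_scale=1.0)           # fixed neuron
nsr_distributed = estimate_nsr(25 * S, weight_scale=1.0 / S)  # self-scaled
print(nsr_lumped, nsr_distributed)
```

Under these assumptions the self-scaled estimate comes out lower than the lumped one; the exact figures depend on the assumed neuron shape and should not be read as the thesis results.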
Figure 5.5 Stochastic model for an Adaline with distributed neuron
5.4 A Case Study
For an Adaline with N = 25 inputs, suppose inputs and weights are uniformly distributed
over the range [a, b] = [-2, 2]; therefore, $\sigma^2_x = \sigma^2_w = (b - a)^2 / 12 = 4/3$.
Assuming an 8-bit quantization scheme, weights are quantized to levels equally spaced by
q = 1/64; thus, weight error variance will be $\sigma^2_{\Delta w} = q^2 / 12 = 2 \times 10^{-5}$.
Furthermore, we assume $\sigma^2_{\Delta x} = 0$ as in this discussion we are interested in the net effect
of weight quantization only. For $\sqrt{N}\,\sigma_x \sigma_w > 2$, gain function defined in [68] may be
approximated as:
The output noise-to-signal ratio of the 25-input Adaline is then:
$$ NSR = g(\sqrt{N}\,\sigma_x \sigma_w) \times \left( \frac{\sigma^2_{\Delta x}}{\sigma^2_x} + \frac{\sigma^2_{\Delta w}}{\sigma^2_w} \right) = g(6.67) \cdot (1.5 \times 10^{-5}) $$
Now, if the number of inputs is increased to 100 (an increase by factor S = 4), then:
(I) For a conventional (lumped-neuron) architecture, NSR can be found from Eqn.
(5.10) as,
$$ NSR_1 = g(13.3) \cdot (1.5 \times 10^{-5}) = 11.3 \times 10^{-5} = -39.5 \text{ dB} $$
(II) If, instead, we use a distributed-neuron architecture, then after the input
increase, from Eqn. (5.11) we will have:
$$ NSR_2 = g(3.33) \cdot (1.5 \times 10^{-5}) = 3.4 \times 10^{-5} = -44.7 \text{ dB} $$
The difference between the Noise-to-Signal Ratios in case (I) and case (II) is:
$$ | NSR_2 - NSR_1 | = 5.2 \text{ dB} $$
In this example, NSR is reduced almost by a factor of 3, or 5.2 dB in decibel terms. In
other words, Signal-to-Noise Ratio (SNR) is increased by a similar factor. The
improvement would be even more noticeable for larger input increase factors. Figure 5.6
shows the improvement in SNR for different values of N (initial neuron inputs) and S
(input increase or scaling factor) based on our simulations.
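The arithmetic of this case study can be retraced in a few lines. The sketch below recomputes the variances, the quantization step, the gain-function arguments and the decibel figures. The two NSR values are taken as quoted in the text, since the approximation of g(.) is not reproduced here; all variable names are illustrative.

```python
import math

a, b = -2.0, 2.0
bits = 8
N, S = 25, 4

var_uniform = (b - a) ** 2 / 12              # variance of inputs and weights, 4/3
q = (b - a) / 2 ** bits                      # quantization step, 1/64
var_weight_error = q ** 2 / 12               # about 2e-5
weight_nsr = var_weight_error / var_uniform  # about 1.5e-5


def gain_argument(n_inputs, scale=1.0):
    """sqrt(n) * sigma_x * sigma_w, optionally divided by the self-scaling factor."""
    # sigma_x * sigma_w equals var_uniform here because both variances are equal.
    return math.sqrt(n_inputs) * var_uniform / scale


arg_original = gain_argument(N)            # about 6.67
arg_lumped = gain_argument(N * S)          # about 13.3
arg_distributed = gain_argument(N * S, S)  # about 3.33


def to_db(ratio):
    return 10 * math.log10(ratio)


nsr_lumped = 11.3e-5      # NSR1 as quoted in the text
nsr_distributed = 3.4e-5  # NSR2 as quoted in the text
improvement_db = to_db(nsr_lumped) - to_db(nsr_distributed)  # about 5.2 dB
print(arg_original, arg_lumped, arg_distributed, improvement_db)
```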
Figure 5.6 Signal-to-Noise Ratio improvement vs. input increase factor
[Plot: | NSR2 - NSR1 | (dB) versus S (input increase factor), for S = 1 to 10]
5.5 Discussion and Conclusion
Nonlinear circuits based on square-law characteristics of MOS transistors are used to
implement distributed neurons as explained in Chapter 3. Each nonlinear sub-neuron is a
compact circuit fused into a multiplying D/A synapse; the latter generates a current
proportional to the product of an analog input by a digitized weight (see Figure 5.2).
Each sub-neuron presents a nonlinear characteristic which is designed to cover the
dynamic range of one synaptic input. As a rule of thumb, we found it suitable to have
roughly 30% of input dynamic range of each element in low saturation region
($y_n < 0.1\,y_{n-max}$), some 40% of input dynamic range in transition region
($0.1\,y_{n-max} < y_n < 0.9\,y_{n-max}$), and the remaining 30% of input dynamic range in high
saturation end ($y_n > 0.9\,y_{n-max}$). When a nonlinear sub-neuron is integrated with every
synapse, the above-specified shape will be proportionally preserved for an N-input neuron,
regardless of the number of inputs. The resulting unified synapse-neuron blocks present a
highly modular and scalable solution for the design of VLSI neural networks with
different sizes in different applications and have been successfully used in the
implementation of programmable neural network classifiers as described in Chapter 4.
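The 30% / 40% / 30% rule of thumb can be tried out on a simple stand-in characteristic. The sketch below assumes a logistic curve over a normalized input range [-1, 1], which is not the square-law circuit of Chapter 3, and picks its gain so that the 10% and 90% output points fall at x = -0.4 and x = +0.4. The function name and the sampling approach are illustrative.

```python
import math


def region_fractions(gain, lo=0.1, hi=0.9, samples=100001):
    """Shares of the input range [-1, 1] in low saturation, transition, high saturation."""
    low = mid = high = 0
    for k in range(samples):
        x = -1.0 + 2.0 * k / (samples - 1)
        y = 1.0 / (1.0 + math.exp(-gain * x))  # normalized output y / y_max
        if y < lo:
            low += 1
        elif y > hi:
            high += 1
        else:
            mid += 1
    return low / samples, mid / samples, high / samples


# y = 0.1 and y = 0.9 occur at gain * x = -ln(9) and +ln(9).
gain = math.log(9.0) / 0.4
fractions = region_fractions(gain)
print(fractions)
```

With this gain the sampled shares come out close to 0.3, 0.4 and 0.3, matching the rule of thumb for this assumed curve.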
In conclusion, a stochastic model was presented for the first time for an Adaline with
distributed neuron implementation. The self-scaling property of a distributed neuron was
formulated in this chapter and applied to transform an existing model for a conventional
(lumped-neuron) Adaline to the one presented for the first time for a distributed-neuron
Adaline. Based on the presented stochastic analysis and simulations, the ratio of signal to
quantization noise increases considerably for large number of neuron inputs (or nodes per
layer), when a programmable neural network hardware is based on a distributed- rather
than a lumped-neuron architecture.
A main conclusion in [68], [69] is that increasing the number of nodes per layer in a
(conventional) Madaline increases the required weight accuracy given a maximum
allowable noise-to-signal ratio. In this chapter, it was shown that a distributed-neuron
architecture is advantageous in terms of maintaining a better signal-to-noise ratio as the
number of neuron inputs (or nodes per layer) increases. The larger the network becomes,
the more apparent the SNR advantage is, compared to a conventional Madaline network.
A final note is that a higher SNR in a distributed-neuron architecture can be traded off at a
certain level with a lower bit precision. Depending on network topology, every 5 to 10 dB
difference in SNR (6 dB on average) is equivalent to one bit difference in weight
precision.
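The figure of roughly 6 dB per bit follows from the uniform quantization noise variance q^2 / 12: removing one bit doubles q and so multiplies the noise variance by four. The sketch below works this out, assuming the stochastic gain stays fixed so that output NSR tracks weight-error variance directly. The 5.2 dB value is the case-study improvement from Section 5.4, and the function name is illustrative.

```python
import math


def quantization_noise_db(bits, full_range=4.0):
    """Uniform quantization noise variance q^2 / 12, expressed in dB."""
    q = full_range / 2 ** bits
    return 10 * math.log10(q ** 2 / 12)


# Dropping from 8 to 7 bits doubles q and raises the noise variance 4 times.
db_per_bit = quantization_noise_db(7) - quantization_noise_db(8)

# Bits that could be traded for the 5.2 dB gain of the case study.
bits_traded = 5.2 / db_per_bit
print(db_per_bit, bits_traded)
```

Here `db_per_bit` is 10·log10(4), about 6.02 dB, so the case-study improvement corresponds to a little under one bit of weight precision under this simplification.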
1. In neuron circuit $y_{n-max}$ is the same as supply voltage (5 V typical in our target CMOS technology).
Chapter 6 Neural-based
Smart Photosensor
6.1 Introduction
In this chapter the design of a neural-network-based smart
photosensor for focal-plane pattern classification is explained.¹
These sensors are designed for on-line pattern classification
applications requiring image capture or non-contact measurements.
We first review previous work on CMOS-compatible
photoreceptors and neural-based smart sensors. The author had a
chance to contribute to some aspects of the earlier designs of these
sensors in VLSI Research Group, University of Windsor, including
fabrication submission and testing, as well as design transition to
newer CAD environments. Through this experience, a valuable
insight was obtained and the main problems were identified.
Two main issues in the design of our target neural-based
photosensor are about photosensor array and neural network
architecture. The two issues are first discussed separately in this
chapter. From this study the type of photosensor elements and the
architecture of neural network classifier will be determined, and
finally things will be put together in a novel design.
1. The material in this chapter is mainly based on two publications from this work at ISCAS [23] and ISCAS'98 [16].
Photosensors are based on a modified photoBJT in CMOS technology and act as input
nodes to a neural network classifier. A fully-connected multi-layer feedforward neural
network is chosen as it has shown superior performance over a partially-connected scheme
in classifying noisy patterns. Interconnection areas and problems, however, created a
bottleneck in a conventional fully-connected design relying on lumped neurons and
synapses and a lumped photosensor array. This problem is greatly alleviated in the final
presented design which is based on a 2-D distributed-neuron structure and a distributed
array of 'smart pixels'. The new architecture results in a highly modular and area-efficient
VLSI implementation that has increased synaptic density by a factor of more than two in
the same technology. The proposed smart sensor design also benefits from a robust neural
architecture due to the properties mentioned earlier in this thesis.
6.2 Objectives and Issues
On-line optical classification of objects or geometrical features is a task often encountered
in industrial or manufacturing environments. Solutions to this problem range from a
rather elaborate system including CCD imager and signal processing hardware and/or
software, to a single smart chip integrating photosensors and classifier processor.
Nowadays, modern sensors tend to become more and more autonomous subsystems by
self-containing some sort of signal processing. Among the best technologies for these so-called
"integrated smart sensors" (ISS)¹ is CMOS, which allows a dense co-integration of
various sensors and signal processing circuits on a single chip.
Motivated by a manufacturing process control application, our goal is to design a
programmable smart photosensor for on-line classification of low-resolution patterns.
The sensor is to be used in a manufacturing process control to determine the position or
classify the surface geometry of an object whose image is captured on chip [12], [42].
In actual operation a pattern is 'imaged' onto the photosensitive array using laser beam
steering or structured illumination techniques. The imaged pattern is a 2-D projection of a
3-D geometry. Based on the above noncontact measurement, the output state of the
classifier defines a control vector for the on-line process, which in turn adjusts the position
or process parameters of the object under control. A set of applied patterns representing
the tension on a string and the corresponding classes are shown in Table 6.2 on page 106.
1. An integrated smart sensor (ISS) by definition is a co-integration of one or more sensor transducers and signal processing hardware. In a neural-based smart sensor this processing is a neural computation.
Artificial neural networks are known as good pattern classifiers that are trainable for
different applications and offer high-speed solutions when implemented on non-multiplexed
hardware. A photosensor chip and a programmable Neural Network IC
(NNIC) would be a two-chip candidate solution to our problem. However, the number of
inputs from a 2-D photosensor chip to NNIC (e.g. 8 x 8 = 64 inputs) becomes a prohibitive
factor due to pin limitations and the complexity or delay of interface circuitry. A smart
photosensor chip with integrated focal-plane pattern classifier is an ideal solution here.
To implement a smart sensor, a standard digital CMOS process is chosen since it is a
mature technology that has shown the possibility of creating low-cost customizable image
sensors as well as dense integration of neural processing circuitry. Based on the above
facts, hybrid VLSI architectures are explored for dense co-integration of a multilayer
feedforward (MLFF) neural network classifier and a photosensor array in a standard
CMOS chip. Thus, two issues are to be addressed separately, namely, (i) the realization
of a CMOS-compatible photosensor array, (ii) an efficient architecture for the VLSI
implementation of a fully-connected programmable neural network with a 2-D optical
input array.
6.3 Photosensor Array
6.3.1 CMOS-compatible Photosensitive Device
Photosensitive devices using standard CMOS, e.g. Active Pixel Sensors,¹ are becoming
increasingly popular [79], [39], [54]. CCD technology, despite proven performance in
imaging applications, requires special fabrication process and suffers from intrinsic image
smear, reset noise and difficulty of random access to individual pixels. In many intelligent
imaging and sensory applications such as machine vision or neural networks, random or
parallel access to individual cells is a requirement [47], [94]. A CMOS-compatible
photosensitive array makes a good alternative in such applications. This alternative,
moreover, eliminates the requirement of special fabrication process and allows the
integration of sensor and signal processing circuitry in standard CMOS, thus making
feasible the implementation of low-cost low-power smart sensors.
1. Active Pixel Sensors (APS) are 'less' smart sensors that include sensor amplification and random-access circuitry. A smart sensor usually includes a higher level of signal processing.
PhotoMOS and photodiodes have little or no gain and their output reading is destructive
[46]. A photoBJT is an attractive choice as a sensor because of its intrinsic gain and the
fact that it can be obtained as a by-product device in CMOS technology. Low bandwidth
of a photoBJT is not a problem in our application, and so is the case in many other
manufacturing environments. A higher noise in photoBJT compared to photodiode will be
compensated to some extent in a modified photoBJT that has an improved responsivity, or
in other words a higher 'signal' level. Our chosen technology is a standard N-well CMOS
process.¹ In this technology a vertical BJT is found to be more reliable and have higher
gain ($h_{fe}$ = 35, based on our experiments) compared to a lateral BJT ($h_{fe}$ = 1). A parasitic
photosensitive device can be built as a vertical PNP transistor with floating base
configuration. In this case, as shown in Figure 6.1(a), P+ diffusion area is the Emitter,
N-well forms the Base, and P-substrate is the Collector. N-well forms the area sensitive to
light as photo electron-holes are generated at the junction of N-well to P-substrate (i.e. the
base-collector junction). This process constitutes base photocurrent that in turn is
amplified by a factor of ($h_{fe}$ + 1) to create emitter output current. A vertical photoBJT
suffers from large base-collector capacitance $C_{bc}$ which reduces its optical responsivity
and bandwidth.
1. CMOS4S: a 1.2 µm double-metal double-poly standard CMOS process from Northern Telecom (Nortel).
A Field-Effect-Modified (FEM) photoBJT structure, as shown in Figure 6.1(b), has been
used to improve the responsivity of the device without any additional fabrication step [47].
A description of the original device in a bipolar process can be found in [62], while here it
is implemented in a digital CMOS technology without any additional fabrication mask or
DRC¹ violation. The diffusion region at the center is the emitter of a vertical PNP
transistor. The base, however, is divided by the annular P+ region into two portions,
internal base and external base. These two portions along with the annular P+ region form
an n-channel JFET in which P+ is the gate. The circular gate is formed during the same
diffusion step in which the emitter is formed. When the gate is sufficiently reverse-biased,
e.g. grounded, the channel connecting the two base portions is pinched off. Therefore, the
effective capacitance $C_{bc}$ is mainly reduced to that of the internal region whereas the
primary photocurrent includes both internal and external components. Device
responsivity is improved by increasing the ratio of external to internal base area. In
practice, DRC rules are not to be violated and it should also be noted that contact areas of
emitter and annular P+ diffusion are not transparent to incident light.
6.3.2 CMOS Photoreceptor Cell Circuit
The circuit schematic of the CMOS photoreceptor cell consisting of a FEM photoBJT, a
logarithmic I-to-V converter PMOS, level-shifting and buffering is shown in Figure 6.2.
A photoreceptor cell consists of a photosensitive device to transduce incident light into
electrical current, and a logarithmic element to compress the dynamic range [53].
As described earlier, the photosensitive device is a FEM PNP vertical BJT formed in an N-well
CMOS process. The logarithmic element is a diode-connected PMOS load which
converts a wide range photocurrent to a small range photovoltage while operating in the
sub-threshold region to maintain logarithmic response. The photoreceptor circuit has a cell
layout of 95 x 90 µm² in 1.2 µm CMOS. The total N-well area forming the photobase
region is 50 x 50 µm², a fraction of which (i.e., 9 x 9 µm²) is devoted to emitter region. The
bias voltage, Vs, acts as a sensitivity or threshold control for the cell to generate a binary
output. Note that the type of output, digital or analog, mainly depends on the type of
buffer used at the cell output.
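The dynamic-range compression of a sub-threshold load can be illustrated with the usual exponential current-voltage relation, I = I0·exp(V / (n·V_T)), inverted to give the voltage across the load. The sketch below is a textbook-style approximation, not a model of the fabricated cell: the values of `i_0`, `n` and `v_t` and the photocurrent list are assumed for the example.

```python
import math


def photovoltage(i_photo, i_0=1e-15, n=1.5, v_t=0.02585):
    """Voltage across a diode-connected sub-threshold load (assumed parameters)."""
    # Inverting I = i_0 * exp(V / (n * v_t)) gives a logarithmic response.
    return n * v_t * math.log(i_photo / i_0)


currents = [1e-12, 1e-11, 1e-10, 1e-9, 1e-8]  # four decades of photocurrent
voltages = [photovoltage(i) for i in currents]
steps = [v2 - v1 for v1, v2 in zip(voltages, voltages[1:])]
print(voltages)
print(steps)
```

Each decade of photocurrent adds the same voltage step, n·V_T·ln(10), so four decades of current span only a few hundred millivolts with these assumed parameters.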
1. DRC: Design Rule Check, a set of geometrical rules to be adhered to in a process technology.
Figure 6.1 Top view, cross section and device equivalent model of: a) vertical photoBJT, b) Field-Effect Modified (FEM) vertical photoBJT
[Panels for (a) and (b): top view, cross section (N-well base, P-substrate collector) and equivalent model]
Figure 6.2 Photosensor cell circuit
In summary, an FEM bipolar transistor is realized in a standard CMOS process as a
photosensitive device. According to the descriptions given in Section 6.3.1, a crucial
modification has been made on the device presented in [47] and [46] in which the annular
P+ region was connected to VDD. Another important improvement compared to [44] and
[47] lies in a layout technique applied to the circuit of photosensor cell. In this technique
N-well areas anywhere other than in photosensitive device are covered by metal layers
(M1 or M2) or by polysilicon, so as to minimize unwanted photo electron-hole generation
in the substrate. Satisfactory test results and enhanced responsivity have led to the use of
FEM BJT photoreceptor cells in our neural-based photosensors [23], [16].
6.4 A Review of Conventional Designs
Previous work on the design of CMOS photoreceptor arrays and neural-network-based
photosensors can be found in [12], [42], [44], [45]. The author has reviewed the
evolutionary steps of these designs [23] and has been involved in aspects of earlier works
in our group including design improvements, transitions to newer CAD environments,¹
implementation submissions and the testing of the fabricated chips. In this section, two
conventional designs are chosen for review that highlight major steps in evolutionary
design of this NNIC family. A novel design presented in this thesis will be explained in
Section 6.5.
1. Technology transfers and design transitions from Cadence EDGE to OPUS and later to Cadence 9W97A.
6.4.1 Partially-connected Pre-programmed Neural-based Sensor
This design contains a 10 x 10 photoreceptor array and a neural network classifier with
100 input, 16 hidden and 5 output nodes [42]. To avoid massive synaptic interconnections
and to reduce the routing problems and areas, a 'partially-connected' network has been
implemented, i.e. the input array is divided into four sub-arrays each connected to four
out of 16 hidden neurons. The sixteen neurons in hidden layer are fully connected to five
output neurons.
The network is trained off-line to recognize eight input patterns for a process control
classification task. Synaptic weights are implemented (pre-programmed) on transistor
widths. A fabricated CMOS chip successfully recalled eight trained patterns, when they
were projected onto the photosensitive array using a microscope in conjunction with other
lenses. When noisy patterns were introduced, the network was able to recognize about
94% of patterns with Hamming distance (error) of one in simulated recall and 80% in test.
Increasing the number of error bits in input noisy patterns, however, quickly deteriorated
the percentage of correct recall. On the other hand, simulation study shows that a 'fully-connected',
rather than a partially-connected, neural classifier would correctly recall more
than 90% of noisy patterns with up to 3 error bits in this application. The cost associated
with a more complicated design with increased die area in a fully-connected NNIC would
be paid off by a superior performance under noisy input conditions. Moreover, it is well
known that fully-connected neural networks are more fault tolerant.
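The interconnection cost being traded against robustness here can be put in numbers by counting synapses for the 100-16-5 network in both schemes. The sketch below follows the partition described above (four sub-arrays, each feeding four hidden neurons) and ignores any threshold synapses; the function names are illustrative.

```python
def fully_connected_synapses(layer_sizes):
    """Synapse count when every node feeds every node of the next layer."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


def partially_connected_synapses(n_inputs, n_hidden, n_outputs, n_groups):
    """Inputs and hidden neurons are split into groups connected one-to-one."""
    inputs_per_group = n_inputs // n_groups
    hidden_per_group = n_hidden // n_groups
    input_to_hidden = n_groups * inputs_per_group * hidden_per_group
    hidden_to_output = n_hidden * n_outputs  # this layer stays fully connected
    return input_to_hidden + hidden_to_output


partial = partially_connected_synapses(100, 16, 5, 4)  # 4*25*4 + 16*5 = 480
full = fully_connected_synapses([100, 16, 5])          # 100*16 + 16*5 = 1680
print(partial, full)
```

The fully-connected version needs 3.5 times as many synapses, which gives a sense of the added routing and die area mentioned in the text.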
6.4.2 Fully-connected Programmable Neural-based Sensor
Based on the above discussion, the VLSI realization of a neural classifier with 'fully-connected'
synaptic scheme should be our target for a robust and fault-tolerant
focal-plane pattern classifier. Moreover, programmability is an attractive feature that
makes the design flexible and compatible with different pattern sets in different
applications.
A fully-connected programmable neural-based photosensor chip can be found in [44] and
[43]. This 1.2 µm implementation displays a classical co-integration of a CMOS
photosensor array and a programmable neural network classifier. It contains a 5 x 5
lumped photosensor array integrated with a fully-connected multilayer feedforward neural
network with 25 input, 4 hidden, 3 output nodes, and a conventional synapse and neuron
realization. On-chip digital weight memory is included for the programming and storage
of synaptic weights. The dimensions of the photoreceptor array are practically limited to
5 x 5 because of: (i) interconnection problems and areas arising from the growing
complexity of synapses in a fully-connected NNIC (note that the number of synaptic
interconnections roughly increases with the number of neurons squared); (ii) a multitude
of circuit blocks, especially on-chip digital weight memory that occupies a considerable
die area.
Despite a full-custom layout, about 60% of the core area on this conventional neural
network based photosensor chip was occupied by metal interconnections. In practice, this
situation created a bottleneck that limited the dimensions of sensor array as well as the
size of neural network classifier on chip.
6.5 Distributed Neural-based Sensor Architecture
In this section the design of a novel neural-based photosensor chip developed in this thesis
is described. The term 'distributed' in the section title refers to the fact that the presented
design relies both on a distributed-neuron architecture and on a distributed array of
smart photosensors or pixels. Figure 6.3 shows the structure of the target neural-based
smart sensor for focal-plane pattern classification.
Figure 6.3 Neural-based photosensor for focal-plane pattern classification
6.5.1 2-D Distributed-neuron Architecture
Combined VLSI implementation of the two main building blocks of a neural network in
the form of a unified synapse-neuron (USN) offers many advantages. As described in
previous chapters, such a realization is based on a distributed resistive neuron architecture
and brings modularity and robust neuron characteristics. Figure 6.4 illustrates a
combined realization of one neuron and N digitally-programmable synapses by using
parallel output connection of N = n² universal building blocks on an n x n array as used
in the new design of smart photosensor. Each universal block consists of a sign-magnitude
synaptic weight register, a Multiplying DAC synapse with bi-directional output current,
and an active nonlinear load as a sub-neuron. This is a two-dimensionally-distributed
version of the architecture presented in previous chapters.
Figure 6.4 Hybrid distributed-neuron architecture on a 2-D array
6.5.2 Distributed Array of Smart Pixels
The new version of the focal-plane pattern classifier chip uses a regular architecture of
neural-based smart pixels with distributed neurons to overcome some of the problems faced in a
conventional implementation. A modular and distributed architecture is developed at two
levels of hierarchy. Elements of the photosensor array are distributed across the core area.
Each individual sensor element is closely integrated with all of its outgoing synapses,
synaptic weight storage and associated parts of distributed neurons in the hidden layer.
With this design approach (explained in more detail below), interconnection problems and
areas are greatly reduced, and a larger photosensor-classifier with increased synaptic
density has been fabricated on the available die area. Moreover, uniform and robust
characteristics are achieved for the fabricated neurons regardless of the die size.
A neural-based smart sensor with N = n^2 optical input nodes, m hidden nodes and k output
nodes is implemented in a highly modular and scalable scheme, described below.
An n x n array of smart pixel modules is uniformly distributed across the silicon die area.
As shown in Figure 6.5, each smart pixel comprises the following elements:
Figure 6.5 Neural-based smart pixel:
a) a schematic diagram, b) die microphotograph of two adjacent pixels
a) a photoreceptor cell as described earlier and depicted in Figure 6.1(b);
b) m programmable weight registers for the storage of synaptic weights;
c) m unified synapse-neuron (USN) blocks that contain all (m) synapses from this optical
input node to the hidden layer, with a sub-neuron (part of a distributed neuron in the
hidden layer) at the output of each synapse;
d) local clock drivers to reset and write the weight registers;
e) dc bias circuits.
Smart pixel modules are placed in such a manner that their photosensitive devices are
evenly spaced on a two-dimensional grid, a crucial requirement for a photosensor array.
N-wells are covered with a metal or polysilicon layer everywhere except in the FEM photoBJT,
so as to minimize unwanted photo-induced electron-hole generation. Finally, a regular m x k array
of USN and weight register blocks forms the synapses to, and the neurons in, the output
layer. This array includes k synapses and synaptic weights from each of the m hidden neurons
to the output neurons. It also includes k output neurons, each distributed among m synapses.
Additional building blocks of a similar type set the threshold level of neurons in the hidden and
output layers, i.e. m programmable blocks for hidden neurons and k for output neurons.
The threshold value is stored as a signed number in the (earlier named) weight register
associated with each USN block. The input to a threshold block is a fixed non-zero
voltage, in this case Vm.
A chip containing an 8 x 8 photosensitive array and a fully-connected programmable
neural network with N = 64 inputs, m = 8 hidden neurons and k = 4 output neurons has been
fabricated in 1.2 µm CMOS. A floorplan of this design is shown in Figure 6.6. Total chip
area is 14.7 mm2. About 90% of the chip core area is devoted to an 8 x 8 array of smart
pixels containing a uniform 2-D array of photoreceptors and the USN and weight
storage circuitry of the input and hidden layers. The remainder of the core area (bottom part of
the floorplan in Figure 6.6) belongs to the USN and storage units of the output layer and
the threshold units for both hidden and output neurons. Figure 6.7 shows the
microphotograph of the fabricated chip.
6.5.3 Characteristics of the Neural-based Sensor Chip
Some design data and experimental characteristics of the neural-based photosensor chip
are as follows.
• Optical input pattern: 8 x 8 binary
• Output class: a vector of 4 analog signals
• Chip core area: 11.2 mm2
• Transistor count: 60,000
• Current from 5-V supply: 6 mA < I_VDD < 27.5 mA (weight and pattern dependent)
• On-chip weight storage: 556 x 5 bits
• Programming clock (2-phase non-overlap): 0.5 - 20 MHz (3.5 MHz typical)
• Weight programming cycle: 160 µs (based on 3.5 MHz clock)
• Throughput time (input to recalled output): ~ 3.5 µs
• Connections Per Second (CPS): ~ 160 Mega CPS
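The timing figures in this list appear to be mutually consistent, as a quick check suggests. The sketch below assumes, which the text does not state explicitly, that one 5-bit weight is loaded per programming clock period and that the CPS figure counts all 556 programmable units once per throughput interval. The variable names are illustrative only.

```python
# Back-of-the-envelope check of the listed timing figures.
# Assumptions (not stated in the text): one 5-bit weight is loaded per
# programming clock period, and CPS = programmable units / throughput time.

UNITS = 556              # programmable units (5-bit weights) on chip
CLOCK_HZ = 3.5e6         # typical programming clock frequency
THROUGHPUT_S = 3.5e-6    # input-to-recalled-output time

programming_cycle_s = UNITS / CLOCK_HZ
connections_per_second = UNITS / THROUGHPUT_S

print(f"programming cycle: {programming_cycle_s * 1e6:.0f} us")
print(f"connections per second: {connections_per_second / 1e6:.0f} M")
```

Under these assumptions both results come out just under the rounded values of 160 µs and 160 Mega CPS quoted in the list.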
A practical issue observed in previous neural-based sensor implementations was
related to the optical input array, which was found to be very sensitive to misalignments and
vibrations of the test fixture [41]. Moreover, optical cross-talk among neighboring pixels was
reported to be a problem [46]. The presented design has significantly reduced these two
testing problems by distributing the pixels across the die area, which leaves some
distance between photosensitive elements in the array. This creates some leeway for the
optical pattern projected onto the chip. The optical setup is similar to that in [41] and includes a light
source, lenses and a Wentworth probing station with a microscope through which input
patterns are focused onto the optical array.
6.5.4 Synaptic Density and Interconnections
In the distributed sensor design: a) a major part of the neural network interconnections is
made locally inside smart pixel modules on short metal or polysilicon paths; b) additional
global routing is on a highly-regular structure in vertical and horizontal channels between
the modules; c) very little inter-block silicon area is wasted because there is only one type
of building block at each level of hierarchy (i.e. 'smart pixel' at the top level and USN at
the lower level). In total, 556 programmable units are on chip (544 synapses and 12
threshold units) that determine the input-output mapping performed by the neural classifier.
After an off-line training session, the sensor is programmed by two-phase non-overlapping
clocks that ripple a sequence of 5-bit synaptic weights through the storage units.
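The rippling of weights through the storage units can be pictured with a small behavioral sketch. It assumes, purely for illustration, that the storage units form one serial chain and that a full 5-bit word moves one position per clock; the actual register organization and clocking are not detailed here, and `program_chain` is a hypothetical helper name.

```python
from collections import deque

WEIGHT_BITS = 5  # one sign bit plus four magnitude bits per stored word

def program_chain(weight_words, chain_length):
    """Shift weight words into a serial chain of registers, one per clock.

    Returns the register contents after the last clock, head first.
    """
    for word in weight_words:
        if not 0 <= word < 2 ** WEIGHT_BITS:
            raise ValueError("each word must fit in 5 bits")
    chain = deque([0] * chain_length, maxlen=chain_length)
    for word in weight_words:
        # A new word enters at the head; every stored word moves one
        # position down the chain, and the last one falls off the end.
        chain.appendleft(word)
    return list(chain)

# The first word sent ends up in the register farthest from the input.
print(program_chain([0b10011, 0b00101, 0b01111], chain_length=3))  # [15, 5, 19]
```

In such a scheme the off-line trainer would have to emit the weights in the reverse of their physical order along the chain.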
By using the design approach explained in Section 6.5.2 and by cell-level optimization,
the number of synapses per unit die area is considerably increased compared to a
conventional version of a programmable neural-based sensor [41]. The time and effort
associated with custom interconnection have been greatly reduced as well. Table 6.1
shows a comparison between the conventional and the distributed-architecture designs.
In order to establish a consistent basis for comparison, a) both designs were fabricated in
the same process technology; b) synaptic density D_s is defined as the average number of
synapses per unit core area and the comparison is based on the normalized D_s; c) an
attempt has been made to determine any contributions from custom layout and cell
optimization in order to highlight the net 'architectural' improvement.
On the basis of our experimental study summarized in Table 6.1, synaptic density is
increased by a factor of 2.7 in a neural-based photosensor with the proposed distributed
architecture. A maximum of about 60-70% improvement is associated with cell and
layout optimization. Therefore, at least a 100% increase in synaptic density comes from
the architectural improvement due to a modular and distributed structure at two levels of
hierarchy. In Table 6.1, the total number of synapses (including neuron threshold units)
for an n^2-m-k fully-connected neural network is:
\[ \text{Total No. of Synapses} = (n^2 \times m) + (m \times k) + m + k \]
Moreover, synaptic density D_s is defined as:
\[ D_s = \frac{(n^2 \times m) + (m \times k) + m + k}{\text{Core Area}} \]
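As a quick check of the two definitions, the following sketch evaluates them for the two network sizes compared in this section. The function names are hypothetical, and the 11.2 mm2 core area is the figure listed for the distributed chip in Section 6.5.3; no core area is assumed for the conventional chip.

```python
def total_synapses(n: int, m: int, k: int) -> int:
    """Programmable units in an n^2-m-k fully-connected network,
    counting one threshold unit per hidden and per output neuron."""
    return n * n * m + m * k + m + k

def synaptic_density(n: int, m: int, k: int, core_area_mm2: float) -> float:
    """Average number of synapses per mm2 of core area."""
    return total_synapses(n, m, k) / core_area_mm2

distributed = total_synapses(8, 8, 4)    # 64-8-4 network
conventional = total_synapses(5, 4, 3)   # 25-4-3 network
print(distributed, conventional)                   # 556 119
print(round(synaptic_density(8, 8, 4, 11.2), 1))   # 49.6
```

The first value reproduces the 556 programmable units (544 synapses plus 12 threshold units) quoted above.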
Figure 6.6 Floorplan of the neural-based photosensor chip
(Blocks shown: array of smart pixels; 8 x 4 array for the hidden-to-output layer; thresholds for the 4 outputs; thresholds of the 8 hidden neurons; clock driver and bias.)
Figure 6.7 Microphotograph of the neural-based photosensor chip
The ratio of interconnection (routing) area to the chip core area is only 12%, which marks
a significant reduction from 60.5% in the conventional design [41]. The achieved
interconnection area percentage is even better (lower) than that of typical CNN-based optical
array processors, noting that cellular neural networks (CNNs) are known for their 'local'
connectedness that reduces routing problems and areas. For instance, a recent 2-D
programmable mixed-signal focal-plane array processor based on the CNN paradigm
uses 35% of the total chip area for routing [27].
Table 6.1. Comparison between a conventional and a distributed neural-based photosensor design

|                            | Conventional design     | Distributed design      |
| Optical Array              | 5 x 5                   | 8 x 8                   |
| Neural Network             | 25-4-3                  | 64-8-4                  |
| Total No. of Synapses (a)  |                         | 556                     |
| Die Area                   | 3.45 x 3.05 = 10.5 mm2  | 4.0 x 4.03 = 16.1 mm2   |
| Core Area                  |                         | 11.2 mm2                |
| Active Cell Area %         | 29.6%                   | 85%                     |
| Interconnection Area %     | 60.5%                   | 12%                     |
| Unused Silicon Area        | ~ 10%                   | ~ 3%                    |
| Normalized D_s             | 1                       | 2.7                     |

a. This number also includes neurons' threshold (bias) units that are very similar to synaptic units.

6.5.5 Robustness of Neurons
A robust neuron characteristic is achieved for two reasons:
1) Averaging effect: Lumped analog neurons implemented across a sizable die are subject
to major characteristic variations. In the distributed sensor chip, each neuron in the hidden
layer consists of n^2 = 64 distributed elements located on a 2-D array, as each element is
contained in a smart pixel. A hidden neuron, therefore, takes an average of various
characteristics over the die surface. This effect makes all hidden neurons virtually
uniform. For output neurons the averaging effect takes place over a 1-D array.
A measurement study showed that 'lumped' neurons 2500 µm apart on silicon had a
maximum of 1.6% variation, while 'distributed' neurons contained in the same area
exhibited under 0.5% variation. These results were the worst case among five fabrications
(more details can be found in Chapter 3). This property is especially important in a sizable
chip such as the present sensor, which can be subject to large on-chip variations.
2) Fault tolerance: Because a neuron circuit is distributed among many sub-blocks,
there is a potential fault tolerance for neurons. A VLSI defect, e.g. an open circuit,
may only affect a fraction of a neuron instead of the whole.
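The averaging effect described under 1) can be illustrated with a small Monte Carlo sketch. It is an idealization and not the measurement procedure of the thesis: each element is given an independent Gaussian deviation, whereas real on-chip variations are spatially correlated, so the measured figures above need not follow the ideal ratio. The function name and parameters are hypothetical.

```python
import random
import statistics

def neuron_spread(elements_per_neuron, trials=2000, sigma=1.0, seed=0):
    """Spread (population std. dev.) of a neuron characteristic when the
    neuron is the average of `elements_per_neuron` mismatched elements."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(trials):
        deviations = [rng.gauss(0.0, sigma) for _ in range(elements_per_neuron)]
        outputs.append(sum(deviations) / elements_per_neuron)
    return statistics.pstdev(outputs)

lumped = neuron_spread(1)        # one element per neuron
distributed = neuron_spread(64)  # n^2 = 64 elements per hidden neuron

# For independent deviations, theory predicts a ratio near sqrt(64) = 8.
print(f"spread ratio lumped/distributed: {lumped / distributed:.1f}")
```

The same picture also suggests why a single defective sub-block has a limited effect: it perturbs only one of the 64 terms in the average.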
6.5.6 BiCMOS vs. CMOS Implementation
As part of a sensor optimization study, the 8 x 8 distributed neural-based sensor with a
64-8-4 neural network has been implemented with similar circuit blocks in a submicron
BiCMOS technology [51]. The chosen BiCMOS process potentially offers three
implementation advantages in our application: a) a smaller feature size (0.8 µm vs. 1.2 µm
CMOS); b) three metal layers vs. two layers in 1.2 µm CMOS; c) true bipolar transistors
rather than the parasitic BJTs of CMOS. The first two properties resulted in a denser sensor
realization in a smaller die area (10.6 mm2 instead of 16.1 mm2 in CMOS), while the third
property was used to implement photoreceptor cells based on true bipolar Darlington
transistors.
On the other hand, BiCMOS is a more expensive process by nature, especially because of
the extra masks required for bipolar devices. The fabrication cost per area in 0.8 µm
BiCMOS was three times that in 1.2 µm CMOS.¹ In practice, the fabrication cost increase
was partly offset by the reduction in sensor die area, and the BiCMOS sensor turned out to be
twice as expensive as its CMOS counterpart. Submicron geometries and the
multiplicity of metal interconnection layers are, nonetheless, attractive features for dense
NNIC implementations that should be sought in advanced CMOS technologies.
Submicron CMOS technologies are in fact considerably less expensive than a BiCMOS of
similar feature size. With the diminishing trend in BiCMOS technology, an advanced
submicron CMOS process will be the natural choice for future implementation of a
neural-based photosensor.
1. Evaluation is based on a fabrication cost of $800/mm2 for 0.8 µm BiCMOS and $264/mm2 for 1.2 µm CMOS in 1997 (prices in Canadian dollars).
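The cost comparison can be reproduced from the footnoted 1997 prices and the two die areas quoted in this section. The sketch below is plain arithmetic on those numbers; the dictionary keys are illustrative labels only.

```python
# Fabrication cost comparison using the figures quoted in this section
# (1997 prices in Canadian dollars per mm2, die areas in mm2).
COST_PER_MM2 = {"BiCMOS 0.8um": 800.0, "CMOS 1.2um": 264.0}
DIE_AREA_MM2 = {"BiCMOS 0.8um": 10.6, "CMOS 1.2um": 16.1}

die_cost = {p: COST_PER_MM2[p] * DIE_AREA_MM2[p] for p in COST_PER_MM2}

per_area_ratio = COST_PER_MM2["BiCMOS 0.8um"] / COST_PER_MM2["CMOS 1.2um"]
per_die_ratio = die_cost["BiCMOS 0.8um"] / die_cost["CMOS 1.2um"]

print(f"cost per area ratio: {per_area_ratio:.2f}")  # about 3
print(f"cost per die ratio:  {per_die_ratio:.2f}")   # about 2
```

The smaller BiCMOS die recovers roughly one third of the per-area price difference, which is why the overall factor drops from about three to about two.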
A design study in 0.35 µm or 0.25 µm CMOS is proposed for future work. With an
optimum NNIC design and custom layout, neural-based photosensor chips with input
arrays as large as 12 x 12 to 16 x 16 are seen to be feasible in these technologies.
A training and recall simulator with a multi-window and multi-tasking graphical user
interface (GUI) under XView has been developed especially for the USN architecture.¹ Figure 6.8
shows some of the windows available in this simulator. From the main window (Figure
6.8(a)) the user can choose to define the network structure (Figure 6.8(c)), graphically
define the desired input-output patterns (Figure 6.8(d)), and, based on his/her defined
parameters, run a training session (Figure 6.8(e)) and finally a simulated recall (Figure
6.8(f)). A modified back-propagation (BP) algorithm is used in which the property of a
distributed-neuron architecture is embedded, i.e. the saturating function of each neuron is
scaled, both in the training and in the recall phase, in proportion to the number of its inputs. The
resulting weight set is rounded off to 5 bits (one sign and 4 magnitude bits) to match the
resolution of the hardware. A simulated recall has been included to verify the
functionality of the network with quantized weights.
1. A simulator for standard BP was developed in our group [43]. Both the graphical interface and the incorporated algorithms are modified here for the special hardware architecture described in Section 6.5.
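The two hardware-matching steps mentioned above, scaling the saturating function with the number of inputs and rounding weights to one sign bit plus four magnitude bits, can be sketched as follows. This is a minimal illustration and not the simulator's code: the use of tanh, the net/N form of the scaling, the full-scale value `w_max` and all function names are assumptions.

```python
import math

MAG_BITS = 4
MAX_CODE = 2 ** MAG_BITS - 1  # largest magnitude code: 15

def quantize_weight(w, w_max=1.0):
    """Round w to sign-magnitude form; returns (sign, magnitude code)."""
    code = min(MAX_CODE, round(abs(w) / w_max * MAX_CODE))
    sign = -1 if w < 0 else 1
    return sign, code

def dequantize(sign, code, w_max=1.0):
    """Weight value represented by a (sign, magnitude code) pair."""
    return sign * code * w_max / MAX_CODE

def distributed_neuron(inputs, weights):
    """Saturating function whose input range grows with the number of
    inputs N (assumed form: tanh of the net input divided by N)."""
    n = len(inputs)
    net = sum(x * w for x, w in zip(inputs, weights))
    return math.tanh(net / n)

sign, code = quantize_weight(-0.37)
print(sign, code, round(dequantize(sign, code), 3))  # -1 6 -0.4
```

A simulated recall would then run `distributed_neuron` with the dequantized weights and compare the outputs against those obtained with the unquantized set.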
Figure 6.8 Various popup windows in the Training/Recall simulator:
a) main window, b) about features, c) define structure (graphic window not shown),
d) define input/output patterns, e) define and run training, f) simulated recall
Table 6.2 shows a set of 9 input/output patterns used for the training of a neural-based
sensor with 8 x 8 optical inputs. The final outcome of the training/recall simulator is a set of
weights to be programmed on the sensor chip. Weight programming is performed by a
two-phase non-overlapping clock. In practice, the clock lines (Φ1, Φ2) and the sequence
of weight vectors were generated by an HP 75000-D20 VXI-bus system, a
software-controlled digital test system also known as the D20 Tester. An alternative test setup included
an HP8180 Data Generator and an HP8182 Data Analyzer.¹
Table 6.2. Training pattern set for the 8 x 8 smart photosensor
(Pattern 1 to Pattern 9, each an 8 x 8 binary image; legend: Light ON, Light OFF)
6.7 Conclusion
The design of a neural-based smart photosensor with focal-plane pattern classification for
on-line process control is described. Photosensors are based on a Field-Effect Modified
parasitic photoBJT in a CMOS technology. Elements of the photosensor array are distributed
over the die surface and a neural-based smart pixel is formed around each sensor cell.
The on-sensor neural classifier is based on a programmable hybrid architecture with unified
synapse-neurons that rely on distributed neurons. The proposed architecture has greatly
reduced interconnection areas and increased the synaptic density. Thus, the size of the
optical input array and the neural network classifier integrated in the available die area has
been increased. In addition, uniform and robust neuron characteristics are realized despite
fabrication process variations over the surface of a sizable die. Judged by the great
modularity and uniformity of its fabricated elements, the proposed architecture is a good
candidate for Wafer Scale Integration (WSI) of neural networks and neural-network-based
smart sensors.
1. A CAD demonstration of this design was presented at TEXPO'96 [26].
Chapter 7 Conclusions
7.1 Summary
In this thesis, after studying various methods for the implementation
of neural networks, it was decided that a hybrid analog-digital
approach for a fully-parallel VLSI implementation should be
explored. A robust hybrid architecture was developed based on
unified synapse-neurons (USN) that implements a fully-connected
multilayer neural network with regular arrays of a universal
building block. The universal block was a digitally programmable
USN comprised of an MDAC, a sigmoidal sub-neuron and a
built-in weight register. Circuit design, implementation and
characterization were performed in a standard CMOS process for
both 5-V and 3.3-V supply voltages.
The salient features of the proposed VLSI architecture are: high
modularity, a fully-parallel single-chip implementation, silicon area
efficiency due to reduced interconnection and inter-block areas,
self-scaling property of sigmoidal neurons, quantization noise
improvement, uniform neuron functions due to an averaging effect,
an increased fault tolerance, automatic fan-out increase of USN
blocks, and digital programmability. A special optoelectronic
version of the architecture relying on a 2-D distributed array of
neural-based smart pixels was presented for the implementation of a photosensor with a
focal-plane pattern classifier. Photosensitive elements were based on a Field-Effect
Modified vertical photoBJT in a standard CMOS technology.
Four chips were fabricated and tested during the course of this project: a chip containing
USN blocks and neural network test circuits (Chapter 3), an optical/electronic template
matching network (Chapter 4), a 16-4-3 general purpose vector classifier NNIC (Chapter
4) and a programmable smart photosensor integrating an 8 x 8 photosensor array and a
64-8-4 neural network classifier (Chapter 6).
7.2 Contributions
A robust smart non-contact optical sensor based on a VLSI implementation of a neural
network with an integrated photosensitive array and programmable digital weights has
been designed, realized and programmed. By training the network, optical patterns on an
8 x 8 array are mapped to process control vectors formed by four analog neuron outputs.
The sensor was designed to detect low resolution fringe patterns resulting from
illuminating an object with coherent light. In this manner a small number of pixels can be
used to generate spatial precision control information based on diffraction patterns for use
in a flexible manufacturing cell.
In line with the achievement of the above objective, several contributions made in this
thesis, ranging from the architecture to novel circuits to the properties explored
theoretically and experimentally, can be summarized as follows:
• A hybrid distributed-neuron architecture was presented and proved with fully
functional ICs. A new hybrid alternative to a fully-analog approach in [Ml, the
presented architecture features new properties and circuit realizations. The architecture
was described in Chapter 2 and in [21].
• Quantization noise improvement is an emerging property exclusive to a 'hybrid'
distributed-neuron implementation consisting of digital weights and analog neurons.
The reduction in the ratio of output noise (weight quantization error) to signal is a
consequence of the self-scaling property. The first stochastic model for a distributed
neuron was presented in this thesis (see Chapter 5 and [20]). The stochastic model is an
extension of a model by Piché [68], [69] for a conventional (lumped) neuron. The first
conclusion of Piché [69] is that increasing the number of nodes per layer in a
(conventional) Madaline increases the required weight accuracy, assuming a minimum
acceptable signal-to-noise ratio (SNR). In this thesis, it was shown that a
distributed-neuron architecture is advantageous in terms of maintaining a better signal-to-noise
ratio as the number of nodes per layer (or neuron inputs) increases. The larger the
network becomes, the more apparent the SNR advantage is, compared to a conventional
Madaline network.
• An interesting self-scaling property of a distributed neuron was described intuitively
and demonstrated analytically and experimentally (see Chapter 3 and Chapter 5).
Besides contributing to improving quantization effects, this property circumvents a
neuron re-design in the implementation of networks of various sizes.
• The averaging property of distributed neurons against process variations, especially the
infamous threshold voltage mismatch in MOS transistors, was analyzed and
demonstrated with fabrication measurements (see Chapter 3). This property, which
creates virtually uniform neuron characteristics, is an important contribution in
addressing the problem of analog circuit variations, especially in large networks.
• Simultaneously improving quantization noise effects and averaging out analog
variations, the presented architecture proves to be capable of reducing two main types
of implementation errors, in the digital and in the analog domains, respectively [19].
• The presented neural network architecture results in a fully-parallel, fully-connected
single-chip implementation, as opposed to multiplexed architectures [63], [96],
partially-connected schemes [42], or chip-set solutions [48].
• A novel and compact resistive-type neuron circuit based on the quadratic operation of
NMOS and PMOS transistors was presented (see Chapter 3 and [15]). Even for a
lumped implementation, circuit analyses, simulations and measurements all indicated
interestingly low characteristic variations for the neuron circuit compared to those in
amplifier-type neurons (e.g. [71], [35]). The maximum variation in 10 chips was 2.2%,
while the worst-case variation within one chip was 1.3%. Moreover, a distributed
implementation of the proposed neuron circuit revealed a maximum measured variation
of only 0.5%.
• A programmable nonlinearly-loaded Multiplying DAC was presented as a new
universal circuit block for the implementation of NNICs (see Chapter 4, [24] and [18]).
As for the MDAC, all three sub-blocks were modified compared to [57], and hence the
experimental characteristics were improved. Circuit techniques especially increased
the dynamic range of the synaptic function and made possible operation on a 3.3 V as well
as a 5 V supply. The low-voltage operation on 3.3 V reduced the power consumption by
86% compared to standard 5 V operation.
• A novel design for a CMOS-compatible smart photosensor with focal-plane pattern
classification was presented in Chapter 6 and in [16]. A programmable neural-based
smart pixel with distributed neurons was the building block of the sensor array.
• Incremental improvements were made, as explained in Section 6.3.2 and Section 6.5.3,
on a Field-Effect Modified photoBJT and on a photosensor array built with this device
in a standard CMOS technology.
• An important improvement demonstrated in the context of the neural-based
photosensor was the great reduction in interconnection areas and a corresponding
increase in 'synaptic density'. In practice, the proposed architecture cut the routing
area from 60% to 12%, and increased the synaptic density by a factor of 2.7
compared to a customized conventional implementation in the same technology (see
Section 6.5.4). As a result, a larger photosensor array and a larger neural network
classifier were implemented on a restricted silicon die area.
7.3 Suggested Future Research
• Fault tolerance is an issue that can be further investigated. An improved fault tolerance
in a distributed-neuron architecture compared to a conventional neural network
implementation was discussed and intuitively understood in Chapter 3. Moreover, in
Chapter 1 it was argued that a time-multiplexed neural network architecture (e.g. [63],
[96]) had a lower degree of hardware redundancy and fault tolerance. Future research
can further explore the three architectures, namely a time-multiplexed, a conventional
and a distributed-neuron architecture, in order to provide measures of their reliability under
various faulty conditions. Different types of faulty conditions, e.g. VLSI defects
(open/short) or burst noise in different quantities, can be introduced to trained neural network
classifiers in order to compare their recall performances.
• Quantization noise improvement in a hybrid distributed-neuron architecture implies the
possibility of a reduction in the weight precision of recall hardware. As the number of
neuron inputs (or nodes per layer) increases, there is a relative gain in the signal-to-noise
ratio (SNR) of a distributed-neuron network compared to a conventional one (see
Section 5.4 and Section 5.5). When a minimum SNR level is set as the criterion, then at
some point the relative gain can be traded off for a lower number of bits in weight
quantization. Estimates show that, depending on network topology, each 5-10 dB
difference in SNR is equivalent to a 1-bit difference in weight precision. For instance,
conditions could be found under which a hybrid distributed implementation with 4-bit
digitized weights performs just as satisfactorily as a conventional implementation with
5-bit weights. If the number of bits in our 5-bit programmable universal building block
can be reduced to 4, the MDAC circuit will be nearly halved in area and the size of the
weight register will be reduced by 20%. This will result in a denser
synaptic implementation with lower power consumption.
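The two area figures quoted for a 5-bit to 4-bit reduction can be rationalized by simple counting. The sketch below assumes, which this section does not state, that the MDAC is binary-weighted with its area dominated by unit current sources and that one bit of each word is the sign; the helper name is hypothetical.

```python
def mdac_unit_sources(total_bits: int) -> int:
    """Unit current sources in a binary-weighted sign-magnitude MDAC
    (assumed structure): 2^(magnitude bits) - 1, with one sign bit."""
    magnitude_bits = total_bits - 1
    return 2 ** magnitude_bits - 1

register_saving = 1 - 4 / 5                                # fewer storage bits
mdac_ratio = mdac_unit_sources(4) / mdac_unit_sources(5)   # 7 / 15

print(f"weight register smaller by {register_saving:.0%}")
print(f"MDAC unit sources: {mdac_unit_sources(5)} -> {mdac_unit_sources(4)} "
      f"(ratio {mdac_ratio:.2f})")
```

Under this assumed structure the source count falls from 15 to 7, slightly less than half, in line with "nearly halved".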
• The most demanding implementation in this thesis was that of the neural-based
photosensor chip. In order to maintain a consistent basis for comparison with an earlier
conventional implementation of this sensor and to highlight the architectural (not
technological) improvements only, it was decided to make the new distributed
implementation in the same (1.2 µm CMOS) technology as for the conventional one
before. Later a BiCMOS version of the sensor was implemented based on the same
distributed architecture and circuits to demonstrate the technology-related
improvements [51]. The 0.8 µm BiCMOS implementation was denser but turned out to
be twice as expensive (see the discussion in Section 6.5.6).
As a result, an implementation of the presented sensor architecture in an advanced
submicron CMOS process is suggested for future work. Submicron feature sizes and
the multiplicity of metal interconnect layers are attractive properties of advanced
CMOS processes for dense neural network implementations. In addition, a submicron
CMOS process is considerably less expensive than a BiCMOS process of similar
feature size.
Currently, a triple-metal 0.35 µm CMOS process is readily accessible from TSMC¹ and
the availability of a 5-metal 0.25 µm CMOS process from the same foundry is
imminent. Initial studies indicate that (i) photosensor elements based on vertical BJTs
and their modifications are realizable in an N-well 0.35 µm CMOS process², and (ii) with a
full-custom layout, a programmable neural-based photosensor chip with an integrated
photosensor array as large as 12 x 12 in 0.35 µm and 16 x 16 in 0.25 µm, along with a
correspondingly sized multilayer neural network classifier, should be feasible.
A neural-based photosensor implementation in a 0.25 µm CMOS process is highly
recommended, as the availability of five metal layers for interconnections in this
technology would be an asset for a fully-connected neural network integrated circuit.
In addition, 2.5 V supply operation in this process can be utilized towards a low-power
neural network implementation.
1. Taiwan Semiconductor Manufacturing Company (fabrication services provided through the Canadian Microelectronics Corporation, CMC).
2. A photoBJT cell has been tested in 0.35 µm CMOS. The results cannot be reported here due to non-disclosure agreements.
References
P.E. Allen and D.R. Holberg, CMOS Analog Circuit Design. New York: Holt Rinehart and Winston, 1987.
A. Aslarn-Siddiqi, W. Brockherde and B.J. Hosticka, "A 16 x 16 Nonvolatile Programmable Andog Vector-Matrix Multiplier," IEEE Journal of Solid-State Circuits, Vol. 33, No. 10, pp. 1502- 1509, October 1998.
L.E. Atlas and Y. Susuki, "Digital Systems for Artificial Neural Networks," IEEE Circuits and Devices Magazine, Vol. 5, No. 6, pp. 20-24, September 1989.
G. Bloch, F. Sirou, V. Eustache and P. Fatrez, "Neural Intelligent Control for a Steel Plant," lEEE Transactions on Neural Networks, Vol. 8, No. 4, pp. 910-918. July 1997.
B.E. Boser, E. Sackinger, S. Bromley, Y. LeCunn, RE. Howard and L.D. Jackel, "An Analog Neural Network Processor and Its Application to High-speed Character Recognition," Proceedings of International Joint Conference on Neural Networks (IJCNN), Vol. 1, pp. 4 15-420, Seattle, July 199 1.
J. Cao, M. Shridhar, M. Ahmadi and G.A. Jullien, "Recognition of Handwritten Numerals with Multiple Feature and Multistage Classifier," Journal of Pattern Recognition, Vol. 28, No. 2, pp. 153-163, 1995.
I. Cao, M. Shridhar, M. Ahmadi and GA. Jullien, "VLSI Implementation for Real- Time Extraction of Direction Vectors €rom Binary Images," Proceedings of 36th Midwest Symposium on Circuits and Systems, Vol. 2, pp. 963-966, Detroit, MI, August 1993.
G.A. Carpenier and S. Grossùerg, "Neural Networks: Introduction to the 1 December 1987 Issue of Applied Opiics," Special Issue of Applied Optics, Vol. 26, pp. 4909, 1987.
University of Windsor
A. Chandna, G.A. Jullien and W.C. Miller, "Opto-Programmable Neural Networks: An initial Study," Proceedings of Canadian Conference on VLSI (CCVLSI), pp. 41- 48, Vancouver, BC, October 1989.
C.P. Chew, R.W. Newcomb and J.D. Yuh, "VLSI Circuits for Optoelectronic Neural Network Weight Setting," Proceedings of 36th Midwest Symposium on Circuits and S ystems, pp. 75 1-754. Detroit, MI, Aupst 1993.
L.I. Davis, Jr., GY. Puskorius, F. Yuan and L.A. Feldkamp, "Neural Network Modeling and Control of an Anti-lock Brake System," Proceedings of Intelligent Vehicle'92 Symposium, pp. 179-184, Detroit, MI. 1992.
L. Del Pup, N. Bewtra, R. Grondin, G.A. Jullien and W.C. Miller. "An Optically Coupled Neural Network for Process Control," Proceedings of Canadian Conference on VLSI (CCVLSI), pp. 4.2.1- 4.2.7, Ottawa, ON, October 1990.
J. Denker, D. Schwartz, B. Wittner, S. Solla, R. Howard, L. lackel and J. Hopfield, "Large Automatic Leaming, Rule Extraction. and Generaiization," Complex Systems, Vol. 1. pp. 877-922, 1987.
B.K. Dolenko and H.C. Card, 'Tolcrance to Analog Hardware of On-Chip Leaming in Backpropagation Networks," IEEE Transactions on Neural Networks, Vol. 6, No. 5, pp. KM- W 2 , September 1995.
H. Djahanshahi, M. Ahmadi, G.A. Jullien and W.C. Miller. "A Low-Vanation Nonlinear Neuron Circuit," (accepted in) Journal of Circuits, Systems and Cornputers, 1999.
H. Djahanshahi, M. Ahmadi, G.A. Jullien and W.C. Miller, "A Robust Hybnd Neural Architecture for An industrial Sensor Application," Proceedings of EEE International Symposium on Circuits and Systems (ISCAS), Vol. DI, pp. 41-45, Monterey, CA, May 3 1 -Sune 3. 1998.
H. Djahanshahi, M. Ahmadi, G.A. Jullien and W.C. Miller, "Neural Network Integrated Circuits with Single-block Mixed-signal Anays," Proceedings of 3 1st Asilomar Confennce on Signais, Systems & Computers, Vol. 2, pp. 1130-1 135, Pacific Grove, CA, November 1997.
H. Djahanshahi, M. Ahmadi, G.A. Jullien and W.C. Miller, "Neural Network lntegratcd Circuits with Single-block Mixed-signal Arrays," (submitted CO special issue of) Journal of Circuits, Systems and Computers, June 1999.
H. Djahanshahi, M. Ahmadi, G.A. Jullien and W.C. Miller, "A Self-scaling Neurai Hardware Structure That Reduces the Effect of Some impkmentation Errors:' Neural Networks for Signal Rocessing W - Roceedings of the 1997 IEEE Workshop (NNSP'97), pp. 588-597, Amelia Island, Florida, September 1997.
Univcisity of Windsor
H. Djahanshahi, M. Ahmadi, G.A. Jullien and W.C. Miller, "Quantization Noise Improvement in a Distributed-neuron Architecture," Proceedings of 40th Midwest Symposium on Circuits and Systems, Vol. 2, pp. 1282-1285, Sacramento, CA, August 1997.
H. Djahanshahi, M. Ahmadi, G.A. Jullien and W.C. Miller, "A Modular Architecture for Hybrid VLSI Neural Networks and its Application in a Smart Photosensor," Proceedings of IEEE International Conference on Neural Networks (ICNN), Vol. 2, pp. 868-873, Washington, D.C., June 1996.
H. Djahanshahi, M. Ahmadi, G.A. Jullien and W.C. Miller, "A Unified Synapse-Neuron Building Block for Hybrid VLSI Neural Networks," Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), Vol. 3, pp. 483-486, Atlanta, GA, May 1996.
H. Djahanshahi, G.A. Jullien, W.C. Miller and M. Ahmadi, "Neural-based Smart CMOS Sensors for On-Line Pattern Classification Applications," (invited), Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), Vol. 4, pp. 384-387, Atlanta, GA, May 1996.
H. Djahanshahi, M. Ahmadi, G.A. Jullien and W.C. Miller, "Design and VLSI Implementation of a Unified Synapse-Neuron Architecture," Proceedings of Sixth Great Lakes Symposium on VLSI (GLSVLSI), pp. 228-233, Iowa State University, Ames, Iowa, March 1996.
Hormoz Djahanshahi and Bart MacLean, "Mixed-Signal VLSI Neural Networks with Self-Scaling Neurons," Hardware Demonstration at TEXPO'97, Symposium on Microelectronics Research & Development in Canada (MR&DCAN), Ottawa, ON, June 1997.
Hormoz Djahanshahi, "Neural-based Smart Photosensors," CAD Demonstration at TEXPO'96, Symposium on Microelectronics Research & Development in Canada (MR&DCAN), Ottawa, ON, June 1996.
R. Dominguez-Castro, S. Espejo, A. Rodriguez-Vázquez, R. A. Carmona, P. Foldesy, A. Zarandy, P. Szolgay, T. Szirányi and T. Roska, "A 0.8-μm CMOS Two-Dimensional Programmable Mixed-Signal Focal-Plane Array Processor with On-Chip Binary Imaging and Instruction Storage," IEEE Journal of Solid-State Circuits, Vol. 32, No. 7, pp. 1013-1025, July 1997.
J.R. Dorronsoro, F. Ginel, C. Sanchez and C. Santa Cruz, "Neural Fraud Detection in Credit Card Operations," IEEE Transactions on Neural Networks, Vol. 8, No. 4, pp. 827-834, July 1997.
E.I. El-Masry, H.K. Yang and M.A. Yakout, "Implementations of Artificial Neural Networks Using Current-Mode Pulse Width Modulation Technique," IEEE Transactions on Neural Networks, Vol. 8, No. 3, pp. 532-548, May 1997.
S. M. Fakhraie and K. C. Smith, VLSI-Compatible Implementations for Artificial Neural Networks, Boston: Kluwer Academic Publishers, 1997.
N. Farhat, "Optoelectronic Neural Networks and Learning Machines," IEEE Circuits and Devices Magazine, Vol. 5, No. 5, pp. 32-41, September 1989.
H.P. Graf et al., "VLSI Implementation of Neural Network Memory with Several Hundreds of Neurons," AIP Conference Proceedings, Snowbird, Utah, J.S. Denker, Ed., American Institute of Physics, New York, NY, pp. 182-187, 1986.
H.P. Graf and L.D. Jackel, "Analog Electronic Neural Network Circuits," IEEE Circuits and Devices Magazine, Vol. 5, No. 4, pp. 44-49, July 1989.
D. Hammerstrom, "A VLSI Architecture for High-performance Low-cost On-chip Learning," Proceedings of International Joint Conference on Neural Networks (IJCNN), Vol. II, pp. 537-544, San Diego, CA, June 1990.
J.A. Hegt, "Hardware Implementations of Neural Networks," Proceedings of Measurement and Artificial Neural Networks, 'Themadag van de Werkgemeenschap Meten', Utrecht, November 1993.
W.D. Hillis, The Connection Machine, MIT Press, Cambridge, MA, 1985.
M. Holler, S. Tam, H. Castro, R. Benson, "An Electrically Trainable Analog Neural Network (ETANN) with 10240 'Floating Gate' Synapses," Proceedings of International Joint Conference on Neural Networks (IJCNN), pp. 191-196, Washington, D.C., June 1989.
J.J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proceedings of the National Academy of Sciences, Vol. 79, pp. 2554-2558, 1982.
Y. Iida, E. Oba, K. Mabuchi, N. Nakamura and H. Miura, "A 1/4-Inch 330k Square Pixel Progressive Scan CMOS Active Pixel Image Sensor," IEEE Journal of Solid-State Circuits, Vol. 32, No. 12, pp. 2042-2047, December 1997.
F.J. Kub, K.K. Moon, I.A. Mack and F.M. Long, "Programmable Analog Vector-Matrix Multipliers," IEEE Journal of Solid-State Circuits, Vol. 25, pp. 207-214, February 1990.
B. Lam, Design and Training of a Programmable Fault-tolerant Neural Network, M.A.Sc. Thesis, University of Windsor, Canada, 1995.
[42] B. Lam, W.C. Miller and G.A. Jullien, "An Intelligent Optical Sensor," Proceedings of International Conference on Applications of Photonic Technology, Sensing, Signal Processing and Communications (ICAPT), Toronto, Canada, June 1994, in Applications of Photonic Technology, Ed., G. A. Lampropoulos, J. Chrostowski, R. M. Measures, Plenum Press, New York and London, pp. 141-144, 1995.
[43] K.W. Lei, "A 1.2μ Neural Network Design," M.A.Sc. Thesis, University of Windsor, Canada, 1994.
[44] K.W. Lei, G.A. Jullien, W.C. Miller, "A Programmable Intelligent Optical Sensor Realization," Proceedings of 37th Midwest Symposium on Circuits and Systems, Vol. 1, pp. 465-468, Lafayette, LA, August 1994.
[45] K.W. Lei, G.A. Jullien and W.C. Miller, "An Intelligent Optical Sensor Realization," Proceedings of 36th Midwest Symposium on Circuits and Systems, Vol. 2, pp. 1284-1287, Detroit, MI, 1993.
[46] G. Liang, CMOS Opto-Electronics Implementation and Application, M.A.Sc. Thesis, University of Windsor, Canada, 1993.
[47] G. Liang and W.C. Miller, "A Novel Photo BJT Array for Intelligent Imaging," Proceedings of 36th Midwest Symposium on Circuits and Systems, pp. 1056-1059, Detroit, MI, August 1993.
[48] B. Linares-Barranco, E. Sánchez-Sinencio, A. Rodriguez-Vázquez and J.L. Huertas, "A Modular T-Mode Design Approach for Analog Neural Network Hardware Implementations," IEEE Journal of Solid-State Circuits, Vol. 27, No. 5, pp. 701-712, May 1992.
[49] R.P. Lippmann, "An Introduction to Computing with Neural Nets," IEEE ASSP Magazine, Vol. 4, No. 2, pp. 4-22, April 1987.
[50] R.P. Lippmann, "Pattern Classification Using Neural Networks," IEEE Communications Magazine, Vol. 27, No. 11, pp. 47-64, November 1989.
[51] B. MacLean, H. Djahanshahi, M. Ahmadi, G.A. Jullien and W.C. Miller, "A BiCMOS VLSI Implementation of an Intelligent Sensor," Proceedings of 40th Midwest Symposium on Circuits and Systems, Vol. 2, pp. 1065-1068, Sacramento, CA, August 1997.
[52] N. Mauduit, M. Duranton, J. Gobert, J.A. Sirat, "Lneuro 1.0: A Piece of Hardware LEGO for Building Neural Network Systems," IEEE Transactions on Neural Networks, Vol. 3, pp. 414-421, May 1992.
[53] C.A. Mead, Analog VLSI and Neural Systems, Reading, MA: Addison-Wesley, 1989.
[54] S. Mendis, S. E. Kemeny and E. Fossum, "CMOS Active Pixel Image Sensor," IEEE Transactions on Electron Devices, Vol. 41, No. 3, pp. 452-453, 1994.
[55] Meta-Software, HSPICE User's Manual: Elements and Device Models (Volume II), Version 96.1 for HSPICE Release 96.1, February 1996.
[56] G. Moon, M.E. Zaghloul and R.W. Newcomb, "VLSI Implementation of Synaptic Weighting and Summing in Pulse Coded Neural-type Cells," IEEE Transactions on Neural Networks, Vol. 3, No. 3, pp. 394-403, May 1992.
[57] A. Moopenn, T. Duong and A.P. Thakoor, "Digital-Analog Hybrid Synapse Chips for Electronic Neural Networks," in Advances in Neural Information Processing Systems, Vol. 2, pp. 769-776, 1990.
[58] J.M. Moreno, F. Castillo, J. Cabestany, J. Madrenas and A. Napieralski, "An Analog Systolic Neural Processing Architecture," IEEE Micro Magazine, pp. 51-59, June 1994.
[59] A.F. Murray and A.V.W. Smith, "Asynchronous Arithmetic for VLSI Neural Systems," Electronics Letters, Vol. 23, No. 12, pp. 642-643, June 1987.
[60] A.F. Murray, D. Del Corso and L. Tarassenko, "Pulse-stream VLSI Neural Networks Mixing Analog and Digital Techniques," IEEE Transactions on Neural Networks, Vol. 2, pp. 193-203, March 1991.
[61] A.F. Murray and L. Tarassenko, Analog Neural VLSI: Pulse Stream Approach, Chapman and Hall, London, U.K., 1994.
[62] R.A. Nordstrom, J.D. Meindl, "The Field-Effect Modified Transistor: A High-Responsivity Phototransistor," IEEE Transactions on Electron Devices, Vol. SC-7, No. 5, pp. 411-416, October 1972.
[63] A. Nosratinia, M. Ahmadi, M. Shridhar and G.A. Jullien, "A Hybrid Architecture for Feed-forward Multi-layer Neural Networks," Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), Vol. 3, pp. 1541-1544, San Diego, CA, May 1992.
[64] A. Nosratinia, N. Yazdi, M. Ahmadi and M. Shridhar, "A Family of Hybrid Neural Networks," Proceedings of Midwest Symposium on Circuits and Systems, Vol. 1, pp. 469-472, Lafayette, Louisiana, August 1994.
[65] T. Ong, P.K. Ko and C. Hu, "The EEPROM as an Analog Memory Device," IEEE Transactions on Electron Devices, Vol. 36, pp. 1840-1841, September 1989.
[66] I.M.C. Oosse, H.C.A.M. Withagen, J.A. Hegt, "Analog VLSI implementation of a feed-forward neural network," Proceedings of IEEE International Conference on Electronics, Circuits and Systems (ICECS), Cairo, Egypt, December 1994.
[67] M.L. Padgett, O. Erten, F.M. Salam, "Neural Networks and Computing: Practical Applications," Proceedings of IEEE International Conference on Neural Networks (ICNN), Plenary, Panel and Special Sessions, pp. 23-27, Washington, D.C., June 1996.
[68] S. W. Piché, "The Selection of Weight Accuracies for Madalines," IEEE Transactions on Neural Networks, Vol. 6, No. 2, pp. 432-445, March 1995.
[69] Stephen Piché, Selection of Weight Accuracies for Neural Networks, Ph.D. dissertation, Stanford University, 1992.
[70] G.V. Puskorius, L.A. Feldkamp and L.I. Davis, Jr., "Dynamic Neural Network Methods Applied to On-Vehicle Idle Speed Control," Proceedings of IEEE International Conference on Neural Networks (ICNN), Plenary, Panel and Special Sessions, pp. 238-243, Washington, D.C., June 1996.
[71] R.D. Reed and R.L. Geiger, "A Multiple-Input OTA Circuit for Neural Networks," IEEE Transactions on Circuits and Systems, Vol. 36, No. 5, pp. 767-769, May 1989.
[72] E.A. Rietman, R.C. Frye, C.C. Wong and C.D. Kornfeld, "Amorphous Silicon Photoconductive Arrays for Artificial Neural Networks," Applied Optics, Vol. 28, No. 15, pp. 3474-3478, August 1989.
[73] N. Rochester et al., "Tests on a cell assembly theory of the action of the brain, using a large digital computer," IRE Transactions on Information Theory, IT-2, pp. 80-93, 1956.
[74] F. Rosenblatt, "The perceptron: A Probabilistic Model for Information Storage and Organization in the brain," Psych. Rev. 65, pp. 386-408, 1958.
[75] V. Ruiz de Angulo and C. Torras, "Self-Calibration of a Space Robot," IEEE Transactions on Neural Networks, Vol. 8, No. 4, pp. 951-963, July 1997.
[76] E. Sackinger, B.E. Boser, J. Bromley, Y. LeCun and L.D. Jackel, "Application of the ANNA Neural Network Chip to High-Speed Character Recognition," IEEE Transactions on Neural Networks, Vol. 3, No. 3, pp. 498-505, May 1992.
[77] F. M. Salam, "A Neuro-Chip for Real-time Learning, Processing and Control," Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), Vol. III, pp. 54-57, Monterey, CA, May 31-June 3, 1998.
[78] F. M. Salam, H. J. Oh, "Real-time Tracking Control using Modular Neural Chips with On-chip Learning," Proceedings of IEEE International Conference on Neural Networks (ICNN), Vol. 2, pp. 914-919, Washington, D.C., June 1996.
[79] R.W. Sandage and J.A. Connelly, "Producing Photo-transistors in a Standard Digital CMOS Technology," Proceedings of International Symposium on Circuits and Systems (ISCAS), pp. 369-372, May 1996.
[80] S. Satyanarayana, Analog VLSI Implementation of Reconfigurable Neural Networks, Ph.D. dissertation, Columbia University, 1991.
[81] S. Satyanarayana, Y. Tsividis and H.P. Graf, "A Reconfigurable VLSI Neural Network," IEEE Journal of Solid-State Circuits, Vol. 27, No. 1, pp. 67-81, January 1992.
[82] C.-K. Sin, A. Kramer, V. Hu, R.R. Chu and P.K. Ko, "EEPROM as an Analog Storage Device, with Particular Applications in Neural Networks," IEEE Transactions on Electron Devices, Vol. 39, pp. 1410-1419, June 1992.
[83] M.A. Sivilotti, M.R. Emerling and C.A. Mead, "VLSI Architectures for Implementation of Neural Networks," AIP Conference Proceedings, Snowbird, Utah, J.S. Denker, Ed., American Institute of Physics, New York, NY, pp. 408-413, 1986.
[84] R.G. Stearns, "Trainable Optically-programmed Neural Network," Applied Optics, Vol. 31, No. 29, pp. 6230-6239, October 1992.
[85] A.P. Thakoor, A. Moopenn, H. Langenbacher and S.K. Khanna, "Programmable Synaptic Chip for Electronic Neural Networks," Neural Information Processing Systems, D.Z. Anderson, Ed., Denver, CO, American Institute of Physics, pp. 564-572, 1988.
[86] E. van Keulen, S. Colak, H. Withagen and H. Hegt, "Neural Network Hardware Performance Criteria," Proceedings of IEEE International Conference on Neural Networks (ICNN), pp. 1885-1888, June 28-July 2, 1994.
[87] J. von Neumann, The Computer and the Brain, New Haven: Yale University Press, 1958.
[88] Eric Vittoz, "Analog VLSI for Collective Computation," Proceedings of IEEE International Conference on Electronics, Circuits and Systems (ICECS), Vol. 2, pp. 3-6, Lisbon, Portugal, September 1998.
[89] Eric Vittoz, "Analog VLSI Implementation of Neural Networks," in Handbook of Neural Computation, Institute of Physics Publishing and Oxford University Press, USA, 1996.
[90] Eric Vittoz, "Micropower Techniques," in Design of VLSI Circuits for Telecommunications and Signal Processing, Editors J. Franca and Y. Tsividis, Prentice Hall, Englewood Cliffs, 1994.
[91] K. Wagner and D. Psaltis, "Optical Neural Networks: An Introduction by the Feature Editors," Special Issue of Applied Optics, Vol. 32, No. 8, pp. 1261-1263, March 1993.
[92] H.C.A.M. Withagen, "Reducing the Effect of Quantization by Weight Scaling," Proceedings of IEEE International Conference on Neural Networks (ICNN), pp. 2128-2130, June 28-July 2, 1994.
[93] Y. Xie and M.A. Jabri, "Analysis of the effects of quantization in multilayer neural networks using a statistical model," IEEE Transactions on Neural Networks, Vol. 3, No. 2, pp. 334-338, March 1992.
[94] O. Yadid-Pecht, et al., "A Random Access Photo-diode Array for Intelligent Image Capture," IEEE Transactions on Electron Devices, Vol. 38, No. 8, pp. 1772-1780, August 1991.
[95] A.K. Yamamura, Neural Network Control and an Optoelectronic Implementation of a Multilayer Feedforward Neural Network, Ph.D. Dissertation, California Institute of Technology, 1992.
[96] N. Yazdi, M. Ahmadi, G.A. Jullien and M. Shridhar, "Pipelined Analog Multilayer Feedforward Neural Networks," Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), Vol. 4, pp. 2768-2771, Chicago, IL, May 1993.
[97] N. Yazdi, M. Ahmadi, G.A. Jullien and M. Shridhar, "A High-Dynamic Range CMOS Buffer Amplifier with High-Drive Capability," Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), Vol. 5, pp. 2332-2335, San Diego, CA, May 1992.
[98] J.M. Zurada, Introduction to Artificial Neural Systems, West Publishing Company, 1992.
Appendix VLSI Layouts and Fabrications
Figure A.1 Layout of a sub-neuron circuit
Figure A.2 A group of five sub-neurons with a common bias circuit
Figure A.3 Two layouts for a 5-bit MDAC synapse with cascode transistors
Figure A.4 Layout of a 5-bit non-cascode MDAC synapse for 3.3V operation
Figure A.5 Layout of a Unified Synapse Neuron (USN): a) with cascode MDAC, b) with non-cascode MDAC for 3.3V operation
Figure A.6 Layout of a 5-bit parallel-in parallel-out (PIPO) weight register
Figure A.7 WRRNR: a test chip containing distributed neurons, MDACs and USNs (see page 40 for a microphotograph)
Figure A.8 WRNBS: 4-input template matching NNIC with optical/electronic inputs (see page 67 for a microphotograph)
Figure A.11 WRNSS: I (see page 1
Figure A.12 Neural-based ph
Vita Auctoris
Hormoz Djahanshahi was born in 1964 in Tehran, Iran, where he obtained his high school
diploma at the age of 16. He received his B.Sc. degree (Hons.) and M.Sc. degree (Hons.)
from Tehran Polytechnic (Amir Kabir) University, both in Electrical Engineering with a
major in Electronics. His Master's project, a Patient Monitoring System, was applied
research in biomedical instrumentation. He pursued his Master's work at Fajr
Microelectronics Co., where the system evolved from an engineering prototype to a
commercial product. His PhD thesis at the VLSI Research Group, University of Windsor,
Canada, was in the area of VLSI implementation of hybrid (analog-digital) neural networks
and smart optical sensors. The research led to several conference and journal articles.
Towards the end of his thesis, he has been working as a Post Doctoral Fellow at the VLSI
Research Group, University of Toronto, where he has designed and published in the area
of high-speed (622MHz) Clock & Data Recovery, and gigabit per second I/O interface
circuits, both in low-voltage submicron CMOS.