Soft Computing
Lecture 23
Hardware neural networks
14.12.2005 2
Kinds of hardware support for neural networks (NN)
• Parallel general-purpose hardware
• Signal processors
• Special-purpose VLSI hardware developed for implementing NN
Technologies for implementing neural networks in VLSI
Kind of elements | Advantages | Disadvantages
Analog optical | Potential for massive interconnection | A complete optical computing technology does not yet exist
Analog electrical | Simple concepts, fast data processing | Demanding manufacturing requirements, sensitivity to defects and external influences, low computational precision, difficulty implementing massive interconnection
Digital electrical | Mature, complete technology; high computational precision; robustness to manufacturing variations | Complex circuit design, multi-cycle execution of basic operations, difficulty implementing massive interconnection
Hybrid | Analog acceleration of basic operations with digital interfaces to external devices; possibility of optical interconnection | Further technological development is needed
Some first neural networks in VLSI
Name (producer) | Kind of elements | Number of neurons / synapses | Multiplications with summation per second
Silicon Retina (Synaptics) | Analog | 48 x 48 | ?
ETANN (Intel) | Analog | 64 / 10^4 | 2 × 10^9
N64000 (Inova) | Digital | 64 / 10^5 | 9 × 10^8
MA-16 (Siemens) | Digital | 16 / 256 | 4 × 10^8
RN-200 (Ricoh) | Hybrid | 16 / 256 | 3 × 10^9
NeuroClassifier (Mesa Research Institute) | Hybrid | 7 / 426 | 2 × 10^10
Some neurocomputers
Name (producer) | Type | Number of processor elements | Multiplications with summation per second
CNAPS/PC (Adaptive Solutions) | PC accelerator card | 2 CNAPS-1016 processors (128 neurons) | 2.5 × 10^9
CNAPS (Adaptive Solutions) | Neurocomputer | 8 CNAPS-1016 processors (512 neurons) | 10^10
SYNAPSE-1 (Siemens) | Neurocomputer | 8 MA-16 processors (512 neurons) | 3 × 10^9
Examples of Neurocomputers
Three neurocomputer systems are listed:
• The Adaptive Solutions CNAPS uses the Inova N64000 chip on VME boards in a custom cabinet run from a UNIX host. Boards come with 1 to 4 chips, and two boards can process the same network to give a total of 512 PEs (processing elements). The software includes a C-language library, assembler, compiler, and a package of NN algorithms.
• Similarly, the HNC SNAP Neurocomputer typically includes 2 VME boards, each with four NAP 100 chips, providing 32 PEs in total. The boards are controlled from a PC by the HNC Balboa accelerator card.
• The Siemens SYNAPSE-1 uses a systolic array of 8 MA-16 chips in a custom cabinet with a UNIX host.
Slice architecture
• Following the bit-slice concept of conventional digital processors, neural network slice chips provide building blocks for constructing networks of arbitrary size and precision. Such chips typically cost only about $50 each, run at moderate speeds, and have no on-chip learning.
• The Micro Devices MD1220 was probably the first commercial neural network chip. Each chip has eight neurons with hard-limit thresholds and eight 16-bit synapses with 1-bit inputs. With bit-serial multipliers in the synapses, the chip provides about 9 MCPS (million connections per second). Bigger networks and networks with wider inputs can be constructed from multiple chips. The 16-bit accumulator limits the total number of inputs, because the sum can overflow.
• A similar chip is the Neuralogix NLX-420 Neural Processor Slice [9], which has 16 processing elements (PEs). A common 16-bit input is multiplied by a weight in each PE in parallel. New weights are read from off-chip. The 16-bit weights and inputs can be user-selected as 16 1-bit, 4 4-bit, 2 8-bit, or 1 16-bit value(s). The 16 neuron sums are multiplexed through a user-defined piece-wise continuous threshold function to produce a 16-bit output. Internal feedback allows for multi-layer networks. Multiple chips can be combined to build large networks.
• The Philips Lneuro 1.0 chip, which is designed to be easily interfaced to Transputers, also has 16-bit processing in which the neuron values can be interpreted as 8 2-bit, 4 4-bit, etc., sub-values. Unlike the NLX-420, it has a sizable (1 KByte) on-chip cache to hold weights. The transfer function is computed off-chip, which allows multiple chips to supply synapse-input products to the neurons and so build very large networks.
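The slice idea can be sketched in a few lines of Python. This is only an illustrative model, not the behaviour of any specific chip: the function names, the signed two's-complement 16-bit accumulator, and the way partial sums from several slices are combined off-chip are all assumptions made for the example. It shows why a 16-bit accumulator limits the number of inputs, and how several slices can share one layer.

```python
# Hypothetical model of a neural network "slice" chip (illustrative names and sizes).
ACC_BITS = 16
ACC_MIN = -(1 << (ACC_BITS - 1))      # assumed signed accumulator range
ACC_MAX = (1 << (ACC_BITS - 1)) - 1


def slice_partial_sums(weights, inputs):
    """One slice: every neuron accumulates weight * 1-bit input."""
    sums = []
    for row in weights:
        acc = 0
        for w, x in zip(row, inputs):
            acc += w * x              # x is 0 or 1, so this is a conditional add
            if not ACC_MIN <= acc <= ACC_MAX:
                raise OverflowError("16-bit accumulator overflow")
        sums.append(acc)
    return sums


def multi_slice_layer(weight_blocks, input_blocks, threshold=0):
    """Combine partial sums of several slices, then apply a hard-limit threshold."""
    totals = [0] * len(weight_blocks[0])
    for wb, xb in zip(weight_blocks, input_blocks):
        for i, s in enumerate(slice_partial_sums(wb, xb)):
            totals[i] += s            # assumed to happen outside the slices
    return [1 if t >= threshold else 0 for t in totals]


if __name__ == "__main__":
    blocks_w = [[[1, -2], [3, 4]], [[5, 0], [-10, 1]]]   # two slices, two neurons each
    blocks_x = [[1, 1], [1, 0]]                          # inputs split across the slices
    print(multi_slice_layer(blocks_w, blocks_x))
```

Each slice sees only its own share of the inputs, so adding slices extends the fan-in without widening any single accumulator.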
Multi-processor chips
• A far more elaborate approach is to put many small processors on a chip. Two architectures dominate such designs: single instruction, multiple data (SIMD) and systolic arrays. In a SIMD design, every processor executes the same instruction in parallel but on different data. In a systolic array, each processor performs one step of a calculation (always the same step) before passing its result on to the next processor in a pipelined manner.
• SIMD chips include the Inova N64000 and the HNC 100 NAP. The Adaptive Solutions CNAPS system uses the Inova N64000 to build a SIMD array. The chip contains 64 PEs, each with a 9x16-bit integer multiplier, a 32-bit accumulator, and 4 KBytes of on-chip memory for weight storage. All chips execute the same instruction, and common control and data buses allow multiple chips to be combined. The Hecht-Nielsen Computers 100 NAP (Neurocomputer Array Processor) contains only 4 PEs, but each PE performs true 32-bit floating-point arithmetic. Weights are stored in off-chip memory, and multiple chips can be cascaded.
• A systolic array system can be built with the Siemens MA-16. The MA-16 provides fast matrix-matrix operations (multiply, subtract, or add) on 4x4 matrices with 16-bit elements. The multiplier outputs and accumulators have 48-bit precision. Weights are stored off-chip, and neuron transfer functions are computed off-chip via lookup tables. Multiple chips can be cascaded.
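The difference between the two styles can be made concrete with a small software simulation. The sketch below is a simplified model under stated assumptions, not a description of the N64000 or MA-16: the function names are hypothetical, the SIMD model broadcasts one input per step to all PEs, and the systolic model is a simple linear chain in which stage k always adds w[k] * x[k] to the partial sum it receives.

```python
def simd_layer(weights, inputs):
    """SIMD model: in each step every PE runs the same multiply-accumulate
    on the broadcast input, using its own weight."""
    acc = [0] * len(weights)                  # one accumulator per PE
    for j, x in enumerate(inputs):            # one broadcast input per step
        acc = [a + row[j] * x for a, row in zip(acc, weights)]
    return acc


def systolic_dots(w, vectors):
    """Systolic model: a chain of len(w) stages. Stage k adds w[k] * v[k]
    to the partial sum handed over by stage k-1, one hop per clock tick."""
    n = len(w)
    regs = [None] * n          # regs[k] = (vector index, partial sum) after stage k
    pending = list(enumerate(vectors))
    results, ticks = [], 0
    while pending or any(r is not None for r in regs):
        if regs[-1] is not None:              # a finished sum leaves the array
            results.append(regs[-1][1])
        for k in range(n - 1, 0, -1):         # shift back-to-front
            prev = regs[k - 1]
            if prev is None:
                regs[k] = None
            else:
                idx, partial = prev
                regs[k] = (idx, partial + w[k] * vectors[idx][k])
        if pending:                           # a new vector enters stage 0
            idx, v = pending.pop(0)
            regs[0] = (idx, w[0] * v[0])
        else:
            regs[0] = None
        ticks += 1
    return results, ticks


if __name__ == "__main__":
    print(simd_layer([[1, 2, 3], [0, 1, 0]], [1, 1, 1]))
    print(systolic_dots([1, 2, 3], [[1, 1, 1], [2, 0, 1]]))
```

In the SIMD model all PEs finish together after one pass over the inputs; in the systolic model several input vectors are in flight at once, each at a different stage of the chain.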
RBF (radial basis function) networks
• RBF networks provide fast learning and straightforward interpretation. The comparison of input vectors with stored training vectors can be done quickly if non-Euclidean distances are used, such as the Manhattan (city-block) norm (the sum of absolute element differences), because they require no multiplication operations.
• Two commercial RBF products are now available: the IBM ZISC036 (Zero Instruction Set Computer) chip and the Nestor Ni1000 chip.
• The ZISC036 contains 36 prototype-vector neurons, where the vectors have 64 8-bit elements and can be assigned to categories from 1 to 16383. Multiple chips can easily be cascaded to provide additional prototypes. The distance norm is selectable between the Manhattan block norm and the largest element difference. The chip implements a Region of Influence (ROI) learning algorithm using signum basis functions with radii of 0 to 16383. Recall is by ROI identification or by nearest-neighbor readout. Recall processing allows a pattern presentation rate of 250k/sec.
• The Nestor Ni1000, developed jointly by Intel and Nestor, contains 1024 prototypes of 256 5-bit elements. The chip has two on-chip learning algorithms, RCE and PNN, and other algorithms can be microcoded. The processing rate is about 40k patterns/sec with a 40 MHz clock.
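The recall step described above can be sketched as follows. This is a minimal software illustration, not the ZISC036 or Ni1000 logic: the function names, the prototype tuple layout, and the choice of "distance <= radius" as the firing condition are assumptions made for the example. It covers recall only, not the learning algorithm, and it uses only additions, subtractions, and comparisons.

```python
def manhattan(a, b):
    """Manhattan (city-block) norm: sum of absolute element differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def largest_difference(a, b):
    """Alternative norm: the largest absolute element difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def recall(prototypes, x, dist=manhattan):
    """prototypes: list of (vector, radius, category) tuples.

    Returns (categories whose region of influence contains x,
             category of the nearest prototype)."""
    scored = [(dist(p, x), r, c) for p, r, c in prototypes]
    firing = sorted({c for d, r, c in scored if d <= r})   # assumed firing rule
    nearest = min(scored, key=lambda t: t[0])[2]
    return firing, nearest


if __name__ == "__main__":
    protos = [([0, 0], 3, 1), ([10, 10], 4, 2)]
    print(recall(protos, [1, 1]))      # inside the region of prototype 1
    print(recall(protos, [6, 6]))      # inside no region: fall back to nearest
    print(recall(protos, [8, 13], dist=largest_difference))
```

When no region of influence contains the input, the nearest-neighbor readout still gives an answer, which corresponds to the two recall modes mentioned for the ZISC036.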
Other digital designs
• Some digital neural network chips don't quite fit into the three sub-categories above.
• Examples include the Micro Circuit Engineering MT19003 NISP (Neural Instruction Set Processor) and the Hitachi Wafer Scale Integration chips.
• The NISP is basically a very simple RISC processor with seven instructions, optimized for implementing multi-layer networks and loaded with small programs that direct the processing. Feed-forward processing reaches 40 MCPS.
• At the other end of the complexity scale are the Hitachi Wafer Scale Integration chips. Both Hopfield and back-propagation wafers have been built. A neurocomputer with 8 of the back-propagation wafers, each with 144 neurons, achieved 2.3 GCUPS.
NeuroMatrix® NM6403 RISC/DSP Microprocessor (Research Center Module, Russia)
• NeuroMatrix® NM6403 is a high-performance dual-core microprocessor that combines VLIW and SIMD architectures. The architecture includes two main units:
– 32-bit RISC core
– 64-bit VECTOR co-processor supporting vector operations on elements of variable bit length (Patent US 6,539,368 B1).
• There are two identical programmable interfaces that work with any memory type, as well as two communication ports that are hardware compatible with the TI TMS320C4x DSP and make it possible to build multiprocessor systems.
RISC-Core
• 5-stage pipelined 32-bit RISC;
• processor instructions are 32 and 64 bits wide (usually two operations are executed by each instruction);
• two address generation units, address space of 16 GB;
• two 64-bit programmable interfaces to SRAM/DRAM shared memory;
• data format: 32-bit integers;
• registers:
– eight 32-bit general-purpose registers;
– eight 32-bit address registers;
– special control and state registers;
• two high-speed, byte-wide I/O communication ports, hardware compatible with those of the TMS320C4x.
VECTOR co-processor
• word length of vector operands and products from 1 to 64 bits;
• data format: integer data packed into 64-bit blocks as variable-length words of 1 to 64 bits each;
• hardware support for vector-matrix and matrix-matrix multiplication;
• on-chip saturation functions;
• three on-chip 32 × 64-bit RAM blocks.
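The packed data format and the saturation step can be illustrated with a short Python sketch. It is only a model of the idea: the function names, the signed two's-complement encoding, the low-bits-first field order, and the simple clamp used as the saturation function are assumptions for the example, since the slides do not give the actual NM6403 bit layout or the exact form of its saturation functions.

```python
def pack(values, widths):
    """Pack signed integers into one 64-bit block (first value in the low bits)."""
    if sum(widths) != 64 or len(values) != len(widths):
        raise ValueError("field widths must cover exactly 64 bits")
    block, shift = 0, 0
    for v, w in zip(values, widths):
        if not -(1 << (w - 1)) <= v < (1 << (w - 1)):
            raise ValueError(f"{v} does not fit in {w} signed bits")
        block |= (v & ((1 << w) - 1)) << shift   # two's-complement field
        shift += w
    return block


def unpack(block, widths):
    """Recover the signed fields from a 64-bit block."""
    out, shift = [], 0
    for w in widths:
        raw = (block >> shift) & ((1 << w) - 1)
        if raw >= 1 << (w - 1):                  # sign-extend negative fields
            raw -= 1 << w
        out.append(raw)
        shift += w
    return out


def saturate(v, bits):
    """Clamp v to the signed range of the given width (assumed saturation rule)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, v))


if __name__ == "__main__":
    widths = [8, 8, 16, 32]                      # one possible split of 64 bits
    block = pack([-1, 127, -300, 100000], widths)
    print(hex(block), unpack(block, widths))
    print(saturate(200, 8), saturate(-200, 8))
```

Narrower fields let one 64-bit block carry more operands, which is consistent with the vector throughput growing as the element width shrinks.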
Applications
• accelerators for PCs and workstations for:
– neural net emulation;
– signal processing;
– image processing;
– acceleration of vector and matrix calculations;
• telecommunications;
• embedded systems;
• basic building block for large, massively parallel computing systems.
Performance
• scalar operations:
– 40 MIPS;
– 120 MOPS for 32-bit data;
• vector operations:
– from 40 to 11,500+ MMAC (million multiply-accumulate operations per second);
• I/O and interfaces:
– two programmable 64-bit external memory interfaces with up to 800 MB/sec throughput;
– I/O communication ports with up to 20 MB/sec throughput each.
NeuroMatrix® MC431 Single-DSP PCI Evaluation Board
• MC431 is a single-DSP PCI board designed for software evaluation and system prototyping on the NM6403 DSP. It is a low-cost solution that can be used for learning the NeuroMatrix® architecture and Software Development Kit. It has one NM6403 DSP, 4 MB of SRAM, and two communication ports.
• The NM6403 DSP has access to two 2 MB SRAM banks (one bank per bus). One bank is accessible for reading and writing from both the processor and the PCI bus. The MC431 has two external communication ports for connecting input/output devices.
NeuroMatrix® MC431 Single-DSP PCI Evaluation Board (2)
• Low-cost, high-performance vector-matrix engine: NM6403 DSP
• 4 MB SRAM
• Two I/O communication ports hardware compatible with the TI 'C40 DSP
• PCI host interface
Neural networks target traffic management
Using a PCI-based video input board, image processor, and video display board, each with an on-board NM6403 neural-network processor, RC Module's imaging system can monitor and analyze traffic flow on six-lane highways in real time.