Fully Binary Neural Network Model and Optimized Hardware Architectures for Associative Memories
PHILIPPE COUSSY, CYRILLE CHAVET, HUGUES NONO WOUAFO, and LAURA CONDE-CANENCIA
Presented by: Stefany Escobedo, Joshua Kallus, and Alyssa Scheske
March 26, 2020
Introduction
● The goal is to develop associative memories based on neural networks that can store
information and retrieve it in a manner similar to the human brain
○ Robust against input noise
○ Constant retrieval time independent of the number of stored associations
Introduction | GBNN Model Overview | Proposed Optimizations | Experiments | Conclusion
GBNN Model
● GBNN (Gripon-Berrou Neural Network): an abstract neural network model
● Based on sparse clustered networks used to design
associative memories
GBNN Model
● N binary neurons
● C equally-partitioned clusters
● L = N/C neurons per cluster
● Each cluster is associated through one of its neurons
with a portion of an input message
● Message m of K bits
● X = K/C = log_2(L): length in bits of each cluster's submessage
● Clique: set of activated neurons that are all connected
to each other
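The parameter relations above can be sketched in a few lines of Python. This is only an illustration: the names `GBNNParams` and `split_message` are hypothetical, L is assumed to be a power of two, and the choice that cluster 0 takes the least-significant X bits is an assumption, not something stated on the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GBNNParams:
    N: int  # total number of binary neurons
    C: int  # number of equally sized clusters

    @property
    def L(self) -> int:
        # neurons per cluster: L = N / C
        return self.N // self.C

    @property
    def X(self) -> int:
        # bits per submessage: X = log_2(L), assuming L is a power of two
        return self.L.bit_length() - 1

    @property
    def K(self) -> int:
        # message length in bits: K = C * X
        return self.C * self.X


def split_message(message: int, params: GBNNParams) -> list:
    """Split a K-bit message into C submessages of X bits each.

    Each submessage is the index of the neuron it selects in its cluster.
    Assumption: cluster 0 reads the least-significant X bits.
    """
    mask = (1 << params.X) - 1
    return [(message >> (c * params.X)) & mask for c in range(params.C)]


params = GBNNParams(N=64, C=4)           # L = 16, X = 4, K = 16
neurons = split_message(0xA3F1, params)  # one neuron index per cluster
```

With these values, each of the 4 clusters receives one hexadecimal digit of the 16-bit message and activates the neuron with that index.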
GBNN Model
● Learns by memorizing that the set of neurons that
constitute the input message are connected to each
other and form a clique
● Retrieves by detecting which neuron is the most “stimulated”
○ Scoring step using Eq. 1
○ Winner Takes All (WTA) step using Eq. 2
[Equations (1) and (2): shown as images on the original slide]
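The learn and retrieve steps can be illustrated with a small Python sketch. It does not reproduce Eq. 1 and Eq. 2 from the paper. It assumes that a neuron's score is the number of active distant neurons it is connected to, and that WTA keeps the highest-scoring neuron(s) in each cluster. The function names and the set-based weight storage are hypothetical.

```python
def learn(weights, neurons):
    """Store one message as a clique: connect the selected neuron of each
    cluster to the selected neurons of all other clusters."""
    for c1, n1 in enumerate(neurons):
        for c2, n2 in enumerate(neurons):
            if c1 != c2:
                weights.add(((c1, n1), (c2, n2)))


def retrieve_step(weights, active, C, L):
    """One scoring + Winner Takes All pass.

    `active` is a set of (cluster, neuron) pairs; a new set is returned.
    """
    new_active = set()
    for c in range(C):
        # Scoring (assumed form): count active distant neurons connected to (c, n).
        scores = [
            sum(1 for src in active if src[0] != c and (src, (c, n)) in weights)
            for n in range(L)
        ]
        # WTA: keep the neuron(s) with the highest non-zero score in this cluster.
        best = max(scores)
        if best > 0:
            new_active.update((c, n) for n, s in enumerate(scores) if s == best)
    return new_active


C, L = 4, 16
weights = set()
learn(weights, [1, 15, 3, 10])
learn(weights, [5, 2, 3, 4])

# Query with the last cluster erased: only three submessages are known.
query = {(0, 1), (1, 15), (2, 3)}
result = retrieve_step(weights, query, C, L)
```

In this example the erased cluster recovers neuron 10, because it is the only neuron there connected to all three known neurons.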
GBNN Model: HW Architecture
● Fully parallel HW implementation
● Modules
○ Decoding
○ Learning (memory)
○ Computing
● Crossbar network dedicated to interchanges of neuron values
between clusters
GBNN Model: HW Architecture
● Learning process
○ Cluster receives K-bit binary word
○ Decoding module splits the word into C subwords (one per cluster)
○ Each cluster's subword determines which of its neurons must be activated
■ The remaining subwords determine which neurons must be connected to the locally activated neuron
○ Memory is updated with the selected weights to store the clique
● Retrieval process
○ Scoring step is processed
○ WTA step elects a neuron or group of neurons
○ Local neuron values are updated with the new information
○ The information is broadcast to all distant neurons
GBNN Model: Discussion
● Advantage
○ Strongly enhances performance of associative memories compared to Hopfield networks
● Disadvantage
○ Complex hardware architectures whose area and timing performance do not scale well
● Further optimizations
○ Transform into a fully binary model to simplify scoring and remove WTA (area reduction)
○ Memorize half of the synaptic weights to reduce the number of storage elements and the cost of learning logic
○ Serialize communications (area reduction)
● Overall goal
○ Simplify the processing performed by the neurons
○ Optimize hardware implementation
○ Keep functionality and performance of the original model
Proposed Simplified Neural Network
The optimizations proposed include the following:
● Fully binary semantics vs arithmetical-integer semantics
● Reduced memory complexity
● Serialized communications
Fully Binary Semantics
Replacing all arithmetical-integer computations with logical equations makes it possible to remove the Winner Takes All (WTA) step while achieving the same performance as the enhanced GBNN model.
Unanimous Vote
● A neuron n_{i,j} is active in a given cluster if, in each other active cluster, at least one active neuron (a distant active neuron) indicates that it is connected with n_{i,j}
● This changes how the decoding module works and enables removal of the WTA step. Values of neurons can be calculated with only logical equations now.
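The unanimous vote can be written with logical operations only: an AND over the other active clusters of an OR over each cluster's active neurons. The Python sketch below is an illustration under stated assumptions, not the paper's logic equations. The function names are hypothetical, an erased cluster is modeled as having no active neuron, and a neuron with no active distant cluster at all is left inactive.

```python
def learn(weights, neurons):
    """Store one message as a clique between the selected neurons."""
    for c1, n1 in enumerate(neurons):
        for c2, n2 in enumerate(neurons):
            if c1 != c2:
                weights.add(((c1, n1), (c2, n2)))


def unanimous_vote_step(weights, active, C, L):
    """One retrieval pass using only boolean logic (no sums, no maximum)."""
    active_clusters = {c for c, _ in active}
    new_active = set()
    for c in range(C):
        voters = active_clusters - {c}  # the other active clusters
        for n in range(L):
            # AND over voting clusters of (OR over that cluster's active neurons).
            unanimous = all(
                any(src[0] == k and (src, (c, n)) in weights for src in active)
                for k in voters
            )
            # Assumption: with no voters at all, the neuron stays inactive.
            if voters and unanimous:
                new_active.add((c, n))
    return new_active


C, L = 4, 16
weights = set()
learn(weights, [1, 15, 3, 10])
learn(weights, [5, 2, 3, 4])

query = {(0, 1), (1, 15), (2, 3)}  # last cluster erased
result = unanimous_vote_step(weights, query, C, L)
```

No score is computed and no maximum is searched, which is why a separate WTA stage is not needed in this formulation.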
Reduced Memory
Synaptic weights represent connections between a neuron and neurons in distant clusters. The original GBNN model computes and stores redundant information; keeping only half of the weights (a triangular weight matrix) saves space at no performance cost.
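A minimal Python sketch of the idea, assuming that connections are symmetric (a clique is undirected) and that the original model keeps one bit per ordered pair of neurons in different clusters. The class and function names are hypothetical, and the set used here only models behavior; the bit counts are what a hardware memory would need under those assumptions.

```python
class TriangularWeights:
    """Keep one bit per unordered pair of neurons in different clusters."""

    def __init__(self, C, L):
        self.C, self.L = C, L
        self.bits = set()

    @staticmethod
    def _key(a, b):
        # Canonical order: the neuron in the lower-numbered cluster comes first.
        return (a, b) if a[0] < b[0] else (b, a)

    def connect(self, a, b):
        if a[0] == b[0]:
            raise ValueError("no connections inside a cluster")
        self.bits.add(self._key(a, b))

    def connected(self, a, b):
        return a[0] != b[0] and self._key(a, b) in self.bits

    def storage_bits(self):
        # C*(C-1)/2 cluster pairs, each needing an L x L bit matrix.
        return self.C * (self.C - 1) // 2 * self.L * self.L


def full_storage_bits(C, L):
    # Assumed baseline: every cluster stores its links to all C-1 others.
    return C * (C - 1) * L * L


w = TriangularWeights(C=4, L=16)
w.connect((2, 3), (0, 1))  # stored once, readable from both sides
```

For C = 4 and L = 16 this gives 1536 bits instead of 3072, which is the halving described above.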
Serialized Communications
In the fully parallel GBNN design, a very large number of wires and a large amount of logic are needed to connect every neuron to every other neuron. Serializing data transfers offers several benefits:
● Improved clock frequencies
● Significantly reduced area
● Lower power consumption
Serialization Implementation
Cluster Based:
● Clusters take turns to broadcast the values of all their neurons
● Takes C (number of clusters) cycles to complete
Neuron Based:
● Clusters concurrently broadcast the value of one of their neurons
● Takes L = N/C cycles to complete
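The two schedules can be compared with a short Python sketch that lists which (cluster, neuron) values are on the wires in each cycle. The generator names are hypothetical, and the ordering within a cycle is an assumption made for illustration.

```python
def cluster_based_schedule(C, L):
    """Cycle c: cluster c broadcasts the values of all L of its neurons."""
    for c in range(C):
        yield [(c, n) for n in range(L)]


def neuron_based_schedule(C, L):
    """Cycle n: every cluster broadcasts the value of its neuron n."""
    for n in range(L):
        yield [(c, n) for c in range(C)]


C, L = 4, 16
cluster_cycles = list(cluster_based_schedule(C, L))  # C cycles, L values each
neuron_cycles = list(neuron_based_schedule(C, L))    # L cycles, C values each
```

Both schedules transfer the same N = C * L values; they differ in how many cycles they take and how many values travel in parallel per cycle.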
Serialization: Hardware Implementation
● Steering logic for synaptic weights has large overhead in multiplexers
● Area cost of this design is high
Serialization: Hardware Implementation
● Flip-flop ring buffer logic
● Requires only one MUX instead of L-1 2:1 MUXes
● Can be used with either neuron-based or cluster-based serialization
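A behavioral Python sketch of why a ring of flip-flops avoids a wide selection tree: the stored bits rotate by one position per cycle, so the bit needed in a given cycle is always read from the same fixed position. This is an assumption about the general technique, not the paper's exact circuit; the class name `WeightRing` is hypothetical, and the single remaining MUX is not modeled.

```python
from collections import deque


class WeightRing:
    """Ring of stored bits that rotates one position per clock cycle."""

    def __init__(self, bits):
        self.bits = deque(bits)

    def step(self):
        head = self.bits[0]   # fixed read position, no per-bit selection
        self.bits.rotate(-1)  # shift the ring by one position
        return head


ring = WeightRing([1, 0, 1, 1])
stream = [ring.step() for _ in range(4)]  # one bit per cycle, in stored order
```

After as many cycles as there are stored bits, the ring is back in its initial state and ready for the next pass.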
Experiments
● Performance Analysis
● Complexity Analysis
● Hardware Synthesis Analysis
○ FPGA Target - Stratix IV FPGA Platform
○ ASIC Target - Altera HardCopy Platform
Architecture                                   Label
Original GBNN model                            V0
Fully binary model                             V1.0
Binary + triangular synaptic weight model      V1.1
Binary + cluster-based serialization           V1.2
Binary + neuron-based serialization            V1.3
Experiments
The performance of the proposed architectures matches that of the
original GBNN architecture (the results are superimposed)
Experiments
Controller resources decompose into:
decoding, memorizing and computing
tasks.
The fully binary model (V1.0) reduces the
total area by 50% compared with V0.
V1.1 reduces architecture complexity
to about ⅓ of V0 (about 70% area reduction).
V1.2 and V1.3 reduce architecture
complexity to about ⅙ of V0 (83% area reduction).
Experiments - Area
Largest improvement: moving from the original V0 architecture to the
triangular synaptic weight matrix model (V1.1) reduces area by 50% for all
configurations.
Largest improvement: moving from the original V0
architecture to V1.2/V1.3 gives 87% area savings.
Experiments
Average Look-up Table (LUT) area
reductions range from 62% for V1.0
up to 86% for V1.2.
The larger the network, the more impactful the reductions!
Experiments - Clock Frequencies
Conclusion & Comments
● Full binary computation strongly reduces the cost of the computation module
● Memory reduction limits the cost of both the memory and the decoding modules
● Serialization optimizes the computation and the decoding modules
● Future work:
○ Further optimize architectures for timing performance (not just area)