Upload
nguyenkien
View
217
Download
0
Embed Size (px)
Citation preview
Online Unsupervised Spike Sorting Accelerator
Zhewei Jiang, Jamis Johnson
CSEE4840 Embedded System Design
Motivation for Hardware Spike Sorting
• Motivated by Transmission Cost High energy consumption
Throughput limitation
• Challenges Computation complexity
Feature extraction
Clustering
Memory Requirement
Feature Distribution
Cluster Centroid
ProsthesisDozens/Hundreds of Channels
SpikeSorting
32~64 * 8 bits/spike 3~4 bits/spike
Algorithm
Stage Algorithm Main Module(s)
Distribution 1D Kernel Density Estimation DISTR (main FSM)
Sample Criterion Laplacian DISTR (Laplace FSM)
TrainingBimodal prediction
& replacement policyCAM_CLUSTER
GRIDFIND
SortingClosest Grid
(grid granularity)
CAM_CLUSTERGRIDFIND
WTA
Algorithm - Kernel Density Estimation
• Approximate Bayesian Optimal Kernel Density Estimation
Marr wavelet as kernel
• Hardware cost assuming 8 bit resolution 2562 memory entry
Updates per spike ∝ kernel smoothing parameter
Algorithm - Kernel Density Estimation
• Histogram Trivial kernel
• Hardware cost 2562 memory entry
One update per spike
Algorithm - Kernel Density Estimation
• Histogram Trivial kernel
• Hardware cost 2562 memory entry
One update per spike
642 memory entryOne update per spike
Kernel Density Estimation – Trivial Case
• 2D Kernel Density Estimation Laplacian of Gaussian (Marr Wavelet)
• 1D Kernel Density Estimation Gaussian (or LoG, similar hardware
cost if precomputed)
• Histogram Smoothing with bin width
Laplacian
Convolution of histogram of negative Laplace operator [-1, 2, -1]
• Downward trend of 2nd
derivative is used to segment distribution space into informative and uninformative regions
• Only data in the shaded area are used for training
Informative Sample Vector Mask
• Informative sample vector mask reduces outliners sensitivity during training phase
• Certain features can share distribution space if no overlap can occur (e.g. peak & hyperpolarization)
Bayesian Classification
Copyright @ Yaroslav Bulatov
At this point, classification can be done through
Distance metric (e.g. Probabilistic Neural Network, MCK, …)
Boundary metric (Bayesian Boundary)
Top-Level Hardware Architecture
Spk_f[11:0]
Valid IDX[2] IDX[1] IDX[0]
MEM1: DISTRIBUTION
MEM2: INFO SAMPLE MEM3: GRID
Informative[63:0]
boundary[41:0]
Valid_grid[6:0]
ker
fin
laplace
clear
infm
Comb: GRIDFIND
Comb: WTA(min distance)
MEM4: Grid_index/Comb: grid match
Grid_idx[5:0]
V[7:0]D[1:0]x8
S/H Interface
Spk_f[11:0]
Valid IDX[2] IDX[1] IDX[0]
ker
fin
laplace
clear
HARDWARE
Leak
Update
Address
Writedata
ReaddataSOFTEWARE – Scheduling, commands
AVALON BUS
Clk, reset, read, write, readdata, writedata
Clk, reset, read, write
Modules [Distribution]
FEATURE
+1
wKDE
w
Boundary_thr
Laplacian
convolve
Boundary
is_f
CADDR
Ctrl
Modules [Distribution]
KDE[i]<<1
SUB SUBComparator
Boundary_thr
Comparator
A
D
Q1
Q4
ENB
Register
informative
Boundary
A
D
Q1
Q4
ENB
Register
A
D
Q1
Q4
ENB
Register
convolve
Modules [Training & Sorting]
FIS(Informative Features)
SEGS(Grid Boundary – Peak)
GRIDFIND
SEGS(Grid Boundary –
Hyperpolarization)
GRIDFIND
FEATURES
CAM_CLUSTER(Content Associative Memory –
Grid indexes as cluster identifier)
ATTR WTA
Output
Modules [GRIDFIND]
Feature
Boundary
A
H
Q1
Q8
ENB
Regis ter
A
H
Q1
Q8
ENB
Regis ter
A
H
Q1
Q8
ENB
Regis ter
A
H
Q1
Q8
ENB
Regis ter
Comparator
Comparator
Comparator
Comparator
Load_b
Q
QSE T
C LR
D
Q
QSE T
C LR
D
Q
QSE T
C LR
D
Q
QSE T
C LR
D
Index
Fp
Boundary_fp
Load_bp
Idx_p
Boundary_fh
Fh
Load_bh
Idx_h
Modules [CAM_CLUSTER]
A
D
Q1
Q4
ENB
Register
SUB
A
D
Q1
Q4
ENB
Register
SUBLoad_C
F_p
F_h
Idx_p Idx_h
3b 3b
ADD Match
Distance
Strong Cluster
Outlier
Weak Cluster
Free
Update &
(miss|leak)
Each cluster is associated with a bimodal updater, every matching spike strengthens the usefulness of the cluster, and periodic leakage is applied to weaken the cluster. False cluster caused by outliers is cleared this way.
Modules [WTA]
Cluster_idx[0]
Candidate [0]
DIST0
Candidate [2]
DIST1
Candidate [2]
DIST2
Candidate [3]
DIST3
Candidate [4]
DIST4
Candidate [5]
DIST5
DIST7
DIST6
Candidate [7]
Candidate [6]
W
W
W
W
W
W
W
Cluster_idx[1]
VALID
DB
DA
vA
vB
>sel
D
v
Design – Timing
Sequential Components:
FSM (for histogram generation in DISTR)
FSM (for Laplacian in DISTR)
Usefulness Updater (Bimodal SM for outliner handling in CAM_CLUSTER)
Posedge triggers State Transition: State <= nState& FSM Outputs (registered output) based on nState
Negedge triggers nState Transition: nState <= f(state)
Combinational circuit (in FSM) propagation for half cycle, guarantee no set-up/hold violation from clock jitter or skew
Design – Timing (Legacy)
Negedge triggered FF
Level sensitive Latch(non-clk enable)
Q
QSET
CLR
D
Q
QSET
CLR
D
L
Comb(bimodal)
WRITE CLK Retention Latch is used in a power-gating version of the design, with no retention flip-flop available. A non-retention FF trails a Latch to allow the design of a FSM that does not require extra cycle to read back state from L after power gating
Design – Timing (Legacy)
Negedge triggered FF
Level sensitive Latch(non-clk enable)
Q
QSET
CLR
D
Q
QSET
CLR
D
L
Comb(bimodal)
WRITE CLK
Q
QSET
CLR
D
Comb(bimodal)
WRITE
CLK
Design – Timing of Module DISTR
Internal FSM timing as previous slides
Ack (signal Fin) based interface, communication locked until software read out “Fin”.
Command Ker: [read Max & incr][write][read Min & incr][write][Fin]
special case when memory overflow:
[read & >> 1][write]…(64 entries)
Command Laplace: [read][read][read]…(64 entries)…[Fin]
Secondary FSM find informative sample regions based on
(entry[i]<<1) > (entry[i-1]+entry[i+1])
Design – Timing of Module GC
FIS(Informative Features)
SEGS
GRIDFIND
FEATURES
CAM_CLUSTER
ATTR WTA
Output
Q
QS ET
C LR
S
R
Q
QS ET
C LR
S
R
RTL Sim (Design Compiler)
Issues & Lessons Learned
• Testability Individual Modules
Overall Functionality
• Debugging