CSEE4840 Embedded System Design - Columbia …sedwards/classes/2015/4840/reports/Embedd… · CSEE4840 Embedded System Design. Motivation for Hardware Spike Sorting •Motivated by

Online Unsupervised Spike Sorting Accelerator

Zhewei Jiang, Jamis Johnson

CSEE4840 Embedded System Design

Motivation for Hardware Spike Sorting

• Motivated by Transmission Cost High energy consumption

Throughput limitation

• Challenges Computation complexity

Feature extraction

Clustering

Memory Requirement

Feature Distribution

Cluster Centroid

ProsthesisDozens/Hundreds of Channels

SpikeSorting

32~64 * 8 bits/spike 3~4 bits/spike

Algorithm

Stage Algorithm Main Module(s)

Distribution 1D Kernel Density Estimation DISTR (main FSM)

Sample Criterion Laplacian DISTR (Laplace FSM)

TrainingBimodal prediction

& replacement policyCAM_CLUSTER

GRIDFIND

SortingClosest Grid

(grid granularity)

CAM_CLUSTERGRIDFIND

WTA

Algorithm - Kernel Density Estimation

• Approximate Bayesian Optimal Kernel Density Estimation

Marr wavelet as kernel

• Hardware cost assuming 8 bit resolution 2562 memory entry

Updates per spike ∝ kernel smoothing parameter


• Histogram Trivial kernel

• Hardware cost 2562 memory entry

One update per spike


• Histogram Trivial kernel

• Hardware cost 2562 memory entry

One update per spike

642 memory entryOne update per spike

Kernel Density Estimation – Trivial Case

• 2D Kernel Density Estimation Laplacian of Gaussian (Marr Wavelet)

• 1D Kernel Density Estimation Gaussian (or LoG, similar hardware

cost if precomputed)

• Histogram Smoothing with bin width

Laplacian

Convolution of histogram of negative Laplace operator [-1, 2, -1]

• Downward trend of 2nd

derivative is used to segment distribution space into informative and uninformative regions

• Only data in the shaded area are used for training

Informative Sample Vector Mask

• Informative sample vector mask reduces outliners sensitivity during training phase

• Certain features can share distribution space if no overlap can occur (e.g. peak & hyperpolarization)

Bayesian Classification

Copyright @ Yaroslav Bulatov

At this point, classification can be done through

Distance metric (e.g. Probabilistic Neural Network, MCK, …)

Boundary metric (Bayesian Boundary)

Top-Level Hardware Architecture

Spk_f[11:0]

Valid IDX[2] IDX[1] IDX[0]

MEM1: DISTRIBUTION

MEM2: INFO SAMPLE MEM3: GRID

Informative[63:0]

boundary[41:0]

Valid_grid[6:0]

ker

fin

laplace

clear

infm

Comb: GRIDFIND

Comb: WTA(min distance)

MEM4: Grid_index/Comb: grid match

Grid_idx[5:0]

V[7:0]D[1:0]x8

S/H Interface

Spk_f[11:0]

Valid IDX[2] IDX[1] IDX[0]

ker

fin

laplace

clear

HARDWARE

Leak

Update

Address

Writedata

ReaddataSOFTEWARE – Scheduling, commands

AVALON BUS

Clk, reset, read, write, readdata, writedata

Clk, reset, read, write

Modules [Distribution]

FEATURE

+1

wKDE

w

Boundary_thr

Laplacian

convolve

Boundary

is_f

CADDR

Ctrl

Modules [Distribution]

KDE[i]<<1

SUB SUBComparator

Boundary_thr

Comparator

A

D

Q1

Q4

ENB

Register

informative

Boundary

A

D

Q1

Q4

ENB

Register

A

D

Q1

Q4

ENB

Register

convolve

Modules [Training & Sorting]

FIS(Informative Features)

SEGS(Grid Boundary – Peak)

GRIDFIND

SEGS(Grid Boundary –

Hyperpolarization)

GRIDFIND

FEATURES

CAM_CLUSTER(Content Associative Memory –

Grid indexes as cluster identifier)

ATTR WTA

Output

Modules [GRIDFIND]

Feature

Boundary

A

H

Q1

Q8

ENB

Regis ter

A

H

Q1

Q8

ENB

Regis ter

A

H

Q1

Q8

ENB

Regis ter

A

H

Q1

Q8

ENB

Regis ter

Comparator

Comparator

Comparator

Comparator

Load_b

Q

QSE T

C LR

D

Q

QSE T

C LR

D

Q

QSE T

C LR

D

Q

QSE T

C LR

D

Index

Fp

Boundary_fp

Load_bp

Idx_p

Boundary_fh

Fh

Load_bh

Idx_h

Modules [CAM_CLUSTER]

A

D

Q1

Q4

ENB

Register

SUB

A

D

Q1

Q4

ENB

Register

SUBLoad_C

F_p

F_h

Idx_p Idx_h

3b 3b

ADD Match

Distance

Strong Cluster

Outlier

Weak Cluster

Free

Update &

(miss|leak)

Each cluster is associated with a bimodal updater, every matching spike strengthens the usefulness of the cluster, and periodic leakage is applied to weaken the cluster. False cluster caused by outliers is cleared this way.

Modules [WTA]

Cluster_idx[0]

Candidate [0]

DIST0

Candidate [2]

DIST1

Candidate [2]

DIST2

Candidate [3]

DIST3

Candidate [4]

DIST4

Candidate [5]

DIST5

DIST7

DIST6

Candidate [7]

Candidate [6]

W

W

W

W

W

W

W

Cluster_idx[1]

VALID

DB

DA

vA

vB

>sel

D

v

Design – Timing

Sequential Components:

FSM (for histogram generation in DISTR)

FSM (for Laplacian in DISTR)

Usefulness Updater (Bimodal SM for outliner handling in CAM_CLUSTER)

Posedge triggers State Transition: State <= nState& FSM Outputs (registered output) based on nState

Negedge triggers nState Transition: nState <= f(state)

Combinational circuit (in FSM) propagation for half cycle, guarantee no set-up/hold violation from clock jitter or skew

Design – Timing (Legacy)

Negedge triggered FF

Level sensitive Latch(non-clk enable)

Q

QSET

CLR

D

Q

QSET

CLR

D

L

Comb(bimodal)

WRITE CLK Retention Latch is used in a power-gating version of the design, with no retention flip-flop available. A non-retention FF trails a Latch to allow the design of a FSM that does not require extra cycle to read back state from L after power gating

Design – Timing (Legacy)

Negedge triggered FF

Level sensitive Latch(non-clk enable)

Q

QSET

CLR

D

Q

QSET

CLR

D

L

Comb(bimodal)

WRITE CLK

Q

QSET

CLR

D

Comb(bimodal)

WRITE

CLK

Design – Timing of Module DISTR

Internal FSM timing as previous slides

Ack (signal Fin) based interface, communication locked until software read out “Fin”.

Command Ker: [read Max & incr][write][read Min & incr][write][Fin]

special case when memory overflow:

[read & >> 1][write]…(64 entries)

Command Laplace: [read][read][read]…(64 entries)…[Fin]

Secondary FSM find informative sample regions based on

(entry[i]<<1) > (entry[i-1]+entry[i+1])

Design – Timing of Module GC

FIS(Informative Features)

SEGS

GRIDFIND

FEATURES

CAM_CLUSTER

ATTR WTA

Output

Q

QS ET

C LR

S

R

Q

QS ET

C LR

S

R

RTL Sim (Design Compiler)

Issues & Lessons Learned

• Testability Individual Modules

Overall Functionality

• Debugging

Documents

CSEE4840 Embedded System Design - Columbia …sedwards/classes/2015/4840/reports/Embedd… · CSEE4840 Embedded System Design. Motivation for Hardware Spike Sorting •Motivated by