BMI 871 2013, Slides w1-d5 (Juyang Weng, Michigan State University) © Do not distribute

Why Normalizing v?

 Why did we normalize v on the right side?

 Because we want the length of the vector on the left-hand side to converge to the eigenvalue
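As a reminder of the update in question (the standard CCIPCA recursion; the notation u(t) for the zero-mean input at time t is an assumption here, not copied from the slide):

\[
\mathbf{v}(t) \;=\; \frac{t-1}{t}\,\mathbf{v}(t-1) \;+\; \frac{1}{t}\,\mathbf{u}(t)\,\mathbf{u}(t)^{\top}\,\frac{\mathbf{v}(t-1)}{\|\mathbf{v}(t-1)\|}.
\]

Only the copy of v on the right-hand side is normalized, so the estimate on the left keeps its length: its direction converges to the first eigenvector while its norm ||v(t)|| converges to the corresponding eigenvalue.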

CCI PCA Algorithm (1)

CCI PCA Algorithm (2)
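The two algorithm slides above are equation figures that did not survive extraction. Below is a hedged Python sketch of a CCIPCA-style procedure for the first k components; the function names, the amnesic-parameter defaults, and the residual (deflation) step are assumptions based on the published CCIPCA description, not a transcription of the slides.

```python
import numpy as np

def amnesic(t, t1=20, t2=500, c=2.0, r=10000.0):
    # Amnesic parameter mu(t): gradually up-weights recent samples (defaults assumed).
    if t <= t1:
        return 0.0
    if t <= t2:
        return c * (t - t1) / (t2 - t1)
    return c + (t - t2) / r

def ccipca(samples, k):
    """Candid covariance-free incremental PCA (sketch).
    Each v[i] converges in direction to the i-th eigenvector and in
    length to the i-th eigenvalue; no covariance matrix is formed."""
    samples = np.asarray(samples, dtype=float)
    n, d = samples.shape
    mean = np.zeros(d)
    v = np.zeros((k, d))
    for t, x in enumerate(samples, start=1):
        u = x - mean                      # scatter vector w.r.t. the running mean
        mean += (x - mean) / t            # incremental mean estimate
        for i in range(min(k, t)):
            if t == i + 1:
                v[i] = u                  # initialize component i with the current residual
            else:
                mu = amnesic(t)
                w_old = (t - 1 - mu) / t  # retention rate
                w_new = (1 + mu) / t      # learning rate (the two sum to 1)
                v_hat = v[i] / (np.linalg.norm(v[i]) + 1e-12)
                v[i] = w_old * v[i] + w_new * (u @ v_hat) * u
            v_hat = v[i] / (np.linalg.norm(v[i]) + 1e-12)
            u = u - (u @ v_hat) * v_hat   # residual passed to the next component
    eigenvalues = np.linalg.norm(v, axis=1)
    eigenvectors = v / (eigenvalues[:, None] + 1e-12)
    return eigenvectors, eigenvalues, mean
```

Deflating the residual relies on the fact that different eigenvectors are orthogonal, which is why the quiz below singles out that property.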

PCA from the FERET Face Image Set

(a) Batch PCA; (b) one epoch; (c) 20 epochs

Quiz: PCA Properties

Quiz: What is NOT true?
A. CCI PCA is an incremental learning algorithm
B. CCI PCA does not need the covariance matrix of the input space to compute the first k principal component vectors
C. CCI PCA uses the property that different eigenvectors are orthogonal when it computes the residual of the subspace spanned by the computed eigenvectors
D. The eigenvectors from CCI PCA are feature vectors that emerge from the sensory experience
E. The traditional batch PCA methods do not need to perform iterations

All Locations and All Scales

  Sensory array and motor array

  As input to each brain area:
    Every location
    Every scale
    Every shape

  Each area tries to do the best using its limited resource

Staggered Hierarchical Mapping (SHM)

[Figure: SHM architecture; the output of SHM feeds the cognitive mapping (HDR)]

Zhang, Weng & Zhang ICDL 2002

Cascade: input from the last area
Cascade is not what the brain does! Mixed net instead!
Song et al. IJCNN 2011: each area connects directly with sensory X and motor Z

PCA Net
  Area 0: normalization (compute the scatter vector)
  Area 1: PCA
  Area 2: nearest-neighbor matching
  Reconstruction
  Not neuromorphic: ordered PCA vectors
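A hedged sketch of the three-area pipeline listed above, reusing the ccipca sketch from earlier; the function names and the Euclidean nearest-neighbor matching are illustrative assumptions.

```python
import numpy as np

def pca_net_train(train_x, train_labels, k):
    """Area 0: normalization (scatter vectors); Area 1: PCA features."""
    train_x = np.asarray(train_x, dtype=float)
    eigenvectors, _, mean = ccipca(train_x, k)      # from the sketch above
    feats = (train_x - mean) @ eigenvectors.T       # Areas 0 and 1
    return eigenvectors, mean, feats, np.asarray(train_labels)

def pca_net_classify(x, eigenvectors, mean, feats, labels):
    """Area 2: nearest-neighbor matching in the PCA feature space."""
    f = (np.asarray(x, dtype=float) - mean) @ eigenvectors.T
    return labels[int(np.argmin(np.linalg.norm(feats - f, axis=1)))]

def pca_net_reconstruct(x, eigenvectors, mean):
    """Reconstruction from the ordered top-k PCA vectors."""
    f = (np.asarray(x, dtype=float) - mean) @ eigenvectors.T
    return mean + f @ eigenvectors
```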

Convergence of IPCA Algorithms
[Figure: Oja's SGA vs. our IPCA]

Other Eigenvectors
[Figure: Oja's SGA vs. our IPCA]

IPCA Convergence Comparison

Convergence of the first 5 eigenvectors for 5632-dimensional data (88-by-64 images): (a) Oja's SGA; (b) Sanger's GHA; (c) the proposed CCIPCA

Eigenvalues

Questions?

From CCI PCA to CCI LCA

 LCA: Lobe Component Analysis
 Drop the orthogonality restriction
 Interaction between neurons: lateral inhibition (biologically supported)
 Keep the CCI part

Neurons as Feature Detectors: The Lobe Component Model

  Biologically motivated:
    Hebbian learning
    Lateral inhibition

  Partition the input space into c regions: X = R1 ∪ R2 ∪ ... ∪ Rc

  Lobe component i: the principal component of the region Ri

Weng et al. WCCI 2006

Different Normalizations

Dual Optimality of CCI LCA

  Spatial optimality leads to the best target: given the number of neurons (a limited resource), the target of the synaptic weight vectors minimizes the representation error based on the "observation" x.

  Temporal optimality leads to the best runner to the target: given limited experience up to time t, find the best direction and step size for each t based on the "observation" u = r x.

Weng & Luciw TAMD vol. 1, no. 1, 2009

CCI LCA Algorithm (1)

CCI LCA Algorithm (2)
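As with the PCA slides, the algorithm content here is a figure. The following is a hedged Python sketch of a CCI LCA style update: a winner-take-all competition stands in for lateral inhibition (a top-1 simplification of the top-k competition mentioned in the quiz below), and the winner's synaptic vector is updated by a candid amnesic average of the response-weighted input; names and defaults are assumptions.

```python
import numpy as np

def cci_lca(samples, c, t1=20, t2=500, amn_c=2.0, r=10000.0):
    """CCI LCA sketch: c neurons compete for each input; the winner
    updates its weight vector.  Lobe component vectors are NOT kept
    mutually orthogonal (the orthogonality restriction is dropped)."""
    samples = np.asarray(samples, dtype=float)
    v = samples[:c].copy()                  # initialize neurons with the first c inputs
    age = np.ones(c)                        # per-neuron update counts
    for x in samples[c:]:
        v_hat = v / (np.linalg.norm(v, axis=1, keepdims=True) + 1e-12)
        resp = v_hat @ x                    # pre-responses; winner-take-all stands in for inhibition
        j = int(np.argmax(resp))
        age[j] += 1
        t = age[j]
        mu = amnesic(t, t1, t2, amn_c, r)   # amnesic function from the CCIPCA sketch
        w_old = (t - 1 - mu) / t            # retention rate
        w_new = (1 + mu) / t                # learning rate
        v[j] = w_old * v[j] + w_new * resp[j] * x   # observation u = r * x
    return v, age
```

Note that the retention rate and the learning rate always sum to 1, as item E of the quiz below states.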

Plasticity Schedule

[Figure: plasticity schedule µ(t) as a function of t, with breakpoints t1 and t2, the level 2, and r = 10000]
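For reference, the plasticity schedule commonly used with CCI learning is the three-segment amnesic function below; matching it to the t1, t2, 2, and r = 10000 labels on this slide is an assumption.

\[
\mu(t) =
\begin{cases}
0, & t \le t_1,\\
c\,(t - t_1)/(t_2 - t_1), & t_1 < t \le t_2,\\
c + (t - t_2)/r, & t > t_2,
\end{cases}
\]

with typical values c = 2 and r = 10000: no amnesia early on, a ramp between t1 and t2, and then a slowly growing µ(t) that keeps the learning rate from shrinking to zero (it approaches 1/r).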

Effect of the Number of Neurons

Data set: NIST numerals

Independent Component Analysis (ICA)

 Formulation on chalk board: interactive
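Since the formulation was developed interactively on the chalkboard, here is a standard statement of the ICA model for reference (not necessarily the exact notation used in class):

\[
\mathbf{x} = A\,\mathbf{s}, \qquad \hat{\mathbf{s}} = W\,\mathbf{x},
\]

where the sources s_i are assumed mutually independent and non-Gaussian, A is an unknown mixing matrix, and the demixing matrix W is chosen so that the recovered components are as statistically independent (e.g., as non-Gaussian) as possible.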

LCA is ICA for Super-Gaussians

 Super-Gaussians: large concentration at the mean (the default firing level)

 Natural images and cortical responses are super-Gaussian (Field 1994)

 LCA is ICA for super-Gaussians: provable from the nature of lobe components
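For concreteness, the standard kurtosis-based definition (not from the slide): a zero-mean distribution is super-Gaussian when its excess kurtosis is positive and sub-Gaussian when it is negative,

\[
\operatorname{kurt}(x) = \mathrm{E}[x^4] - 3\,\bigl(\mathrm{E}[x^2]\bigr)^2,
\]

so a Laplacian (sharply peaked at the mean, heavy tails) is super-Gaussian, while a uniform distribution is sub-Gaussian.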

LCA and ICA (comparison)
  ICA is the only framework proposed as a neuronal feature developer
  Two well-known ones:
    FastICA (Oja et al.): Type-1
    Extended Infomax (Sejnowski et al.): Type-2
  LCAs (ours): Type-5
  100-D Laplacian source
  Superior performance due to the optimality

Quiz: LCA

Quiz: What is NOT true of the CCI LCA algorithm?
A. It uses an incremental learning algorithm
B. It uses parallel competition to sort out the top-k winners
C. It uses the statistical efficiency property so that it converges to the target quickly
D. It requires the lobe component vectors to be mutually orthogonal
E. It uses energy in the input in the sense that the retention rate and the learning rate always sum to 1

Natural Images

Visual Filter Development with LCA

IC from Natural Images

Questions?

Ch. 5: Properties of Representation

Conventional Artificial Neural Networks
  Numeric representation
  Learning as a regression problem
  Feed-forward network: stateless
  Recurrent network: with state, but within the hidden area
  Supervised learning, but little reinforcement learning
  Most algorithms have an incremental version
  System example: ALVINN by Dean Pomerleau

Orientation-Invariant Feature Detectors?

Not Really: You Missed Expression Details!

Quiz: LCA vs ICA

Quiz: What is NOT true?
A. LCA is a type 4 algorithm
B. FastICA by Oja is type 1
C. Extended Infomax by Sejnowski et al. is type 2
D. LCA and ICA extract the same features in general, and that is why they can be compared
E. A uniform distribution is sub-Gaussian

Feature Extraction Methods
  Convolution
  Gabor
  Hopfield nets and Boltzmann machines
  ART
  Support Vector Machine (SVM)
  PCA
  Linear Discriminant Analysis (LDA)
  Independent Component Analysis (ICA)
  LCA

The Nearest Neighbor Rule
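The slide body is a figure; as a hedged one-function sketch in the same style as the earlier ones:

```python
import numpy as np

def nearest_neighbor(query, train_x, train_labels):
    """Nearest neighbor rule: label the query with the label of the
    closest stored training sample (Euclidean distance here)."""
    dists = np.linalg.norm(np.asarray(train_x, dtype=float)
                           - np.asarray(query, dtype=float), axis=1)
    return train_labels[int(np.argmin(dists))]
```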

LCA Net