Similarity, kernels, and the fundamental constraints on cognition
A survey paper
Authors: Reza Shahbazi, Rajeev Raizada, Shimon Edelman
Presented by Maksym Del
2017
Paper outline
• 4 fundamental constraints on any learning system
• Kernels
• Addressing the constraints with kernels
• Application to neuroscience
* machine learning * neuroscience
4 Fundamental Constraints
on any learning system (animal, human, robot, etc.)

We expect a learning system to:
• 1) Measure external stimuli
• Gather data from human eyes, ears, robot sensors, etc.
• 2) Generalize from familiar to similar unfamiliar stimuli
• The tiger is dangerous -> the lion is dangerous
• 3) Deal with high-dimensional data
• A lot of features from many sensors; sparse data
• 4) Solve complex tasks
• Puzzles, regression / classification tasks, etc.
Constraints
• 1) Measurement: dealing with raw data
• 2) Similarity: estimating similarity in order to generalize
• 3) Dimensionality: need to reduce it
• 4) Complexity: map the problem into another space to make it simpler
Constraints
So far we have defined 4 fundamental constraints on any learning system:
• Measurement constraint
• Similarity constraint
• Dimensionality constraint
• Complexity constraint
Kernels
in feature-space mapping and as a similarity measure
Let's say we have a linear tool
And linearly separable features
We are fine
Image: https://www.quora.com/What-are-Kernels-in-Machine-Learning-and-SVM
But what about non-linear data?
Make it linearly separable:
• Map the feature space to a higher dimension
• Separate with the linear tool
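The idea on this slide can be sketched with a tiny invented 1-D dataset (not from the paper): points that no single threshold can split become linearly separable once each point x is lifted to (x, x^2).

```python
# 1-D points that are not linearly separable: class A sits between
# the two halves of class B, so no single threshold splits them.
class_a = [-1.0, 0.0, 1.0]
class_b = [-3.0, -2.5, 2.5, 3.0]

# Map each point x to (x, x^2). In this 2-D space the classes are
# separated by the horizontal line y = 2: x^2 < 2 for A, x^2 > 2 for B.
lift = lambda x: (x, x * x)

assert all(lift(x)[1] < 2 for x in class_a)
assert all(lift(x)[1] > 2 for x in class_b)
print("separable by the line y = 2 after lifting")
```

The data and the threshold 2 are made up for illustration; any quadratic-style lift would serve the same purpose here.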
Let's formalize
• x, y are n-dimensional inputs
• f(.) is a map from the n-dimensional space to an m-dimensional space (m > n)
• x and y often occur in the form of a dot product: <x, y>
• To map x and y to the higher dimension we have to compute <f(x), f(y)>

Kernel
• However, computing f(x) and f(y) might be hard
• Note that the result <f(x), f(y)> is just a scalar, r
• So let's introduce a function with K(x, y) = r that avoids computing f explicitly
• K(x, y) is called a kernel
Simple example
• x = (x1, x2, x3); y = (y1, y2, y3)
• f(x) = (x1x1, x1x2, x1x3, x2x1, x2x2, x2x3, x3x1, x3x2, x3x3)
• x = (1, 2, 3); y = (4, 5, 6)
• f(x) = (1, 2, 3, 2, 4, 6, 3, 6, 9)
• f(y) = (16, 20, 24, 20, 25, 30, 24, 30, 36)
• <f(x), f(y)> = 16 + 40 + 72 + 40 + 100 + 180 + 72 + 180 + 324 = 1024
• K(x, y) = (<x, y>)^2
• K(x, y) = (4 + 10 + 18)^2 = 32^2 = 1024
Example: https://www.quora.com/What-are-Kernels-in-Machine-Learning-and-SVM
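The arithmetic in this example can be checked in a few lines of Python; this is just the slide's worked example with f and K spelled out:

```python
# Verify that the explicit feature map and the kernel agree:
# f maps a 3-vector to all 9 pairwise products, and
# K(x, y) = <x, y>^2 computes <f(x), f(y)> without ever building f.

def f(v):
    """Explicit feature map: all pairwise products v_i * v_j."""
    return [vi * vj for vi in v for vj in v]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def K(x, y):
    """Degree-2 polynomial kernel (no constant term)."""
    return dot(x, y) ** 2

x, y = (1, 2, 3), (4, 5, 6)
print(dot(f(x), f(y)))  # 1024, via the 9-dimensional feature space
print(K(x, y))          # 1024, via one 3-dimensional dot product
```

Both routes give 1024, but the kernel needs one 3-D dot product and a squaring, while the explicit map needs 9-D vectors: the kernel trick in miniature.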
Another view on kernels
• We work with a dot product: <x, y>
• But what is the dot product <x, y>?
• It is a measure of overlap between two vectors
• A measure of overlap is a measure of similarity
• So what is the kernel K = <f(x), f(y)>?
• It is a higher-level measure of similarity!
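As an illustration of this similarity view, here is one common kernel, the Gaussian (RBF) kernel, applied to made-up feature vectors; the vectors and the gamma value are invented for the example, not taken from the paper:

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: a similarity score in (0, 1] that
    decays with the squared Euclidean distance between x and y."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Hypothetical stimulus feature vectors
tiger = (0.9, 0.8, 0.7)
lion  = (0.8, 0.8, 0.6)
apple = (0.1, 0.2, 0.9)

print(rbf_kernel(tiger, tiger))  # 1.0: a stimulus is maximally similar to itself
print(rbf_kernel(tiger, lion))   # near 1: familiar -> similar unfamiliar
print(rbf_kernel(tiger, apple))  # much smaller: dissimilar stimuli
```

Reading K(x, y) as "how similar are x and y" is exactly the second view of kernels that the next slide names.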
2 views on kernels
• Kernel as an implicit feature map (aka Kernel trick)
• Kernel as a measure of similarity
Addressing the constraints with kernels
Constraints
So far we have defined 4 fundamental constraints on any learning system:
• Measurement constraint
• Similarity constraint
• Dimensionality constraint
• Complexity constraint

Kernels
And derived 2 views of kernels:
• Implicit feature map (kernel trick)
• A measure of similarity
Let's map the constraints to the views on kernels
• Measurement constraint: direct access to information, K(x, y)
• Similarity constraint: kernel as a similarity measure
• Dimensionality constraint: kernel trick
• Complexity constraint: kernel trick
Application to neuroscience
similarity and kernels in animal behavior
Example learning system

Tasks that can help an animal survive
• Deciding on an appropriate response to a novel stimulus: "Is this a dangerous animal?"
• Veridical representation: "Judging the similarity of a red apple to a green apple"
• Dealing with noise and confounding factors: "Detecting a lion's roar from a distance on a windy day"
• Generalizing learned skills to new tasks: "Learning tree climbing can help rock climbing"
Strategies to solve a task
• Deciding on an appropriate response to a novel stimulus:
  judge similarity to familiar examples;
  find a decision boundary based on previous examples;
  discover and exploit structure within collected examples;
  quantify output in terms of input
• Veridical representation: preserve pairwise distances
• Dealing with noise and confounding factors: allow for variance
• Generalizing learned skills to new tasks: domain adaptation and transfer of learning
Corresponding ML techniques
• Judge similarity to familiar examples: k-NN with a kernel metric
• Find a decision boundary based on previous examples: SVM, RBF networks
• Discover and exploit structure within collected examples: kPCA
• Quantify output in terms of input: linear regression, Gaussian processes
• Preserve pairwise distances: MDS with a kernel metric
• Allow for variance: regularization
• Domain adaptation and transfer of learning: hierarchical mixture models (not kernel-based), deep convolutional networks (implicitly kernel-based)
Final mapping
• Deciding on an appropriate response to a novel stimulus: k-NN with a kernel metric; SVM, RBF networks; kPCA; linear regression, Gaussian processes
• Veridical representation: MDS with a kernel metric
• Dealing with noise and confounding factors: regularization
• Generalizing learned skills to new tasks: hierarchical mixture models (not kernel-based), deep convolutional networks (implicitly kernel-based)
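As a small illustration of the first row (the paper itself gives no code), here is a minimal 1-nearest-neighbor classifier whose distance is induced by a kernel, using the identity ||f(x) - f(y)||^2 = K(x, x) - 2 K(x, y) + K(y, y); the feature vectors and labels are invented for the example:

```python
def K(x, y):
    """Degree-2 polynomial kernel, as in the earlier worked example."""
    return sum(xi * yi for xi, yi in zip(x, y)) ** 2

def kernel_sq_dist(x, y):
    """Squared distance in the feature space induced by K,
    computed without ever constructing f(x) or f(y)."""
    return K(x, x) - 2 * K(x, y) + K(y, y)

def nearest_neighbor(query, examples):
    """Classify a novel stimulus by its most similar familiar example."""
    return min(examples, key=lambda ex: kernel_sq_dist(query, ex[0]))[1]

# Hypothetical familiar stimuli: (feature vector, label)
examples = [((0.9, 0.8), "dangerous"),   # tiger
            ((0.1, 0.2), "harmless")]    # apple
print(nearest_neighbor((0.8, 0.7), examples))  # lion-like query -> "dangerous"
```

This is the "generalize from familiar to similar unfamiliar stimuli" constraint in code: the novel lion-like vector is judged by its kernel similarity to the stored tiger and apple examples.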
What did we (the authors) do?
• Defined 4 foundational constraints on any learning system
• Investigated the notion of a kernel
• Mapped the learning constraints to kernel characteristics
• Showed that kernel-based and similarity-based concepts appear in the everyday tasks of a cognitive agent (learning system), using animal survival as an example
Take-home neuroscience message
• The concept of a kernel is so natural a fit to animal (and human) cognitive behavior that we might assume the brain uses something similar to a kernel.
P.S.:
• https://www.youtube.com/watch?v=3liCbRZPrZA
Questions?
Thank you for your attention