Similarity, kernels, and the fundamental constraints on cognition
A survey paper
Authors: Reza Shahbazi, Rajeev Raizada, Shimon Edelman
Presented by Maksym Del
2017
Paper outline
• 4 fundamental constraints on any learning system
• Kernels
• Addressing the constraints with kernels
• Application to neuroscience
* machine learning * neuroscience
4 Fundamental Constraints
on any learning system (animal, human, robot, etc.)

We expect a learning system to:
• 1) Measure external stimuli
• Gather data from human eyes, ears, robot sensors, etc.
• 2) Generalize from familiar to similar unfamiliar stimuli
• The tiger is dangerous -> the lion is dangerous
• 3) Deal with high-dimensional data
• A lot of features from many sensors; sparse data
• 4) Solve complex tasks
• Puzzles, regression / classification tasks, etc.
Constraints
• 1) Measurement: dealing with raw data
• 2) Similarity: estimating similarity in order to generalize
• 3) Dimensionality: need to reduce it
• 4) Complexity: map the problem into another space to make it simpler
Constraints
So far we have defined 4 fundamental constraints on any learning system:
• Measurement constraint
• Similarity constraint
• Dimensionality constraint
• Complexity constraint
Kernels
in feature-space mapping and as a similarity measure
Let's say we have a linear tool
And linearly separable features
We are fine
Image: https://www.quora.com/What-are-Kernels-in-Machine-Learning-and-SVM
But what about non-linear data?
Make it linearly separable:
• Map the feature space to a higher dimension
• Separate with the linear tool
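The idea on this slide can be sketched with a tiny invented 1-D dataset (not from the paper): points that no single threshold can split become linearly separable once each point x is lifted to (x, x^2).

```python
# 1-D points that are not linearly separable: class A sits between
# the two halves of class B, so no single threshold splits them.
class_a = [-1.0, 0.0, 1.0]
class_b = [-3.0, -2.5, 2.5, 3.0]

# Map each point x to (x, x^2). In this 2-D space the classes are
# separated by the horizontal line y = 2: x^2 < 2 for A, x^2 > 2 for B.
lift = lambda x: (x, x * x)

assert all(lift(x)[1] < 2 for x in class_a)
assert all(lift(x)[1] > 2 for x in class_b)
print("separable by the line y = 2 after lifting")
```

The data and the threshold 2 are made up for illustration; any quadratic-style lift would serve the same purpose here.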
Let's formalize
• x, y are n-dimensional inputs
• f(.) is a map from the n-dimensional space to an m-dimensional space (m > n)
• x and y often occur in the form of a dot product: <x, y>
• To map x and y to the higher dimension we have to compute <f(x), f(y)>

Kernel
• However, computing f(x) and f(y) might be hard
• Note that the result <f(x), f(y)> is just a scalar, r
• So let's introduce a function with K(x, y) = r that avoids computing f explicitly
• K(x, y) is called a kernel
Simple example
• x = (x1, x2, x3); y = (y1, y2, y3)
• f(x) = (x1x1, x1x2, x1x3, x2x1, x2x2, x2x3, x3x1, x3x2, x3x3)
• x = (1, 2, 3); y = (4, 5, 6)
• f(x) = (1, 2, 3, 2, 4, 6, 3, 6, 9)
• f(y) = (16, 20, 24, 20, 25, 30, 24, 30, 36)
• <f(x), f(y)> = 16 + 40 + 72 + 40 + 100 + 180 + 72 + 180 + 324 = 1024
• K(x, y) = (<x, y>)^2
• K(x, y) = (4 + 10 + 18)^2 = 32^2 = 1024
Example: https://www.quora.com/What-are-Kernels-in-Machine-Learning-and-SVM
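The arithmetic in this example can be checked in a few lines of Python; this is just the slide's worked example with f and K spelled out:

```python
# Verify that the explicit feature map and the kernel agree:
# f maps a 3-vector to all 9 pairwise products, and
# K(x, y) = <x, y>^2 computes <f(x), f(y)> without ever building f.

def f(v):
    """Explicit feature map: all pairwise products v_i * v_j."""
    return [vi * vj for vi in v for vj in v]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def K(x, y):
    """Degree-2 polynomial kernel (no constant term)."""
    return dot(x, y) ** 2

x, y = (1, 2, 3), (4, 5, 6)
print(dot(f(x), f(y)))  # 1024, via the 9-dimensional feature space
print(K(x, y))          # 1024, via one 3-dimensional dot product
```

Both routes give 1024, but the kernel needs one 3-D dot product and a squaring, while the explicit map needs 9-D vectors: the kernel trick in miniature.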
Another view on kernels
• We work with a dot product: <x, y>
• But what is the dot product <x, y>?
• It is a measure of overlap between two vectors
• A measure of overlap is a measure of similarity
• So what is the kernel K = <f(x), f(y)>?
• It is a higher-level measure of similarity!
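As an illustration of this similarity view, here is one common kernel, the Gaussian (RBF) kernel, applied to made-up feature vectors; the vectors and the gamma value are invented for the example, not taken from the paper:

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: a similarity score in (0, 1] that
    decays with the squared Euclidean distance between x and y."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Hypothetical stimulus feature vectors
tiger = (0.9, 0.8, 0.7)
lion  = (0.8, 0.8, 0.6)
apple = (0.1, 0.2, 0.9)

print(rbf_kernel(tiger, tiger))  # 1.0: a stimulus is maximally similar to itself
print(rbf_kernel(tiger, lion))   # near 1: familiar -> similar unfamiliar
print(rbf_kernel(tiger, apple))  # much smaller: dissimilar stimuli
```

Reading K(x, y) as "how similar are x and y" is exactly the second view of kernels that the next slide names.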
2 views on kernels
• Kernel as an implicit feature map (aka Kernel trick)
• Kernel as a measure of similarity
Addressing the constraints with kernels
Constraints
So far we have defined 4 fundamental constraints on any learning system:
• Measurement constraint
• Similarity constraint
• Dimensionality constraint
• Complexity constraint

Kernels
And derived 2 views of kernels:
• Implicit feature map (kernel trick)
• A measure of similarity
Let's map the constraints to the views on kernels
• Measurement constraint: direct access to information, K(x, y)
• Similarity constraint: kernel as a similarity measure
• Dimensionality constraint: kernel trick
• Complexity constraint: kernel trick
Application to neuroscience
similarity and kernels in animal behavior
Example learning system

Tasks that can help an animal survive
• Deciding on an appropriate response to a novel stimulus: "Is this a dangerous animal?"
• Veridical representation: "Judging the similarity of a red apple to a green apple"
• Dealing with noise and confounding factors: "Detecting a lion's roar from a distance on a windy day"
• Generalizing learned skills to new tasks: "Learning tree climbing can help rock climbing"
Strategies to solve a task
• Deciding on an appropriate response to a novel stimulus:
  judge similarity to familiar examples;
  find a decision boundary based on previous examples;
  discover and exploit structure within collected examples;
  quantify output in terms of input
• Veridical representation: preserve pairwise distances
• Dealing with noise and confounding factors: allow for variance
• Generalizing learned skills to new tasks: domain adaptation and transfer of learning
Corresponding ML techniques
• Judge similarity to familiar examples: k-NN with a kernel metric
• Find a decision boundary based on previous examples: SVM, RBF networks
• Discover and exploit structure within collected examples: kPCA
• Quantify output in terms of input: linear regression, Gaussian processes
• Preserve pairwise distances: MDS with a kernel metric
• Allow for variance: regularization
• Domain adaptation and transfer of learning: hierarchical mixture models (not kernel-based), deep convolutional networks (implicitly kernel-based)
Final mapping
• Deciding on an appropriate response to a novel stimulus: k-NN with a kernel metric; SVM, RBF networks; kPCA; linear regression, Gaussian processes
• Veridical representation: MDS with a kernel metric
• Dealing with noise and confounding factors: regularization
• Generalizing learned skills to new tasks: hierarchical mixture models (not kernel-based), deep convolutional networks (implicitly kernel-based)
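As a small illustration of the first row (the paper itself gives no code), here is a minimal 1-nearest-neighbor classifier whose distance is induced by a kernel, using the identity ||f(x) - f(y)||^2 = K(x, x) - 2 K(x, y) + K(y, y); the feature vectors and labels are invented for the example:

```python
def K(x, y):
    """Degree-2 polynomial kernel, as in the earlier worked example."""
    return sum(xi * yi for xi, yi in zip(x, y)) ** 2

def kernel_sq_dist(x, y):
    """Squared distance in the feature space induced by K,
    computed without ever constructing f(x) or f(y)."""
    return K(x, x) - 2 * K(x, y) + K(y, y)

def nearest_neighbor(query, examples):
    """Classify a novel stimulus by its most similar familiar example."""
    return min(examples, key=lambda ex: kernel_sq_dist(query, ex[0]))[1]

# Hypothetical familiar stimuli: (feature vector, label)
examples = [((0.9, 0.8), "dangerous"),   # tiger
            ((0.1, 0.2), "harmless")]    # apple
print(nearest_neighbor((0.8, 0.7), examples))  # lion-like query -> "dangerous"
```

This is the "generalize from familiar to similar unfamiliar stimuli" constraint in code: the novel lion-like vector is judged by its kernel similarity to the stored tiger and apple examples.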
What did we (the authors) do?
• Defined 4 foundational constraints on any learning system
• Investigated the notion of a kernel
• Mapped the learning constraints to kernel characteristics
• Showed that kernel-based and similarity-based concepts appear in the everyday tasks of a cognitive agent (learning system), using animal survival as an example
Take-home neuroscience message
• The concept of a kernel is so natural a fit to animal (and human) cognitive behavior that we might assume the brain uses something similar to a kernel.
P.S.:
• https://www.youtube.com/watch?v=3liCbRZPrZA
Questions?
Thank you for your attention