8/9/2019 Bayes = (Prior, Posterior, Evidence, Probability) & Curve Fitting
Using Probability Theory
Curve Fitting & Multisensory Integration
Hannah Chen, Rebecca Roseman and Adam Rule | COGS 202 | 4.14.2
Overheard at Porters
“You’re optimizing for the wrong thing!”
What would you say?
What makes for a good model?
Best performance? Fewest assumptions? Most elegant? Coolest name?
Agenda

Motivation: Finding Patterns
Tools: Probability Theory
Applications: Curve Fitting, Multimodal Sensory Integration
FINDING PATTERNS
The Motivation
You have data... now find patterns

Unsupervised: x = data (training); y(x) = model
Clustering, density estimation

Supervised: x = data (training), t = target vector; y(x) = model
Classification, regression
Important Questions
What kind of model is appropriate? What makes a model accurate? Can a model be too accurate?
What are our prior beliefs about the model?
PROBABILITY THEORY
The Tools
Properties of a distribution

x = event; p(x) = probability of that event

1. p(x) ≥ 0
2. Σ_x p(x) = 1 (for a continuous density, ∫ p(x) dx = 1)
Rules

Sum rule: p(X) = Σ_Y p(X, Y)
Product rule: p(X, Y) = p(Y | X) p(X)
Example (Discrete)
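The worked example on this slide did not survive the scan; a minimal discrete example (hypothetical numbers) illustrating the sum and product rules:

```python
import numpy as np

# Joint distribution p(X, Y) over a 2x3 grid of discrete events
# (hypothetical numbers chosen to sum to 1).
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.15, 0.20]])

# Sum rule: marginalize Y out to get p(X).
p_x = joint.sum(axis=1)                   # p(X) = sum_Y p(X, Y)

# Product rule: p(X, Y) = p(Y | X) p(X).
p_y_given_x = joint / p_x[:, None]
reconstructed = p_y_given_x * p_x[:, None]

print(p_x)                                # marginal of X: [0.4, 0.6]
print(np.allclose(reconstructed, joint))  # True: product rule recovers the joint
```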
Bayes' Rule (review)

posterior = (likelihood × prior) / evidence

p(Y | X) = p(X | Y) p(Y) / p(X)
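A tiny numeric instance of the rule, with hypothetical prior and likelihood values over two hypotheses:

```python
import numpy as np

# Hypothetical two-hypothesis example: H in {model A, model B}.
prior = np.array([0.5, 0.5])           # p(H)
likelihood = np.array([0.8, 0.3])      # p(D | H) for some observed data D

evidence = (likelihood * prior).sum()  # p(D) = sum_H p(D | H) p(H)
posterior = likelihood * prior / evidence  # Bayes' rule

print(posterior)   # [0.727..., 0.272...]: data favor model A
```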
Probability Density vs. Mass Function

PDF (continuous), notation p(x). Intuition: p(x) measures how much probability is concentrated near x per unit length dx, i.e. how dense the probability is near x. Here p(x) is the probability for an interval dx around x, not for the point itself, so p(x) can be greater than 1 as long as the integral over the entire domain is 1.

PMF (discrete), notation P(x). Probability mass has the same interpretation but from a discrete point of view: P(x) is the probability of each individual point.
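The "p(x) can be greater than 1" point can be checked numerically; a sketch using a narrow Gaussian (the width 0.1 is an arbitrary choice):

```python
import numpy as np

# A narrow Gaussian PDF: density values exceed 1, yet it still integrates to 1.
sigma = 0.1
x = np.linspace(-1, 1, 20001)
pdf = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

print(pdf.max())                  # ~3.99, greater than 1 is fine for a density

# Riemann-sum approximation of the integral over the interval.
area = pdf.sum() * (x[1] - x[0])
print(area)                       # ~1.0
```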
Expectation & Covariance
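The defining equations on this slide are not legible in the scan; a sketch of the standard definitions, E[f] = Σ_x p(x) f(x) and cov[x, y] = E[(x − E[x])(y − E[y])], with hypothetical numbers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete expectation: E[f] = sum_x p(x) f(x)
p = np.array([0.2, 0.5, 0.3])
x = np.array([1.0, 2.0, 3.0])
e_x = (p * x).sum()                   # E[x]
var_x = (p * x**2).sum() - e_x**2     # var[x] = E[x^2] - E[x]^2

# Sample-based covariance: cov[x, y] = E[(x - E[x])(y - E[y])]
xs = rng.normal(size=10000)
ys = 2 * xs + rng.normal(size=10000)  # ys correlated with xs
cov = ((xs - xs.mean()) * (ys - ys.mean())).mean()

print(e_x, var_x)   # 2.1, 0.49
print(cov)          # close to 2, the true covariance
```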
CURVE FITTING
The Application
Curve Fitting

Observed Data: given n points (x, t). Textbook example: generate x uniformly from the range [0, 1] and calculate target data t with the sin(2πx) function plus noise drawn from a Gaussian distribution.

Why? Real data sets typically have an underlying regularity that we are trying to learn.
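The textbook data set above can be generated in a few lines; the noise standard deviation 0.3 and n = 10 are illustrative choices, not values from the slide:

```python
import numpy as np

rng = np.random.default_rng(42)

# Bishop-style synthetic data: x uniform on [0, 1],
# targets t = sin(2*pi*x) plus Gaussian noise.
n = 10
x = rng.uniform(0.0, 1.0, size=n)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=n)
```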
Curve Fitting

Observed Data: given n points (x, t)
Goal: use the observed data to predict new target values t' for new values x'
Curve Fitting

Observed Data: given n points (x, t). Can we fit a polynomial function y(x, w) = w₀ + w₁x + ... + w_M x^M to this data?
What values of w and M fit this data well?
How to measure goodness of fit?

Minimize an error function, e.g. the sum-of-squares error:

E(w) = ½ Σₙ {y(xₙ, w) − tₙ}²
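A sketch of minimizing this error with a polynomial fit (data generation and the degree M = 3 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 10)

def sse(w, x, t):
    """Sum-of-squares error E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2."""
    y = np.polyval(w, x)
    return 0.5 * np.sum((y - t) ** 2)

# A least-squares polynomial fit of degree M minimizes E(w).
M = 3
w = np.polyfit(x, t, M)
print(sse(w, x, t))   # residual error of the best-fit cubic
```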
Overfitting
Combatting Overfitting

1. Increase data points
Data observations may have just been noisy. With more data, we can see whether variation is due to noise or is part of the underlying relationship between observations.

2. Regularization
Introduce a penalty term and trade off between a good fit and the penalty. The hyperparameter λ is an input to the model. Increasing λ reduces overfitting, in turn reducing variance and increasing bias (the difference between the estimated and true target).
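Regularized least squares (ridge) can be sketched directly; the degree M = 9 and penalty λ = 1e-3 are illustrative values, not from the slide:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 10)

# Design matrix for a degree-M polynomial.
M = 9
Phi = np.vander(x, M + 1)

# Ridge (regularized least squares): minimize
# 1/2 ||Phi w - t||^2 + lam/2 ||w||^2  =>  w = (Phi^T Phi + lam I)^-1 Phi^T t
lam = 1e-3   # hyperparameter controlling the penalty
w_reg = np.linalg.solve(Phi.T @ Phi + lam * np.eye(M + 1), Phi.T @ t)
w_unreg = np.linalg.lstsq(Phi, t, rcond=None)[0]

# The penalty shrinks the weights relative to the unregularized fit.
print(np.linalg.norm(w_reg) < np.linalg.norm(w_unreg))   # True
```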
How to check for overfitting?

Training and validation subsets. Heuristic: # data points > (some multiple of) # parameters.
Training vs. testing: don't touch the test set until you are actually evaluating the experiment!
Cross-validation

1. Use a portion, (S−1)/S, of the data for training
2. Assess performance on the held-out remainder
3. Repeat for each of the S runs
4. Average the performance scores

Example: 4-fold cross-validation
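The four steps above can be sketched as follows (the data set and degree-3 model are illustrative):

```python
import numpy as np

def k_fold_scores(x, t, S, degree):
    """S-fold cross-validation of a polynomial fit: returns one
    validation error (sum of squares on the held-out fold) per run."""
    idx = np.arange(len(x))
    folds = np.array_split(idx, S)
    scores = []
    for held_out in folds:
        train = np.setdiff1d(idx, held_out)        # steps 1-2: split
        w = np.polyfit(x[train], t[train], degree)  # train on (S-1)/S
        resid = np.polyval(w, x[held_out]) - t[held_out]
        scores.append(0.5 * np.sum(resid ** 2))     # assess on held-out fold
    return np.array(scores)                         # step 3: one score per run

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 20))
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 20)
scores = k_fold_scores(x, t, S=4, degree=3)
print(scores.mean())   # step 4: average performance over the 4 runs
```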
Cross-validation

When to use? When the validation set is small. If there is very little data, use S = the number of observed data points.

Limitations:
● Computationally expensive: the number of training runs increases by a factor of S
● Multiple complexity parameters may lead to a number of training runs that is exponential in the number of parameters
Alternative Approach

We want an approach that depends only on the size of the training data rather than on the number of training runs, e.g. the Akaike information criterion (AIC) or the Bayesian information criterion (BIC, Sec 4.4).
Akaike Information Criterion (AIC)

Choose the model with the largest value of

ln p(D | w_ML) − M

where ln p(D | w_ML) is the best-fit log likelihood and M is the number of adjustable model parameters.
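The criterion is trivial to compute once the log likelihood is known; the numbers below are hypothetical, chosen only to show the penalty at work:

```python
import numpy as np

def aic(log_likelihood, n_params):
    """AIC in the form quoted above: ln p(D | w_ML) - M.
    Pick the model with the LARGEST value."""
    return log_likelihood - n_params

# Hypothetical comparison: the slightly better fit (-49.5) loses to the
# simpler model once the parameter-count penalty is applied.
print(aic(-50.0, 4) > aic(-49.5, 10))   # True
```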
Gaussian Distribution

N(x | μ, σ²) = (1 / √(2πσ²)) exp(−(x − μ)² / (2σ²))

This satisfies the two properties of a probability density! (What are they?)
Curve Fitting (ver. 1)

Assumptions:
1. Given a value of x, the corresponding target value t has a Gaussian distribution with mean y(x, w): p(t | x, w, β) = N(t | y(x, w), β⁻¹)
2. The data {x, t} are drawn independently from this distribution, so the likelihood is

p(t | x, w, β) = Πₙ N(tₙ | y(xₙ, w), β⁻¹)
Maximum Likelihood Estimation (MLE)

Log likelihood:

ln p(t | x, w, β) = −(β/2) Σₙ {y(xₙ, w) − tₙ}² + (N/2) ln β − (N/2) ln 2π

What does maximizing the log likelihood look similar to? With respect to w, only the first term matters, so maximizing the likelihood is equivalent to minimizing the sum-of-squares error.
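The equivalence can be checked numerically; a sketch with illustrative data and β = 1:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 30)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 30)

def gaussian_log_likelihood(w, beta, x, t):
    """ln p(t | x, w, beta) for t_n ~ N(y(x_n, w), 1/beta)."""
    n = len(x)
    sq = np.sum((np.polyval(w, x) - t) ** 2)
    return -0.5 * beta * sq + 0.5 * n * np.log(beta) - 0.5 * n * np.log(2 * np.pi)

# Only the sum-of-squares term depends on w, so the least-squares fit
# is also the maximum-likelihood fit.
w_ml = np.polyfit(x, t, 3)
beta = 1.0
print(gaussian_log_likelihood(w_ml, beta, x, t) >=
      gaussian_log_likelihood(np.zeros(4), beta, x, t))   # True
```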
Maximum Posterior (MAP)

Simpler example: use a Gaussian prior of the form

p(w | α) = N(w | 0, α⁻¹ I)

With Bayes' rule we can calculate the posterior:

p(w | x, t, α, β) ∝ p(t | x, w, β) p(w | α)
Maximum Posterior (MAP)

Determine w by maximizing the posterior distribution. This is equivalent to minimizing the negative log of the posterior, combining the Gaussian prior with the log-likelihood function from earlier...
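Writing out that combination (a sketch in Bishop's notation, where α is the prior precision and β the noise precision):

```latex
-\ln p(\mathbf{w}\mid \mathbf{x},\mathbf{t},\alpha,\beta)
 = \frac{\beta}{2}\sum_{n=1}^{N}\{y(x_n,\mathbf{w})-t_n\}^2
 + \frac{\alpha}{2}\mathbf{w}^{\mathsf{T}}\mathbf{w} + \mathrm{const}
```

so maximizing the posterior means minimizing a penalized sum-of-squares error.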
Maximum Posterior (MAP)

MAP reduces to finding the minimum of

(β/2) Σₙ {y(xₙ, w) − tₙ}² + (α/2) wᵀw

What does this look like? The regularized sum-of-squares error.
What is the regularization parameter? λ = α/β.
Bayesian Curve Fitting
However…
MLE and MAP are not fully Bayesian because they involve using point estimates for w
Curve Fitting (ver. 2)

Given training data {x, t} and a new point x, predict the target value t.
Assume the parameters α and β are fixed.
Evaluate the predictive distribution p(t | x, x, t).
Bayesian Curve Fitting

A fully Bayesian approach requires integrating over all values of w, by applying the sum and product rules of probability:

p(t | x, x, t) = ∫ p(t | x, w) p(w | x, t) dw
Bayesian Curve Fitting

In this marginalization, p(w | x, t) is the posterior over the parameters. This posterior is Gaussian and can be evaluated analytically (Sec 3.3).
Bayesian Curve Fitting

The predictive distribution is a Gaussian of the form

p(t | x, x, t) = N(t | m(x), s²(x))

with mean m(x) = β φ(x)ᵀ S Σₙ φ(xₙ) tₙ, variance s²(x) = β⁻¹ + φ(x)ᵀ S φ(x), and matrix S⁻¹ = α I + β Σₙ φ(xₙ) φ(xₙ)ᵀ.
Bayesian Curve Fitting

We still need to define the basis φ(x) and the hyperparameters α and β.
The mean and the variance both depend on x as a result of the marginalization (this is not the case in MLE/MAP).
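A sketch of this predictive distribution with a polynomial basis (the hyperparameter values and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 10)

M, alpha, beta = 9, 5e-3, 11.1   # illustrative hyperparameter choices

def phi(x):
    """Polynomial basis vector (1, x, ..., x^M) for each input."""
    return np.power.outer(np.atleast_1d(x), np.arange(M + 1))

Phi = phi(x)                                        # N x (M+1) design matrix
S_inv = alpha * np.eye(M + 1) + beta * Phi.T @ Phi  # S^-1 = alpha I + beta sum phi phi^T
S = np.linalg.inv(S_inv)

def predictive(x_new):
    """Mean m(x) and variance s^2(x) of the Gaussian predictive."""
    ph = phi(x_new)
    mean = beta * ph @ S @ Phi.T @ t                # m(x) = beta phi^T S sum phi t
    var = 1.0 / beta + np.sum(ph @ S * ph, axis=1)  # s^2(x) = 1/beta + phi^T S phi
    return mean, var

mean, var = predictive(np.array([0.25, 0.5, 0.75]))
print(mean)   # predictive means at the three query points
print(var)    # each variance includes the noise floor 1/beta
```

Note how the variance varies with x: query points far from the training data get wider predictive distributions, which point estimates cannot express.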
MULTIMODAL SENSORY INTEGRATION
The Application
Two Models: Visual Capture vs. MLE

[Figure: probability distributions over location for the visual and auditory cues]
Procedure
Final Result

[Figure: probability over location; visual and auditory curves]
The Math (MLE Model)

Likelihood: p(ŝ_v, ŝ_a | s) = N(ŝ_v | s, σ_v²) · N(ŝ_a | s, σ_a²)
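The equations on this slide are unreadable in the scan; a sketch of the standard MLE cue-combination result for two Gaussian cues (hypothetical numbers, not values from the slide):

```python
import numpy as np

# MLE cue combination: a visual and an auditory location estimate,
# each modeled as Gaussian, combined by reliability-weighted averaging.
s_v, sigma_v = 10.0, 1.0    # visual estimate and its noise SD (hypothetical)
s_a, sigma_a = 14.0, 2.0    # auditory estimate and its noise SD (hypothetical)

# Weights are normalized inverse variances (reliabilities).
w_v = (1 / sigma_v**2) / (1 / sigma_v**2 + 1 / sigma_a**2)
w_a = 1 - w_v
s_hat = w_v * s_v + w_a * s_a

# The combined variance is smaller than either cue's alone.
var_hat = 1 / (1 / sigma_v**2 + 1 / sigma_a**2)

print(s_hat)    # 10.8: pulled toward the more reliable visual cue
print(var_hat)  # 0.8: below both 1.0 (visual) and 4.0 (auditory)
```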
The Math ("Bayesian" Model)

Likelihood × Prior, with a uniform prior on location and an inverse-gamma prior on the variance.
The Math (Empirical)

● logistic function
● likelihood
● location estimates
● weight constraint
Final Result
QUESTIONS?
Two Models (Prediction)

Visual Capture vs. MLE

[Figure: predicted visual weight as a function of visual noise, one panel per model]