Probability and Statistics in Vision
Probability
• Objects not all the same
– Many possible shapes for people, cars, …
– Skin has different colors
• Measurements not all the same
– Noise
• But some are more probable than others
– Green skin not likely
Probability and Statistics
• Approach: probability distribution of expected objects, expected observations
• Perform mid- to high-level vision tasks by finding the most likely model consistent with actual observations
• Often don’t know probability distributions – learn them from statistics of training data
Concrete Example – Skin Color
• Suppose you want to find pixels with the color of skin
• Step 1: learn likely distribution of skin colors from (possibly hand-labeled) training data
[Plot: learned distribution – probability vs. color]
Conditional Probability
• This is the probability of observing a given color given that the pixel is skin
• Conditional probability p(color|skin)
Skin Color Identification
• Step 2: given a new image, want to find whether each pixel corresponds to skin
• Maximum a posteriori (MAP) estimation: pixel is skin iff p(skin|color) > p(not skin|color)
• But this requires knowing p(skin|color), and we only have p(color|skin)
Bayes’s Rule
• “Inverting” a conditional probability: p(B|A) = p(A|B) p(B) / p(A)
• Therefore, p(skin|color) = p(color|skin) p(skin) / p(color)
• p(skin) is the prior – knowledge of the domain
• p(skin|color) is the posterior – what we want
• p(color) is a normalization term
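This inversion can be sketched numerically on a toy discretized color space. The likelihood tables and prior below are invented numbers for illustration, not learned from real training data; only the arithmetic follows Bayes’s rule.

```python
# Bayes' rule on a toy discretized color space.
# Likelihoods and prior are made-up illustration values.
p_color_given_skin = {"tan": 0.70, "green": 0.05, "gray": 0.25}
p_color_given_notskin = {"tan": 0.20, "green": 0.30, "gray": 0.50}
p_skin = 0.3  # prior p(skin); tunes detector "sensitivity"

def posterior_skin(color):
    """p(skin|color) = p(color|skin) p(skin) / p(color);
    p(color) is the normalization term."""
    num = p_color_given_skin[color] * p_skin
    p_color = num + p_color_given_notskin[color] * (1 - p_skin)
    return num / p_color

def is_skin(color):
    # Decision rule: skin iff p(skin|color) > p(not skin|color)
    return posterior_skin(color) > 0.5
```

Raising or lowering the prior p_skin shifts every posterior, which is exactly how the prior tunes the detector’s sensitivity.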
Priors
• p(skin) = prior
– Estimate from training data
– Tunes “sensitivity” of skin detector
– Can incorporate even more information: e.g. are skin pixels more likely to be found in certain regions of the image?
• With more than one class, the priors encode which classes are more likely
Skin Detection Results
Jones & Rehg
Birchfield
Skin Color-Based Face Tracking
Mixture Models
• Although single-class models are useful, the real fun is in multiple-class models
• p(observation) = Σ_class π_class p_class(observation)
• Interpretation: the object has some probability π_class of belonging to each class
• Probability of a measurement is a linear combination of models for different classes
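The mixture formula can be sketched over a discrete observation space; the class-conditional tables and class probabilities below are invented for illustration.

```python
# Mixture model: p(obs) = sum over classes of pi_class * p_class(obs).
# The tables below are illustrative, not learned values.
p_obs_given_class = {
    "skin":     {"tan": 0.70, "green": 0.05, "gray": 0.25},
    "not_skin": {"tan": 0.20, "green": 0.30, "gray": 0.50},
}
pi = {"skin": 0.3, "not_skin": 0.7}  # class probabilities pi_class

def mixture_prob(obs):
    # Linear combination of per-class models, weighted by pi_class
    return sum(pi[c] * p_obs_given_class[c][obs] for c in pi)
```

Because each class-conditional table is a distribution and the π values sum to one, the mixture itself is a valid distribution over observations.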
Gaussian Mixture Model
• Simplest model for each probability distribution: Gaussian
Symmetric: p(x) ∝ exp( −(x − μ)² / 2σ² )
Asymmetric: p(x) ∝ exp( −½ (x − μ)ᵀ Σ⁻¹ (x − μ) )
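Both forms can be sketched in a few lines; this assumes the standard normalization constants (numpy only).

```python
import numpy as np

def gauss_1d(x, mu, sigma):
    """Symmetric (1-D) Gaussian: proportional to exp(-(x-mu)^2 / 2 sigma^2)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def gauss_nd(x, mu, cov):
    """General (asymmetric) Gaussian with full covariance:
    proportional to exp(-1/2 (x-mu)^T cov^-1 (x-mu))."""
    x, mu = np.asarray(x, float), np.asarray(mu, float)
    d = mu.size
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm)
```

With an identity covariance, the multivariate density factors into a product of 1-D Gaussians, which is a handy sanity check.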
Application: Segmentation
• Consider the k-means algorithm
• What if we did a “soft” (probabilistic) assignment of points to clusters?
– Each point has a probability π_j of belonging to cluster j
– Each cluster has a probability distribution, e.g. a Gaussian with mean μ and variance σ²
“Probabilistic k-means”
• Changes to clustering algorithm:
– Use Gaussian probabilities to assign point → cluster weights:
w_{p,j} = π_j G(p; μ_j, σ_j) / Σ_{j′} π_{j′} G(p; μ_{j′}, σ_{j′})
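The soft-assignment weights w_{p,j} = π_j G(p; μ_j, σ_j) / Σ_{j′} π_{j′} G(p; μ_{j′}, σ_{j′}) can be sketched for 1-D points; all data and parameters here are illustrative.

```python
import numpy as np

def e_step(points, pis, mus, sigmas):
    """Soft assignment of 1-D points to Gaussian clusters:
    w[p, j] = pi_j G(p; mu_j, sigma_j) / sum_j' pi_j' G(p; mu_j', sigma_j')."""
    points = np.asarray(points, float)[:, None]               # shape (P, 1)
    pis, mus, sigmas = (np.asarray(a, float) for a in (pis, mus, sigmas))
    g = np.exp(-(points - mus) ** 2 / (2 * sigmas ** 2)) / (sigmas * np.sqrt(2 * np.pi))
    w = pis * g                                               # shape (P, J)
    return w / w.sum(axis=1, keepdims=True)                   # each row sums to 1
```

A point near one cluster mean gets nearly all of its weight from that cluster, recovering the hard k-means assignment as a limiting case.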
“Probabilistic k-means”
• Changes to clustering algorithm:
– Use w_{p,j} to compute the weighted mean and covariance for each cluster:
μ_j = Σ_p w_{p,j} p / Σ_p w_{p,j}
Σ_j = Σ_p w_{p,j} (p − μ_j)(p − μ_j)ᵀ / Σ_p w_{p,j}
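These weighted updates can be sketched directly; `m_step` below is a minimal implementation of the two formulas, assuming weights of the kind produced by the soft-assignment step.

```python
import numpy as np

def m_step(points, w):
    """Weighted mean and covariance per cluster from soft weights w[p, j].
    points: (P, D) feature vectors; w: (P, J) assignment weights."""
    points = np.asarray(points, float)
    totals = w.sum(axis=0)                       # sum_p w[p, j], one per cluster
    mus = (w.T @ points) / totals[:, None]       # weighted means, shape (J, D)
    covs = []
    for j in range(w.shape[1]):
        diff = points - mus[j]                   # (P, D) deviations from mu_j
        covs.append((w[:, j, None] * diff).T @ diff / totals[j])
    return mus, np.array(covs)
```

With hard 0/1 weights this reduces to the ordinary per-cluster mean and covariance, matching plain k-means.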
Expectation Maximization
• This is a special case of the expectation maximization (EM) algorithm
• General case: “missing data” framework
– Have known data (feature vectors) and unknown data (assignment of points to clusters)
– E step: use known data and current estimate of model to estimate unknown
– M step: use current estimate of complete data to solve for optimal model
EM Example
Bregler
EM and Robustness
• One example of using the generalized EM framework: robustness
• Make one category correspond to “outliers”
– Use noise model if known
– If not, assume e.g. uniform noise
– Do not update its parameters in the M step
Example: Using EM to Fit to Lines
Good data
Example: Using EM to Fit to Lines
With outlier
Example: Using EM to Fit to Lines
EM fit
Weights of “line” (vs. “noise”)
Example: Using EM to Fit to Lines
EM fit – bad local minimum
Weights of “line” (vs. “noise”)
Example: Using EM to Fit to Lines
Fitting to multiple lines
Example: Using EM to Fit to Lines
Local minima
Eliminating Local Minima
• Re-run with multiple starting conditions
• Evaluate results based on:
– Number of points assigned to each (non-noise) group
– Variance of each group
– How many starting positions converge to each local maximum
• With many starting positions, can accommodate many outliers
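The restart strategy can be sketched with plain 1-D k-means standing in for EM (the bookkeeping is the same): run from several random starts and keep the run with the lowest total error. All data and parameter choices below are illustrative.

```python
import numpy as np

def kmeans_once(points, k, rng):
    """One run of 1-D k-means from a random starting condition."""
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(50):
        labels = np.argmin(np.abs(points[:, None] - centers[None, :]), axis=1)
        centers = np.array([points[labels == j].mean() if np.any(labels == j)
                            else centers[j] for j in range(k)])
    error = float(np.sum((points - centers[labels]) ** 2))
    return error, centers

def best_of_restarts(points, k, n_starts=20, seed=0):
    """Re-run from several random starts; keep the lowest-error run."""
    rng = np.random.default_rng(seed)
    return min((kmeans_once(points, k, rng) for _ in range(n_starts)),
               key=lambda run: run[0])
```

Starts that converge to a bad local minimum (e.g. both centers in one group) get discarded automatically because their total error is higher.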
Selecting Number of Clusters
• Re-run with different numbers of clusters, look at total error
• Will often see a “knee” in the curve
[Plot: error vs. number of clusters]
Overfitting
• Why not use many clusters and get low error?
• Complex models are bad at filtering noise (with k clusters you can fit k data points exactly)
• Complex models have less predictive power
• Occam’s razor: entia non multiplicanda sunt praeter necessitatem (“Things should not be multiplied beyond necessity”)
Training / Test Data
• One way to see if you have overfitting problems:
– Divide your data into two sets
– Use the first set (“training set”) to train your model
– Compute the error of the model on the second set (“test set”)
– If the error is not comparable to the training error, you have overfitting
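The steps above can be sketched generically; `fit` and `error` are placeholders for any model’s fitting and scoring routines (the toy mean/MSE pair at the end is purely illustrative).

```python
import numpy as np

def train_test_errors(points, fit, error, train_frac=0.5, seed=0):
    """Split the data, fit on the training half, report train and test error.
    `fit` and `error` are stand-ins for any model's routines."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(points))          # random split, not first-half/second-half
    cut = int(train_frac * len(points))
    train, test = points[idx[:cut]], points[idx[cut:]]
    model = fit(train)
    return error(model, train), error(model, test)

# Toy "model": fit = sample mean, error = mean squared deviation
fit_mean = lambda xs: float(xs.mean())
mse = lambda mu, xs: float(((xs - mu) ** 2).mean())
```

A test error far above the training error is the overfitting signal the slide describes; for this simple one-parameter model the two stay comparable.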