Probability and Statistics in Vision
Probability
• Objects not all the same
– Many possible shapes for people, cars, …
– Skin has different colors
• Measurements not all the same
– Noise
• But some are more probable than others
– Green skin not likely
Probability and Statistics
• Approach: probability distribution of expected objects, expected observations
• Perform mid- to high-level vision tasks by finding the most likely model consistent with actual observations
• Often don’t know probability distributions – learn them from statistics of training data
Concrete Example – Skin Color
• Suppose you want to find pixels with the color of skin
• Step 1: learn likely distribution of skin colors from (possibly hand-labeled) training data
[Plot: learned distribution – probability vs. color]
Conditional Probability
• This is the probability of observing a given color given that the pixel is skin
• Conditional probability p(color|skin)
Skin Color Identification
• Step 2: given a new image, want to find whether each pixel corresponds to skin
• Maximum a posteriori (MAP) estimation: pixel is skin iff p(skin|color) > p(not skin|color)
• But this requires knowing p(skin|color), and we only have p(color|skin)
Bayes’s Rule
• “Inverting” a conditional probability: p(B|A) = p(A|B) p(B) / p(A)
• Therefore, p(skin|color) = p(color|skin) p(skin) / p(color)
• p(skin) is the prior – knowledge of the domain
• p(skin|color) is the posterior – what we want
• p(color) is a normalization term
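This inversion can be sketched numerically on a toy discretized color space. The likelihood tables and prior below are invented numbers for illustration, not learned from real training data; only the arithmetic follows Bayes’s rule.

```python
# Bayes' rule on a toy discretized color space.
# Likelihoods and prior are made-up illustration values.
p_color_given_skin = {"tan": 0.70, "green": 0.05, "gray": 0.25}
p_color_given_notskin = {"tan": 0.20, "green": 0.30, "gray": 0.50}
p_skin = 0.3  # prior p(skin); tunes detector "sensitivity"

def posterior_skin(color):
    """p(skin|color) = p(color|skin) p(skin) / p(color);
    p(color) is the normalization term."""
    num = p_color_given_skin[color] * p_skin
    p_color = num + p_color_given_notskin[color] * (1 - p_skin)
    return num / p_color

def is_skin(color):
    # Decision rule: skin iff p(skin|color) > p(not skin|color)
    return posterior_skin(color) > 0.5
```

Raising or lowering the prior p_skin shifts every posterior, which is exactly how the prior tunes the detector’s sensitivity.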
Priors
• p(skin) = prior
– Estimate from training data
– Tunes “sensitivity” of skin detector
– Can incorporate even more information: e.g. are skin pixels more likely to be found in certain regions of the image?
• With more than one class, the priors encode which classes are more likely
Skin Detection Results
Jones & Rehg
Birchfield
Skin Color-Based Face Tracking
Mixture Models
• Although single-class models are useful, the real fun is in multiple-class models
• p(observation) = Σ_class π_class p_class(observation)
• Interpretation: the object has some probability π_class of belonging to each class
• Probability of a measurement is a linear combination of models for different classes
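The mixture formula can be sketched over a discrete observation space; the class-conditional tables and class probabilities below are invented for illustration.

```python
# Mixture model: p(obs) = sum over classes of pi_class * p_class(obs).
# The tables below are illustrative, not learned values.
p_obs_given_class = {
    "skin":     {"tan": 0.70, "green": 0.05, "gray": 0.25},
    "not_skin": {"tan": 0.20, "green": 0.30, "gray": 0.50},
}
pi = {"skin": 0.3, "not_skin": 0.7}  # class probabilities pi_class

def mixture_prob(obs):
    # Linear combination of per-class models, weighted by pi_class
    return sum(pi[c] * p_obs_given_class[c][obs] for c in pi)
```

Because each class-conditional table is a distribution and the π values sum to one, the mixture itself is a valid distribution over observations.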
Gaussian Mixture Model
• Simplest model for each probability distribution: Gaussian
Symmetric: p(x) ∝ exp( −(x − μ)² / 2σ² )
Asymmetric: p(x) ∝ exp( −½ (x − μ)ᵀ Σ⁻¹ (x − μ) )
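Both forms can be sketched in a few lines; this assumes the standard normalization constants (numpy only).

```python
import numpy as np

def gauss_1d(x, mu, sigma):
    """Symmetric (1-D) Gaussian: proportional to exp(-(x-mu)^2 / 2 sigma^2)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def gauss_nd(x, mu, cov):
    """General (asymmetric) Gaussian with full covariance:
    proportional to exp(-1/2 (x-mu)^T cov^-1 (x-mu))."""
    x, mu = np.asarray(x, float), np.asarray(mu, float)
    d = mu.size
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm)
```

With an identity covariance, the multivariate density factors into a product of 1-D Gaussians, which is a handy sanity check.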
Application: Segmentation
• Consider the k-means algorithm
• What if we did a “soft” (probabilistic) assignment of points to clusters?
– Each point has a probability π_j of belonging to cluster j
– Each cluster has a probability distribution, e.g. a Gaussian with mean μ and variance σ²
“Probabilistic k-means”
• Changes to clustering algorithm:
– Use Gaussian probabilities to assign point → cluster weights:
w_{p,j} = π_j G(p; μ_j, σ_j) / Σ_{j′} π_{j′} G(p; μ_{j′}, σ_{j′})
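The soft-assignment weights w_{p,j} = π_j G(p; μ_j, σ_j) / Σ_{j′} π_{j′} G(p; μ_{j′}, σ_{j′}) can be sketched for 1-D points; all data and parameters here are illustrative.

```python
import numpy as np

def e_step(points, pis, mus, sigmas):
    """Soft assignment of 1-D points to Gaussian clusters:
    w[p, j] = pi_j G(p; mu_j, sigma_j) / sum_j' pi_j' G(p; mu_j', sigma_j')."""
    points = np.asarray(points, float)[:, None]               # shape (P, 1)
    pis, mus, sigmas = (np.asarray(a, float) for a in (pis, mus, sigmas))
    g = np.exp(-(points - mus) ** 2 / (2 * sigmas ** 2)) / (sigmas * np.sqrt(2 * np.pi))
    w = pis * g                                               # shape (P, J)
    return w / w.sum(axis=1, keepdims=True)                   # each row sums to 1
```

A point near one cluster mean gets nearly all of its weight from that cluster, recovering the hard k-means assignment as a limiting case.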
“Probabilistic k-means”
• Changes to clustering algorithm:
– Use w_{p,j} to compute the weighted mean and covariance for each cluster:
μ_j = Σ_p w_{p,j} p / Σ_p w_{p,j}
Σ_j = Σ_p w_{p,j} (p − μ_j)(p − μ_j)ᵀ / Σ_p w_{p,j}
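These weighted updates can be sketched directly; `m_step` below is a minimal implementation of the two formulas, assuming weights of the kind produced by the soft-assignment step.

```python
import numpy as np

def m_step(points, w):
    """Weighted mean and covariance per cluster from soft weights w[p, j].
    points: (P, D) feature vectors; w: (P, J) assignment weights."""
    points = np.asarray(points, float)
    totals = w.sum(axis=0)                       # sum_p w[p, j], one per cluster
    mus = (w.T @ points) / totals[:, None]       # weighted means, shape (J, D)
    covs = []
    for j in range(w.shape[1]):
        diff = points - mus[j]                   # (P, D) deviations from mu_j
        covs.append((w[:, j, None] * diff).T @ diff / totals[j])
    return mus, np.array(covs)
```

With hard 0/1 weights this reduces to the ordinary per-cluster mean and covariance, matching plain k-means.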
Expectation Maximization
• This is a special case of the expectation maximization (EM) algorithm
• General case: “missing data” framework
– Have known data (feature vectors) and unknown data (assignment of points to clusters)
– E step: use known data and current estimate of model to estimate unknown
– M step: use current estimate of complete data to solve for optimal model
EM Example
Bregler
EM and Robustness
• One example of using the generalized EM framework: robustness
• Make one category correspond to “outliers”
– Use noise model if known
– If not, assume e.g. uniform noise
– Do not update its parameters in the M step
Example: Using EM to Fit to Lines
Good data
Example: Using EM to Fit to Lines
With outlier
Example: Using EM to Fit to Lines
EM fit
Weights of “line” (vs. “noise”)
Example: Using EM to Fit to Lines
EM fit – bad local minimum
Weights of “line” (vs. “noise”)
Example: Using EM to Fit to Lines
Fitting to multiple lines
Example: Using EM to Fit to Lines
Local minima
Eliminating Local Minima
• Re-run with multiple starting conditions
• Evaluate results based on:
– Number of points assigned to each (non-noise) group
– Variance of each group
– How many starting positions converge to each local maximum
• With many starting positions, can accommodate many outliers
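The restart strategy can be sketched with plain 1-D k-means standing in for EM (the bookkeeping is the same): run from several random starts and keep the run with the lowest total error. All data and parameter choices below are illustrative.

```python
import numpy as np

def kmeans_once(points, k, rng):
    """One run of 1-D k-means from a random starting condition."""
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(50):
        labels = np.argmin(np.abs(points[:, None] - centers[None, :]), axis=1)
        centers = np.array([points[labels == j].mean() if np.any(labels == j)
                            else centers[j] for j in range(k)])
    error = float(np.sum((points - centers[labels]) ** 2))
    return error, centers

def best_of_restarts(points, k, n_starts=20, seed=0):
    """Re-run from several random starts; keep the lowest-error run."""
    rng = np.random.default_rng(seed)
    return min((kmeans_once(points, k, rng) for _ in range(n_starts)),
               key=lambda run: run[0])
```

Starts that converge to a bad local minimum (e.g. both centers in one group) get discarded automatically because their total error is higher.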
Selecting Number of Clusters
• Re-run with different numbers of clusters, look at total error
• Will often see a “knee” in the curve
[Plot: error vs. number of clusters]
Overfitting
• Why not use many clusters and get low error?
• Complex models are bad at filtering noise (with k clusters you can fit k data points exactly)
• Complex models have less predictive power
• Occam’s razor: entia non multiplicanda sunt praeter necessitatem (“Things should not be multiplied beyond necessity”)
Training / Test Data
• One way to see if you have overfitting problems:
– Divide your data into two sets
– Use the first set (“training set”) to train your model
– Compute the error of the model on the second set (“test set”)
– If the error is not comparable to the training error, you have overfitting
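The steps above can be sketched generically; `fit` and `error` are placeholders for any model’s fitting and scoring routines (the toy mean/MSE pair at the end is purely illustrative).

```python
import numpy as np

def train_test_errors(points, fit, error, train_frac=0.5, seed=0):
    """Split the data, fit on the training half, report train and test error.
    `fit` and `error` are stand-ins for any model's routines."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(points))          # random split, not first-half/second-half
    cut = int(train_frac * len(points))
    train, test = points[idx[:cut]], points[idx[cut:]]
    model = fit(train)
    return error(model, train), error(model, test)

# Toy "model": fit = sample mean, error = mean squared deviation
fit_mean = lambda xs: float(xs.mean())
mse = lambda mu, xs: float(((xs - mu) ** 2).mean())
```

A test error far above the training error is the overfitting signal the slide describes; for this simple one-parameter model the two stay comparable.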