Lecture 5: Statistical Methods for Classification
CAP 5415: Computer Vision, Fall 2006
Classifiers: The Swiss Army Tool of Vision
A HUGE number of vision problems can be reduced to: Is this a _____ or not?
The next two lectures will focus on making that decision
Classifiers that we will cover:
- Bayesian classification
- Logistic regression
- Boosting
- Support Vector Machines
- Nearest-Neighbor Classifiers
Motivating Problem Which pixels in this image are “skin pixels”?
Useful for tracking, finding people, finding images with too much skin.
How could you find skin pixels?
Step 1: Get Data
Label every pixel as skin or not skin
Getting Probabilities
Now that I have a bunch of examples, I can create probability distributions:
P([r,g,b]|skin) = probability of an [r,g,b] tuple given that the pixel is skin
P([r,g,b]|~skin) = probability of an [r,g,b] tuple given that the pixel is not skin
(From Jones and Rehg)
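This estimation step can be sketched as histogram counting. The sketch below is a minimal version under my own assumptions (a 32-bin quantization per channel and the function name are illustrative; Jones and Rehg trained large 3D color histograms on many hand-labeled pixels):

```python
import numpy as np

def fit_color_histograms(pixels, labels, bins=32):
    """Estimate P([r,g,b] | skin) and P([r,g,b] | ~skin) as normalized
    3D color histograms. `pixels` is an (N, 3) array of values in
    0..255; `labels` is True where the pixel was labeled as skin."""
    edges = [np.linspace(0, 256, bins + 1)] * 3
    h_skin, _ = np.histogramdd(pixels[labels], bins=edges)
    h_not, _ = np.histogramdd(pixels[~labels], bins=edges)
    # Normalize the counts into probability distributions.
    return h_skin / h_skin.sum(), h_not / h_not.sum()
```

Looking up a pixel's bin in each histogram then gives the two likelihoods used below.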
Using Bayes Rule
x – the observation y – some underlying cause (skin/not skin)
Using Bayes Rule

P(y|x) = P(x|y) P(y) / P(x)

Likelihood: P(x|y)    Prior: P(y)    Normalizing constant: P(x)
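Turning the two likelihoods into a posterior is a one-line application of Bayes rule. A minimal sketch (the function name and the 0.5 default prior are my assumptions):

```python
def posterior_skin(p_x_given_skin, p_x_given_not, p_skin=0.5):
    """Bayes rule: P(skin|x) = P(x|skin) P(skin) / P(x), where the
    normalizing constant P(x) sums over both possible causes."""
    p_x = p_x_given_skin * p_skin + p_x_given_not * (1.0 - p_skin)
    return p_x_given_skin * p_skin / p_x

# A color 4x as likely under the skin model, with an even prior,
# gives a 4:1 posterior, i.e. P(skin|x) = 0.8.
print(posterior_skin(0.008, 0.002))
```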
Classification
In this case P[skin|x] = 1 - P[~skin|x], so the classifier reduces to
P[skin|x] > 0.5?
We can change this to P[skin|x] > c and vary c
The effect of varying c
Plotting the trade-off as c varies gives a Receiver Operating Characteristic (ROC) curve
From Jones and Rehg
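The ROC curve above is produced by sweeping the threshold c and recording the error trade-off at each setting. A minimal sketch (function and argument names are my own):

```python
import numpy as np

def roc_points(posteriors, is_skin, thresholds):
    """Sweep the decision threshold c in 'P(skin|x) > c' and record
    the (false positive rate, true positive rate) pair at each c."""
    post = np.asarray(posteriors)
    is_skin = np.asarray(is_skin, dtype=bool)
    pts = []
    for c in thresholds:
        pred = post > c
        tpr = (pred & is_skin).sum() / is_skin.sum()    # skin found
        fpr = (pred & ~is_skin).sum() / (~is_skin).sum()  # false alarms
        pts.append((fpr, tpr))
    return pts
```

Raising c trades false alarms for missed skin pixels; the curve makes that trade-off explicit.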
Application: Finding Adult Pictures
Let's say you needed to build a web filter for a library
Could look at a few simple measurements based on the skin model
Example of Misclassified Image
Example of Correctly Classified Image
ROC Curve
Generative versus Discriminative Models
The classifier that I have just described is known as a generative model
Once you know all of the probabilities, you can generate new samples of the data
May be too much work
You could also optimize a function to just discriminate between skin and not skin
Discriminative Classification using Logistic Regression
Imagine we had two measurements and we plotted each sample on a 2D chart
To separate the two groups, we'll project each point onto a line
Some points will be projected to positive values and some will be projected to negative values
Discriminative Classification using Logistic Regression
This defines a separating line
Each point is classified based on which side of the line its projection falls
How do we get the line?
Common Option: Logistic Regression
Logistic Function: g(x) = 1 / (1 + e^(-x))

The logistic function
Notice that g(x) goes from 0 to 1
We can use this to estimate the probability of something being an x or an o
We need to find a function that will have large positive values for x's and large negative values for o's
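The squashing behavior just described is easy to verify numerically; a minimal sketch:

```python
import math

def g(x):
    """Logistic function g(x) = 1 / (1 + e^(-x)); output is in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# g(0) = 0.5; large positive inputs map near 1, large negative near 0,
# so g(w . x) can be read as the probability of the positive class.
print(g(0.0))  # → 0.5
```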
Fitting the Line
Remember, we want a line. For the diagram below: x = +1, o = -1
y = label of the point (-1 or +1)
Fitting the line
The logistic function gives us an estimate of the probability of an example being either +1 or -1
We can fit the line by maximizing the conditional probability of the correct labeling of the training set
The measurements are also called features
Fitting the Line
We have multiple samples that we assume are independent, so the probability of the whole training set is the product of the per-sample probabilities: P = ∏_i P(y_i | x_i)
Fitting the line
It is usually easier to optimize the log conditional probability: log P = Σ_i log P(y_i | x_i)
Optimizing
Lots of options
Easiest option: Gradient ascent

w ← w + η ∇(log P)

η: the learning rate parameter, many ways to choose this

Choosing the Learning Rate
My (current) personal favorite method:
- Choose some value for the learning rate
- Update w, compute the new probability
- If the new probability does not rise, divide the learning rate by 2
- Otherwise multiply it by 1.1 (or something similar)
Called the "Bold-Driver" heuristic
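The steps above can be sketched as follows. This is a minimal illustration under my own assumptions (the function names, the label convention y ∈ {-1, +1}, and the toy step count are not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, X, y):
    # With labels y in {-1, +1}, P(y_i | x_i) = g(y_i * (w . x_i)).
    return np.sum(np.log(sigmoid(y * (X @ w))))

def fit_bold_driver(X, y, steps=200, rate=0.1):
    """Gradient ascent on the log conditional likelihood with the
    bold-driver rate schedule: halve the rate when a step lowers the
    objective, grow it by 1.1x when a step raises it."""
    w = np.zeros(X.shape[1])
    best = log_likelihood(w, X, y)
    for _ in range(steps):
        grad = X.T @ (y * sigmoid(-y * (X @ w)))  # gradient of log P
        w_new = w + rate * grad
        new = log_likelihood(w_new, X, y)
        if new < best:
            rate /= 2.0            # step failed: keep old w, shrink rate
        else:
            w, best = w_new, new   # step succeeded: accept, grow rate
            rate *= 1.1
    return w
```

On linearly separable data this drives the margin of every training point positive.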
Faster Option
Computing the gradient requires summing over every training example
Could be slow for a large training set
Speed-up: Stochastic Gradient Ascent
Instead of computing the gradient over the whole training set, choose one point at random and do the update based on that one point
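The stochastic variant replaces the full-data gradient with a single-example update. A minimal sketch (shuffling once per epoch is a common variant of "choose one point at random"; names and constants are my own):

```python
import numpy as np

def fit_sgd(X, y, epochs=50, rate=0.05, seed=0):
    """Stochastic gradient ascent for the same objective: update w
    from one randomly chosen training point at a time, rather than
    summing the gradient over the whole training set."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            # Gradient of log g(y_i * (w . x_i)) for one example:
            # y_i * x_i * g(-margin) = y_i * x_i / (1 + e^margin).
            margin = y[i] * (X[i] @ w)
            w += rate * y[i] * X[i] / (1.0 + np.exp(margin))
    return w
```

Each update touches one point, so an epoch costs the same as a single full-gradient step but makes N small moves instead of one.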
Limitations
Remember, we are only separating the two classes with a line
Separate this data with a line:
This is a fundamental problem: most data can't be separated by a line
Overcoming these limitations
Two options:
- Train on a more complicated function (quadratic, cubic)
- Make a new set of features:
Advantages
We achieve non-linear classification by doing linear classification on non-linear transformations of the features
Only have to rewrite feature generation code; learning code stays the same
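A concrete sketch of the feature-rewrite approach, assuming two input measurements and a quadratic expansion (the exact feature set is my illustrative choice):

```python
import numpy as np

def quadratic_features(X):
    """Map each [x1, x2] to [x1, x2, x1^2, x2^2, x1*x2, 1]. A linear
    separator in this 6D space is a quadratic boundary in the
    original 2D space."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2,
                            np.ones(len(X))])

# XOR-style data is not linearly separable in 2D, but the x1*x2
# feature alone separates it in the expanded space.
```

The learner never changes: it still fits a line, just in the transformed space.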
Nearest Neighbor Classifier
Is the “?” an x or an o?
?
Basic idea
For your new example, find the k nearest neighbors in the training set
Each neighbor casts a vote; the label with the most votes wins
Disadvantages:
- Have to find the nearest neighbors
- Can be slow for a large training set
- Good approximate methods available (LSH - Indyk)
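The voting scheme above fits in a few lines of brute-force code (a minimal sketch; names and the Euclidean metric are my assumptions):

```python
import numpy as np
from collections import Counter

def knn_classify(query, X_train, y_train, k=3):
    """Brute-force k-nearest-neighbor vote: exhaustive O(N) distance
    search per query. For large training sets, approximate methods
    such as locality-sensitive hashing (LSH) are much faster."""
    dists = np.linalg.norm(X_train - query, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]        # majority label wins
```

For example, a query near the cluster of o's gets outvoted by its three o neighbors even if one x is moderately close.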