November 25, 2014
Computer Vision Lecture 20: Object Recognition IV

Slide 1

Creating Data Representations

The problem with some data
representations is that the meaning of the output of one neuron
depends on the output of other neurons. This means that each neuron
does not represent (detect) a certain feature, but groups of
neurons do. In general, such functions are much more difficult to
learn. Such networks usually need more hidden neurons and longer
training, and their ability to generalize is weaker than for the
one-neuron-per-feature-value networks.
Slide 2
Creating Data Representations

On the other hand, sets of
orthogonal vectors (such as 100, 010, 001) representing individual
features require more neurons and connections but can be processed
by the network more easily. This becomes clear when we consider
that a neuron's net input signal is computed as the inner product of
the input and weight vectors. The geometric interpretation of these
vectors shows that orthogonal vectors are especially easy to
discriminate for a single neuron.
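The inner-product argument can be made concrete with a short sketch (the feature names and the detector's weights below are illustrative assumptions, not from the lecture):

```python
# Why orthogonal (one-hot) codes are easy for a single neuron to
# discriminate: the net input is the inner product of weight and input
# vectors, so a detector whose weight vector copies one code responds
# only to that code.

def net_input(weights, inputs):
    """A neuron's net input: the inner product of weight and input vectors."""
    return sum(w * x for w, x in zip(weights, inputs))

# One-hot codes for three feature values (mutually orthogonal vectors):
codes = {"A": [1, 0, 0], "B": [0, 1, 0], "C": [0, 0, 1]}

# A hypothetical neuron tuned to feature "B" uses B's code as weights:
w_detect_B = [0, 1, 0]

for name, code in codes.items():
    print(name, net_input(w_detect_B, code))  # A 0, B 1, C 0
```

Each orthogonal code drives exactly one detector, which is what makes these representations easy for a single neuron to separate.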
Slide 3
Creating Data Representations

Another way of representing
n-ary data in a neural network is using one neuron per feature, but
scaling the (analog) value to indicate the degree to which a
feature is present.

Good examples:
- the brightness of a pixel in an input image
- the distance between a robot and an obstacle

Poor examples:
- the letter (1-26) of a word
- the type (1-6) of a chess piece
Slide 4
Creating Data Representations

This can be explained as
follows: The way NNs work (both biological and artificial ones) is
that each neuron represents the presence/absence of a particular
feature. Activations 0 and 1 indicate absence or presence of that
feature, respectively, and in analog networks, intermediate values
indicate the extent to which a feature is present. Consequently, a
small change in one input value leads to only a small change in the
network's activation pattern.
Slide 5
Creating Data Representations

Therefore, it is appropriate to
represent a non-binary feature by a single analog input value only
if this value is scaled, i.e., it represents the degree to which a
feature is present. This is the case for the brightness of a pixel
or the output of a distance sensor (feature = obstacle proximity).
It is not the case for letters or chess pieces. For example,
assigning values to individual letters (a = 0, b = 0.04, c = 0.08,
..., z = 1) implies that a and b are in some way more similar to each
other than are a and z. Obviously, in most contexts, this is not a
reasonable assumption.
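The false-similarity problem can be demonstrated directly; the mapping below is the a = 0, b = 0.04, ..., z = 1 scheme from the slide:

```python
# Scaling a nominal feature (letters) onto one analog input imposes an
# ordering/similarity that the task does not have.

def letter_value(ch):
    """Map 'a'..'z' to an analog value in [0, 1] in steps of 1/25."""
    return (ord(ch) - ord('a')) / 25.0

# The encoding claims 'a' and 'b' are 25 times closer than 'a' and 'z':
print(abs(letter_value('a') - letter_value('b')))  # 0.04
print(abs(letter_value('a') - letter_value('z')))  # 1.0
```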
Slide 6
K-Class Classification Problem

Let us denote the k-th class by C_k, with n_k exemplars or training
samples forming the set T_k, for k = 1, ..., K. The complete training
set is T = T_1 ∪ ... ∪ T_K. The desired output of the network for an
input of class k is 1 for output unit k and 0 for all other output
units, i.e., the desired output vector has a 1 at the k-th position if
the sample is in class k and zeros everywhere else.
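A minimal sketch of this target coding (the helper name one_hot is my own, not from the lecture):

```python
# Desired output for a class-k sample: a K-vector with 1 at position k
# and 0 everywhere else.

def one_hot(k, K):
    """Target vector for class k (1-based) out of K classes."""
    return [1.0 if j == k else 0.0 for j in range(1, K + 1)]

print(one_hot(2, 4))  # [0.0, 1.0, 0.0, 0.0]
```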
Slide 7
K-Class Classification Problem

However, due to the sigmoid output function, the net input to the
output units would have to be -∞ or +∞ to generate outputs 0 or 1,
respectively. Because of the shallow slope of the sigmoid function at
extreme net inputs, even approaching these values would be very slow.
To avoid this problem, it is advisable to use desired outputs ε and
(1 - ε) instead of 0 and 1, respectively. Typical values for ε range
between 0.01 and 0.1. For ε = 0.1, a desired output vector for class k
looks like (0.1, ..., 0.1, 0.9, 0.1, ..., 0.1), with the 0.9 at the
k-th position.
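A sketch of the ε-adjusted coding (the helper name is illustrative):

```python
# Replace the targets 0 and 1 by eps and (1 - eps) so the sigmoid can
# reach them with a finite net input.

def soft_target(k, K, eps=0.1):
    """Class-k target vector with outputs eps / (1 - eps) instead of 0 / 1."""
    return [1.0 - eps if j == k else eps for j in range(1, K + 1)]

print(soft_target(2, 4))  # [0.1, 0.9, 0.1, 0.1]
```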
Slide 8
K-Class Classification Problem

We should not punish more extreme output values, though. To avoid such
punishment, we can define the error term l_p,j as follows:

1. If d_p,j = (1 - ε) and o_p,j ≥ d_p,j, then l_p,j = 0.
2. If d_p,j = ε and o_p,j ≤ d_p,j, then l_p,j = 0.
3. Otherwise, l_p,j = o_p,j - d_p,j.
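The three cases translate directly into code (the function name and the ε default are illustrative):

```python
# Error term that does not punish outputs that overshoot the soft
# targets in the "right" direction.

def error_term(o, d, eps=0.1):
    """l_p,j for actual output o and desired output d (d is eps or 1 - eps)."""
    if d == 1.0 - eps and o >= d:  # at or above the high target: no error
        return 0.0
    if d == eps and o <= d:        # at or below the low target: no error
        return 0.0
    return o - d

print(error_term(0.95, 0.9))           # 0.0
print(error_term(0.05, 0.1))           # 0.0
print(round(error_term(0.6, 0.9), 2))  # -0.3
```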
Slide 9
Training and Performance Evaluation

How many samples should be used for training? Heuristic: use at least
5-10 times as many samples as there are weights in the network.
Formula (Baum & Haussler, 1989): P ≥ |W| / (1 - a), where P is the
number of samples, |W| is the number of weights to be trained, and a
is the desired accuracy (e.g., the proportion of correctly classified
samples).
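As a rough calculator, assuming the commonly quoted simplified form of the Baum & Haussler (1989) bound, P ≥ |W| / (1 - a):

```python
# Minimum number of training samples P for |W| trainable weights and
# desired accuracy a (proportion of correctly classified samples).
# Sketch only; based on the simplified bound P >= |W| / (1 - a).

def min_samples(num_weights, accuracy):
    return round(num_weights / (1.0 - accuracy))

# A network with 300 weights trained to 90% accuracy:
print(min_samples(300, 0.9))  # 3000
```

Note that for a = 0.9 this reproduces the 10-times-the-weights end of the heuristic.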
Slide 10
Training and Performance Evaluation

What learning rate η should we choose? The problems that arise when η
is too small or too big are similar to those in finding the minimum of
a 1D function. Unfortunately, the optimal value of η depends entirely
on the application. Values between 0.1 and 0.9 are typical for most
applications. Often, η is initially set to a large value and is
decreased during the learning process. This leads to better
convergence and also decreases the likelihood of getting stuck in a
local error minimum at an early learning stage.
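One simple way to realize "start large, then decrease" is a linear schedule (the endpoints and epoch count below are illustrative choices, not from the lecture):

```python
# Linearly decay the learning rate eta from a large starting value to
# a small final value over the course of training.

def learning_rate(epoch, eta_start=0.9, eta_end=0.1, num_epochs=100):
    """Learning rate for a given epoch (0-based), clamped at eta_end."""
    frac = min(epoch / (num_epochs - 1), 1.0)
    return eta_start + frac * (eta_end - eta_start)

print(learning_rate(0))             # 0.9
print(round(learning_rate(99), 2))  # 0.1
```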
Slide 11
Training and Performance Evaluation

When training a BPN (backpropagation network), what
is the acceptable error, i.e., when do we stop the training? The
minimum error that can be achieved does not only depend on the
network parameters, but also on the specific training set. Thus,
for some applications the minimum error will be higher than for
others.
Slide 12
Training and Performance Evaluation

An insightful way of performance evaluation is partial-set training.
The idea is to split the available data into two sets: the training
set and the test set. The network's performance on the second set
indicates how
well the network has actually learned the desired mapping. We
should expect the network to interpolate, but not extrapolate.
Therefore, this test also evaluates our choice of training
samples.
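A minimal sketch of the split (the 70/30 ratio and the helper name are illustrative):

```python
import random

# Partial-set training: shuffle the available data once, then split it
# into a training set and a test set.

def partial_split(samples, train_frac=0.7, seed=0):
    """Return (training_set, test_set) after a reproducible shuffle."""
    rng = random.Random(seed)
    data = list(samples)
    rng.shuffle(data)
    cut = int(len(data) * train_frac)
    return data[:cut], data[cut:]

train_set, test_set = partial_split(range(1000))
print(len(train_set), len(test_set))  # 700 300
```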
Slide 13
Training and Performance Evaluation

If the test set only
contains one exemplar, this type of training is called hold-one-out
training. It is to be performed sequentially for every individual
exemplar. This, of course, is a very time-consuming process. For
example, if we have 1,000 exemplars and want to perform 100 epochs
of training, this procedure involves 1,000 × 999 × 100 = 99,900,000
training steps. Partial-set training with a 700-300 split would
only require 70,000 training steps. On the positive side, the
advantage of hold-one-out training is that all available exemplars
(except one) are used for training, which might lead to better
network performance.
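The step counts in this comparison can be checked directly:

```python
# Hold-one-out: for each of the N exemplars, train on the remaining
# N - 1 exemplars for E epochs. Partial-set: train once on the
# training portion for E epochs.

def hold_one_out_steps(n, epochs):
    return n * (n - 1) * epochs

def partial_set_steps(n_train, epochs):
    return n_train * epochs

print(hold_one_out_steps(1000, 100))  # 99900000
print(partial_set_steps(700, 100))    # 70000
```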
Slide 14
Common Classification Tasks

Recognition of individual objects/faces:
- Analyze object-specific features (e.g., key points).
- Train with images from different viewing angles.

Recognition of object classes:
- Analyze features that are consistent within a class and differ between classes as much as possible.
- Train with many exemplars from each class.

Recognition of scene types:
- Find and analyze common features, objects, or layouts within scene classes.
- Use a large variety of scene photos.
Slide 15
The Scene Classification Challenge!
Slide 16
Scene Classification Challenge

- 8 scene categories
- 200 training samples, 60 test samples per category
- 256x256 pixels, grayscale (.pgm)
- Use a backpropagation network as classifier
- Framework will be provided
- Your task: Decide about input features and program feature extractors.
- Winner will be determined based on different test images.
- Winner receives $50 gift certificate for Best Buy!