Computer Vision Lecture 20: Object Recognition IV (November 25, 2014)
  • Slide 1
  • Creating Data Representations: The problem with some data representations is that the meaning of the output of one neuron depends on the output of other neurons. This means that each neuron does not represent (detect) a certain feature by itself; only groups of neurons do. In general, such functions are much more difficult to learn. Such networks usually need more hidden neurons and longer training, and their ability to generalize is weaker than for one-neuron-per-feature-value networks.
  • Slide 2
  • Creating Data Representations: On the other hand, sets of orthogonal vectors (such as 100, 010, 001) representing individual features require more neurons and connections but can be processed by the network more easily. This becomes clear when we consider that a neuron's net input signal is computed as the inner product of the input and weight vectors. The geometric interpretation of these vectors shows that orthogonal vectors are especially easy to discriminate for a single neuron.
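As a small sketch of why orthogonal codes are easy to discriminate, consider the inner product a single neuron computes (the weight values below are arbitrary illustration values, not from the lecture):

```python
import numpy as np

# One-hot (orthogonal) codes for three feature values, e.g. 100, 010, 001.
one_hot = np.eye(3)

# Arbitrary example weight vector of a single neuron (illustration only).
weights = np.array([0.9, -0.2, 0.4])

# The net input (inner product) of a one-hot input picks out exactly one
# weight, so a single neuron can respond to one feature value in isolation.
for x in one_hot:
    print(x, "-> net input:", np.dot(x, weights))
```

Because each input vector activates exactly one weight, adjusting that weight changes the neuron's response to one feature value without disturbing the others.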
  • Slide 3
  • Creating Data Representations: Another way of representing n-ary data in a neural network is using one neuron per feature, but scaling the (analog) value to indicate the degree to which a feature is present. Good examples: the brightness of a pixel in an input image; the distance between a robot and an obstacle. Poor examples: the letter (1–26) of a word; the type (1–6) of a chess piece.
  • Slide 4
  • Creating Data Representations: This can be explained as follows: The way NNs work (both biological and artificial ones) is that each neuron represents the presence/absence of a particular feature. Activations 0 and 1 indicate absence or presence of that feature, respectively, and in analog networks, intermediate values indicate the extent to which a feature is present. Consequently, a small change in one input value leads to only a small change in the network's activation pattern.
  • Slide 5
  • Creating Data Representations: Therefore, it is appropriate to represent a non-binary feature by a single analog input value only if this value is scaled, i.e., it represents the degree to which a feature is present. This is the case for the brightness of a pixel or the output of a distance sensor (feature = obstacle proximity). It is not the case for letters or chess pieces. For example, assigning values to individual letters (a = 0, b = 0.04, c = 0.08, …, z = 1) implies that a and b are in some way more similar to each other than are a and z. Obviously, in most contexts, this is not a reasonable assumption.
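A minimal sketch of this distinction (the helper names are my own): a scaled analog input is appropriate only when numeric closeness mirrors feature similarity.

```python
# Good: pixel brightness is naturally graded, so scaling 0..255 to 0..1
# preserves the meaning "how strongly is the feature present".
def encode_brightness(pixel):
    return pixel / 255.0

# Poor: mapping letters to a = 0, b = 0.04, ..., z = 1 fabricates an
# ordering in which 'a' looks far more similar to 'b' than to 'z'.
def encode_letter(ch):
    return (ord(ch) - ord('a')) / 25.0

print(encode_brightness(128))                        # about 0.502
print(abs(encode_letter('a') - encode_letter('b')))  # 0.04 -> "similar"
print(abs(encode_letter('a') - encode_letter('z')))  # 1.0  -> "dissimilar"
```

The letter encoding would force the network to treat alphabetically adjacent letters as nearly identical inputs, which is exactly the unjustified assumption described above.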
  • Slide 6
  • K-Class Classification Problem: Let us denote the k-th class by C_k, with n_k exemplars or training samples forming the set T_k, for k = 1, …, K. The complete training set is T = T_1 ∪ … ∪ T_K. The desired output of the network for an input of class k is 1 for output unit k and 0 for all other output units, i.e., the target vector has a 1 at the k-th position if the sample is in class k.
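The target construction can be written out directly (a small sketch, using the 1-based class indices of the text):

```python
import numpy as np

def target_vector(k, K):
    """Desired network output for a sample of class k out of K classes:
    a 1 at the k-th position, 0 everywhere else."""
    d = np.zeros(K)
    d[k - 1] = 1.0
    return d

print(target_vector(2, 4))   # [0. 1. 0. 0.]
```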
  • Slide 7
  • K-Class Classification Problem: However, due to the sigmoid output function, the net input to the output units would have to be −∞ or +∞ to generate outputs of 0 or 1, respectively. Because of the shallow slope of the sigmoid function at extreme net inputs, even approaching these values would be very slow. To avoid this problem, it is advisable to use desired outputs of ε and (1 − ε) instead of 0 and 1, respectively. Typical values for ε range between 0.01 and 0.1. For ε = 0.1, desired output vectors would look like this: (0.9, 0.1, …, 0.1) for class 1, and so on.
  • Slide 8
  • K-Class Classification Problem: We should not punish more extreme output values, though. To avoid punishment, we can define the error term l_p,j as follows: 1. If d_p,j = (1 − ε) and o_p,j ≥ d_p,j, then l_p,j = 0. 2. If d_p,j = ε and o_p,j ≤ d_p,j, then l_p,j = 0. 3. Otherwise, l_p,j = o_p,j − d_p,j.
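Both ideas, the ε-shifted targets and the no-punishment rule, fit in a few lines (a sketch; the function names are my own):

```python
EPS = 0.1  # a typical epsilon between 0.01 and 0.1

def soft_targets(k, K, eps=EPS):
    # Desired outputs use eps and (1 - eps) instead of 0 and 1.
    # k is a 0-based class index here.
    return [1.0 - eps if j == k else eps for j in range(K)]

def error_term(o, d, eps=EPS):
    # Do not punish outputs that are more extreme than their target.
    if d == 1.0 - eps and o >= d:
        return 0.0
    if d == eps and o <= d:
        return 0.0
    return o - d

print(soft_targets(0, 4))       # [0.9, 0.1, 0.1, 0.1]
print(error_term(0.95, 0.9))    # 0.0 (beyond the target: no punishment)
print(error_term(0.5, 0.9))     # ordinary error o - d
```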
  • Slide 9
  • Training and Performance Evaluation: How many samples should be used for training? Heuristic: at least 5–10 times as many samples as there are weights in the network. Formula (Baum & Haussler, 1989): P ≥ |W| / (1 − a), where P is the number of samples, |W| is the number of weights to be trained, and a is the desired accuracy (e.g., the proportion of correctly classified samples).
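A sketch of the sample-count rule of thumb; I assume the commonly quoted simplified form P = |W| / (1 − a) attributed to Baum & Haussler (1989):

```python
def required_samples(num_weights, accuracy):
    """Rule-of-thumb number of training samples P = |W| / (1 - a);
    the simplified form commonly attributed to Baum & Haussler (1989)."""
    return num_weights / (1.0 - accuracy)

# e.g., a network with 1,000 weights and a desired accuracy of 95%:
print(round(required_samples(1000, 0.95)))   # 20000
```

Note how quickly the requirement grows as the desired accuracy approaches 1, which is consistent with the 5–10× heuristic only for moderate accuracy targets.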
  • Slide 10
  • Training and Performance Evaluation: What learning rate η should we choose? The problems that arise when η is too small or too big are similar to those in finding the minimum of a 1D function. Unfortunately, the optimal value of η entirely depends on the application. Values between 0.1 and 0.9 are typical for most applications. Often, η is initially set to a large value and is decreased during the learning process. This leads to better convergence of learning and also decreases the likelihood of getting stuck in a local error minimum at an early learning stage.
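A decay schedule of this kind can be sketched as follows (the exponential form and the constants are my own illustration; the slide only says "start large, then decrease"):

```python
def learning_rate(epoch, eta0=0.9, decay=0.05):
    # Start with a large eta and shrink it by 5% per epoch (illustrative).
    return eta0 * (1.0 - decay) ** epoch

print(learning_rate(0))             # 0.9: large steps early on
print(round(learning_rate(50), 3))  # much smaller steps later
```

Large early steps let the weights escape shallow local minima; small late steps allow fine convergence near the solution.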
  • Slide 11
  • Training and Performance Evaluation: When training a BPN, what is the acceptable error, i.e., when do we stop the training? The minimum error that can be achieved depends not only on the network parameters, but also on the specific training set. Thus, for some applications the minimum error will be higher than for others.
  • Slide 12
  • Training and Performance Evaluation: An insightful way of performance evaluation is partial-set training. The idea is to split the available data into two sets: the training set and the test set. The network's performance on the test set indicates how well the network has actually learned the desired mapping. We should expect the network to interpolate, but not extrapolate. Therefore, this test also evaluates our choice of training samples.
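Partial-set training starts with a split like this (a minimal sketch; the 70/30 proportion is an assumed example):

```python
import random

def split_data(samples, train_fraction=0.7, seed=0):
    """Shuffle the available data and split it into a training and a test set."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

train_set, test_set = split_data(list(range(1000)))
print(len(train_set), len(test_set))   # 700 300
```

Shuffling before splitting matters: if the data are ordered by class, an unshuffled split would leave some classes entirely out of the training set.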
  • Slide 13
  • Training and Performance Evaluation: If the test set contains only one exemplar, this type of training is called hold-one-out training. It has to be performed sequentially for every individual exemplar, which, of course, is a very time-consuming process. For example, if we have 1,000 exemplars and want to perform 100 epochs of training, this procedure involves 1,000 × 999 × 100 = 99,900,000 training steps. Partial-set training with a 700-300 split would require only 70,000 training steps. On the positive side, the advantage of hold-one-out training is that all available exemplars (except one) are used for training, which might lead to better network performance.
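The step counts above follow directly; a small sketch reproducing the arithmetic:

```python
def hold_one_out_steps(n_samples, epochs):
    # One training run per held-out exemplar, each on n_samples - 1 exemplars.
    return n_samples * (n_samples - 1) * epochs

def partial_set_steps(n_train, epochs):
    # A single training run on the training portion only.
    return n_train * epochs

print(hold_one_out_steps(1000, 100))   # 99900000
print(partial_set_steps(700, 100))     # 70000
```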
  • Slide 14
  • Common Classification Tasks:
    – Recognition of individual objects/faces: analyze object-specific features (e.g., key points); train with images from different viewing angles.
    – Recognition of object classes: analyze features that are consistent within a class and differ between classes as much as possible; train with many exemplars from each class.
    – Recognition of scene types: find and analyze common features, objects, or layouts within scene classes; use a large variety of scene photos.
  • Slide 15
  • The Scene Classification Challenge!
  • Slide 16
  • Scene Classification Challenge:
    – 8 scene categories
    – 200 training samples, 60 test samples per category
    – 256x256 pixels, grayscale (.pgm)
    – Use a backpropagation network as classifier
    – A framework will be provided
    – Your task: decide about input features and program the feature extractors.
    – The winner will be determined based on different test images.
    – The winner receives a $50 gift certificate for Best Buy!