
AdaBoost and RealBoost of Random Sparse Granule Classifiers

Table of Contents

Construction of Weak Classifiers
    Sparse Haar-Like Features
    Definition of Granule(size, x, y, parity)
    Feature Space Constraints
    Feature Space Caveats
    Training Weak Features
AdaBoost Results
    Feature Examples – Top 10 Sparse Haar-like Features selected by AdaBoost
    Weak Classifier Error Curves
    Projected Population Histograms
    AdaBoost ROC
RealBoost Results
    Projected Population Histograms
    RealBoost ROC
    RealBoost Caveats – Overfitting
Testing Data – Imposing Faces on Image
    AdaBoost Testing Results
    RealBoost Testing Results
Conclusions


Construction of Weak Classifiers:

Sparse Haar-Like Features

In this experiment, random sparse granules were used as weak classifiers. We will refer to each weak classifier as a "Sparse Haar-like Feature". Each feature in the feature space is composed of a set of granules:

Definition of Granule(size, x, y, parity):

a. Each granule is a square region of the original image.

b. The size of a granule is determined by a partition size p: the side length of a granule is 1/p of the original square image's side length. Thus, if p=4 and the original image is 16x16, the granule is 4x4.

c. Given the size of a granule, the image can be partitioned into a space of non-overlapping granules, i.e. a 16x16 image with partition size p=4 would be partitioned into 16 non-overlapping 4x4 granules. Each granule is given an (x,y) pixel coordinate corresponding to its bottom left corner.

d. Each granule is assigned a pixel value, which is simply the average of all the pixels in its region (see the sketch below).

e. Each granule has a parity, indicating whether its pixel value should be summed as a negative or a positive value.
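As an illustration, here is a minimal sketch of the granule computation defined above (the function and variable names are illustrative, not taken from our implementation):

    import numpy as np

    def granule_space(image, p):
        """Partition a square image into a p x p grid of non-overlapping granules and
        return the mean pixel value of each granule."""
        side = image.shape[0] // p            # granule side length, e.g. 16 // 4 = 4
        grid = np.zeros((p, p))
        for gy in range(p):
            for gx in range(p):
                block = image[gy * side:(gy + 1) * side, gx * side:(gx + 1) * side]
                grid[gy, gx] = block.mean()   # granule value = average of its pixel region
        return grid

    # One smoothed granule space per partition size, as used later during training:
    # granules = {p: granule_space(image, p) for p in (2, 4, 8)}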

Feature Space Constraints

Each Sparse Haar-like Feature can contain a set of any number of granules. This feature space is high-dimensional; rather than employ a heuristic search over features, I opted to simply constrain the feature space by the following parameters:

1. Each feature is composed of nGranules = [2,3,4] granules.

2. Each granule can only be of partition size [2,4,8].

3. Parity combinations were unconstrained; an exhaustive search was done on all non-symmetric combinations of granule parities per granule set.

4. An artificial parameter Beta=200,000 was introduced. If the feature space of all features given nGranules=i is greater than Beta, then the largest partition size is removed from the partition size space [2,4,8] successively until the maximum number of features given nGranules=i is less than Beta. Thus, partition size p=8 was not included in any of the 3- or 4-granule features (a counting sketch follows the feature totals below).

This produced a feature space of 40821 features total:

3486 features with nGran=2
3420 features with nGran=3
33915 features with nGran=4
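These totals are consistent with a simple counting scheme: choose the granule set, then count parity patterns up to a global sign flip, excluding the all-equal pattern. A sketch of that counting and of the Beta cap follows (names are illustrative, and the counting convention is an assumption that happens to reproduce the totals above):

    from math import comb

    BETA = 200_000  # artificial cap on the feature count per value of nGranules

    def count_features(n_granules, partition_sizes):
        """Choose n_granules distinct granules from all partition grids, then multiply by
        the parity patterns counted up to a global sign flip, excluding the all-equal
        pattern (this counting reproduces the 3486 / 3420 / 33915 totals above)."""
        positions = sum(p * p for p in partition_sizes)   # p=[2,4,8] -> 4+16+64 = 84 granules
        parities = 2 ** (n_granules - 1) - 1
        return comb(positions, n_granules) * parities

    def capped_partitions(n_granules, partition_sizes=(2, 4, 8)):
        """Drop the largest partition size until the feature count falls under BETA."""
        sizes = list(partition_sizes)
        while sizes and count_features(n_granules, sizes) > BETA:
            sizes.remove(max(sizes))                      # p=8 is dropped for nGranules = 3, 4
        return sizes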

Feature Space Caveats

In fact, none of the nGran=3 features were selected by AdaBoost. This is a result of having an odd number of granules, so their parities cannot sum to zero. This caveat is described in C. Huang, H. Ai, Y. Li, and S. Lao, "High-Performance Rotation Invariant Multi-View Face Detection", IEEE Trans. on PAMI, 29(4), 2007, as:

"The first constraint makes the features 'balanced' so that no zero-mean normalization is needed."

This claim held in the results of this experiment.

Training Weak Features:

Each feature was trained using 11838 positive face images and 45456 negative non-face images. Each image is a 24x24 pixel square. To train the features, each image was first transformed into a smoothed granule space for each partition size p in [2,4,8].

Smoothed granule space for a 16x16 image. Partition progresses through p=[8,4,2].

After obtaining the entire granule space for each image, a feature scores an image by summing its granule values over the image; a granule with negative parity contributes its value as a subtraction.

To determine a feature's decision boundary, we simply take the midpoint between the mean scores of the positive and negative populations. We also set a parity for each feature depending on whether the positive population mean is less than the negative population mean.
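A minimal sketch of this scoring rule and threshold, assuming a feature is stored as a list of (p, x, y, parity) tuples indexing into the per-image granule grids (the representation and names are illustrative):

    import numpy as np

    def feature_score(granule_grids, feature):
        """Score an image: sum the selected granule values, with negative-parity granules
        subtracted. granule_grids maps partition size p -> the (p x p) granule grid."""
        return sum(parity * granule_grids[p][gy, gx] for (p, gx, gy, parity) in feature)

    def train_weak_classifier(pos_scores, neg_scores):
        """Decision boundary = midpoint of the two population means; the classifier parity
        records which side of the threshold is labelled 'face'."""
        mu_pos, mu_neg = np.mean(pos_scores), np.mean(neg_scores)
        threshold = (mu_pos + mu_neg) / 2.0
        parity = 1 if mu_pos >= mu_neg else -1   # +1: scores above threshold are faces
        return threshold, parity

    def classify(score, threshold, parity):
        return 1 if parity * (score - threshold) >= 0 else -1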

AdaBoost Results

AdaBoost was applied to our Sparse Haar-like Features, with a termination parameter of T=200 weak classifiers. The basic AdaBoost algorithm was used with a slight modification:

At each iteration, classifiers with weighted error >= 0.5 were discarded.

Our experiment terminated after selecting 123 weak classifiers. On the final iteration, no weak classifier had weighted error < 0.5.
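A minimal sketch of the AdaBoost loop with this discard rule, assuming labels in {-1,+1} and a precomputed matrix of weak decisions (both assumptions; this is not our original code):

    import numpy as np

    def adaboost(predictions, labels, T=200):
        """predictions: (n_classifiers, n_images) matrix of {-1,+1} weak decisions.
        labels: (n_images,) array of {-1,+1} ground truth.
        Returns the strong classifier as a list of (classifier index, alpha) pairs."""
        n_clf, n_img = predictions.shape
        w = np.full(n_img, 1.0 / n_img)           # image weights, initially uniform
        alive = np.ones(n_clf, dtype=bool)        # classifiers not yet discarded
        strong = []
        for t in range(T):
            errors = (predictions != labels) @ w  # weighted error of every classifier
            alive &= errors < 0.5                 # discard classifiers with error >= 0.5
            if not alive.any():                   # this is how our run stopped at 123 rounds
                break
            best = np.where(alive)[0][np.argmin(errors[alive])]
            eps = max(errors[best], 1e-12)        # guard against log(0)
            alpha = 0.5 * np.log((1 - eps) / eps)
            w *= np.exp(-alpha * labels * predictions[best])
            w /= w.sum()                          # re-normalize image weights
            strong.append((best, alpha))
        return strong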


Feature Examples – Top 10 Sparse Haar-like Features selected by AdaBoost

Each greyscale square represents a granule of the image.

Weak Classifier Error Curves

The weak classifier error curves at T=[0,10,50,100] are shown below.

At T=0, the weak error goes up drastically for about 10,000 of the classifiers. This error comes from the 3-granule features that are not zero-mean normalized.


At T=10, the lowest weighted error achieved is already higher than 0.42, and about 15,000 classifiers have been discarded. The poor classifiers at this iteration are visible on the steep part of the curve above err=0.50.

At T=50 we are down to about 600 classifiers, and the error range is small, about 0.48-0.50. The majority of our classifiers still perform better than 0.50, though.


At T=100 the range of our error is extremely small, and our curve looks like it has become linear with only 35 classifiers. While the increase in error looked logarithmic before, it seems that now the classifiers are differentiated enough with respect to the image weights that dError/dClassifier has become closer to constant.

A composite of all the weak classifier errors over T. The domain and range are both shrinking.


Projected Population Histograms

Note: H(x) here refers to the score of AdaBoost's strong classifier before thresholding, not the classification decision.


As T increases, the populations are becoming more separated.


As T increases, the curves shift right, indicating more images are being classified correctly.


AdaBoost ROC

RealBoost Results:

RealBoost was implemented with 100 bins. This actually resulted in a sparse bin histogram for many classifiers, which led to the following special cases:

When a bin contained 0 positive images, and 0 negative images, no information was gained, thus the realScore = 0.

When a bin contained 0 positive images, and >0 negative images, realScore = -FLOAT_MAX

When a bin contained 0 negative images, and >0 positive images, realScore = +FLOAT_MAX

In hindsight, we should have reduced the number of bins and smoothed the positive and negative probabilities over the bin histogram, so that the weighted probability distributions of the two populations over real scores would themselves be smooth.
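A sketch of the per-bin real scores with the special cases above, plus the optional smoothing suggested here. The 0.5·ln(p_pos/p_neg) form is the standard RealBoost bin score; the epsilon value and names are assumptions:

    import numpy as np

    def realboost_bin_scores(pos_scores, neg_scores, w_pos, w_neg, n_bins=100, eps=None):
        """Per-bin real score h = 0.5 * ln(p_pos / p_neg) over a shared binning of the
        weak classifier's scores. eps=None reproduces the special cases above; a small
        eps (e.g. 1e-6) smooths empty bins so neither probability is ever zero."""
        lo = min(pos_scores.min(), neg_scores.min())
        hi = max(pos_scores.max(), neg_scores.max())
        edges = np.linspace(lo, hi, n_bins + 1)
        p_pos, _ = np.histogram(pos_scores, bins=edges, weights=w_pos)  # weighted positive mass
        p_neg, _ = np.histogram(neg_scores, bins=edges, weights=w_neg)  # weighted negative mass
        if eps is not None:
            p_pos, p_neg = p_pos + eps, p_neg + eps
        scores = np.zeros(n_bins)                               # empty bins: zero information
        both = (p_pos > 0) & (p_neg > 0)
        scores[both] = 0.5 * np.log(p_pos[both] / p_neg[both])
        scores[(p_pos > 0) & (p_neg == 0)] = np.finfo(np.float32).max   # +FLOAT_MAX
        scores[(p_pos == 0) & (p_neg > 0)] = -np.finfo(np.float32).max  # -FLOAT_MAX
        return edges, scores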

RealBoost was trained using the 123 weak classifiers selected by AdaBoost; training was terminated after boosting T=100 classifiers.

Projected Population Histograms


With T=10, there is virtually no within-class or between-class scatter. With few classifiers, each containing many empty bins, most of our weak classifiers provide zero information; thus the scores are determined by perhaps one or two classifiers for the entire population.

At T=50, there is class separation, but the histogram is sparse and does not look normal.


At T=100, there is more within-class and between-class scatter. Although the overlapping region is small, both curves are highly skewed towards each other.

RealBoost ROC

At T=10, there is virtually no population separation, so the ROC curve is the vertical line x=0. We see that RealBoost has a much higher ROC curve than AdaBoost when evaluating training data.


RealBoost Caveats – Overfitting

We have seen that RealBoost has a sparse population projection histogram, yet its ROC curve over the training data is impressive. However, the histogram class separation is skewed heavily towards the decision boundary, indicating low within-class scatter.

There are a number of issues to consider with our training algorithm:

1. Non-smoothing of empty bins: We arbitrarily set bin scores to positive or negative FLOAT_MAX if there are zero positive or negative training data within a bin. However, there may be testing data that falls within this bin; thus, we are over-fitting the training data when we assign scores of high magnitude to empty bins. We should smooth the probabilities for each bin so that p(positive|bin) and p(negative|bin) are never equal to zero.

2. Due to the sparsity of the projected populations, we should reduce the number of bins used, for a lower granularity and a smoother projected population distribution onto the strong classifier. This would reduce the need for population smoothing, and provide better generalization to test data.

Consequently, with possible training data over-fitting, we see a high ROC curve over training data. Although the false positive rate is low, we see the population projections are highly skewed towards each other, indicating that in testing it is likely we will see a much higher false positive rate.

Intuitively, with highly granular, sparse population projections, many bins will have zero positive or zero negative probability; thus every such bin gives us either zero information or an exaggerated score for the non-zero class. So as we project test data into these empty bins, many weak classifiers do not generalize well to the test data, making our strong classifier a weak discriminant function.

Testing Data – Imposing Faces on Image

To test, we were given a large class photo with many differently sized faces at different positions. The image was scaled such that the smallest face measured 16x16 pixels and the largest face measured 56x56. The image was then decomposed into windows from the smallest size 16x16, incremented by 8 pixels per side, up to the largest size 56x56. Multiples of 8 were used for each side length because our feature space contains granules of partition size p=[8,4,2]; thus, every image window can be decomposed into granules easily.

Rather than do an exhaustive search of the window space at every possible window location, we offset each window's bottom left corner by 5 pixels in the x and y directions.
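A short sketch of this window sweep (the generator and argument names are illustrative):

    def test_windows(img_h, img_w, min_side=16, max_side=56, size_step=8, stride=5):
        """Yield (x, y, side) for every test window: sides 16..56 in steps of 8, so each
        window divides evenly into p=[2,4,8] granules; corners offset by 5 pixels."""
        for side in range(min_side, max_side + 1, size_step):
            for y in range(0, img_h - side + 1, stride):
                for x in range(0, img_w - side + 1, stride):
                    yield (x, y, side)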


AdaBoost Testing Results

AdaBoost classifications on 16x16 windows only

AdaBoost classifications on 24x24 windows only


AdaBoost classifications on 32x32 windows only

As seen above, AdaBoost's hit rate and false positive rate are both high.


RealBoost Testing Results

RealBoost with all windows of size 16x16 to 56x56 by multiples of 8

Here it is clear that RealBoost has a lower hit rate and a lower false positive rate than AdaBoost in testing. Overfitting of the training data, exaggerated scores on empty bins, and non-smoothed population densities all contributed to this.

Additionally, it appears that larger windows have a higher false positive rate. This could be attributed to the additional information loss accrued when larger images are smoothed into granules.

Conclusions:

The caveats of both boosting methods were discussed above, and we should address them. Additionally, we would like to test with a larger feature space and a larger data set in order to boost better, more generalizable weak classifiers.

In Python, using parallel processing, it would take 1.85 days to boost 500,000 weak classifiers over 60,000 images, so a more exhaustive experiment is very viable.
