
A Day at the Museum

Catie Chang, Mary Etezadi-Amoli, and Michelle Hewlett

I. Introduction

Our objective was to identify 33 different paintings from the Cantor Arts Center, based on a database of images depicting the paintings at various angles, orientations, and illuminations.

We considered two main approaches for solving this image processing problem. The first used color histograms, which are advantageous because they do not change significantly across minor variations in scale and angle. Furthermore, we found that warping or rotations were not necessary for achieving high performance. One disadvantage of the color histogram method is that lighting and noise can greatly affect the color content of the images. Our second approach was based on scale-invariant feature detection. However, the performance of the feature detection was sensitive to the choice of several parameters, and further optimization over these parameters may have been necessary to achieve perfect performance. In addition, our implementation of feature detection was more computationally expensive than the color histogram approach. After testing both algorithms, we found that the color histogram method achieved higher accuracy and a faster runtime.

Our color histogram approach consists of four main steps (see Figure 1). First, the painting is segmented from the background. Then, the colors in this painting are quantized based on a custom color palette, which we designed to be representative of the colors across all of the training images in our set. Next, the image is partitioned into four quadrants, and normalized histograms of color values are computed. Finally, the color histograms of this image are compared against those of the training images. We use a histogram intersection metric to quantify the similarity between histograms. The mean of the histogram intersection values across all four quadrants serves as our final measure of similarity.

To classify a novel image, we compute its similarity with every image in the database. We identify the novel image as the painting corresponding to the most similar training image.

We spent a day at the Cantor Arts Center and took three more pictures of each painting. Our training set thus comprised 198 images.

Fig. 1. Flowchart of algorithm.

II. Segmentation

Our first task was to segment the painting from the remainder of the image. Our algorithm converts the RGB image to grayscale (Figure 2), and then filters the grayscale image using median filtering to reduce much of the noise present in the original image. We then convert this filtered grayscale image to black and white using a fixed threshold of 0.45, which worked well given the consistent lighting of the museum (Figure 3). We also implemented an adaptive thresholding method, but it was more computationally expensive and, for our training set, did not offer significant improvement over the fixed thresholding.

Fig. 2. Grayscale image

Fig. 3. Black and white image, thresholded at 0.45.
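In MATLAB, this front end can be sketched as follows. This is our reconstruction rather than the report's actual code; the input file name and the 5x5 median-filter neighborhood are assumptions, since the report does not state the filter size.

```matlab
% Segmentation front end: grayscale -> median filter -> fixed threshold.
% Requires the Image Processing Toolbox; filter size is an assumed value.
img  = imread('painting.jpg');    % hypothetical input image
gray = rgb2gray(img);             % RGB to grayscale (Figure 2)
gray = medfilt2(gray, [5 5]);     % median filtering to reduce noise
bw   = im2double(gray) > 0.45;    % fixed threshold of 0.45 (Figure 3)
```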

After obtaining this black and white image and creating a mask, we perform region labeling on the white regions. Within all of the training images, the painting falls in the center of the image; therefore, to ensure that we select regions corresponding to the painting, we create a square around the center of the mask and choose all regions overlapping with this square. From these, we choose the region having the greatest number of pixels (Figure 4).

Fig. 4. Central region with the largest number of pixels.

We then create a new image that has ones at all pixels corresponding to this label, and zeros elsewhere. This captures most of the painting, but contains a few small holes. To fill in the holes, we invert this image and perform region labeling on the background. At this point, we are interested in finding the region corresponding to the background/wall. In order to do this, we find the labels of the four corners of the image, and remove the pixels having these labels. A final mask is created by zeroing out these background pixels. Finally, the mask is applied to the original image (Figure 5).
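A sketch of this mask construction, assuming bw and img from the previous step; the 101x101 center square is an assumed size:

```matlab
% Region labeling and mask construction (reconstruction, not verbatim).
L   = bwlabel(bw);                           % label connected white regions
ctr = round(size(bw) / 2);                   % image center
sq  = L(ctr(1)-50:ctr(1)+50, ctr(2)-50:ctr(2)+50);  % central square
labs = unique(sq(sq > 0));                   % labels overlapping the square
counts = arrayfun(@(k) sum(L(:) == k), labs);
[~, ibig] = max(counts);
mask = (L == labs(ibig));                    % largest central region (Figure 4)

% Fill holes: label the inverted mask and discard regions touching the
% four corners (the wall/background); everything else is foreground.
Lb   = bwlabel(~mask);
wall = [Lb(1,1) Lb(1,end) Lb(end,1) Lb(end,end)];
mask = ~ismember(Lb, wall(wall > 0));
seg  = im2double(img) .* repmat(mask, [1 1 3]);     % apply mask (Figure 5)
```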

Our final segmentation algorithm performs very well on the training images. The algorithm successfully segments out the wall, other paintings, statues, and any other unnecessary objects that may appear in the images. Our algorithm also ensures that the painting we extract from the original image lies in the center of the image, which was a valid constraint for our training and test sets. While shadows around the frames are sometimes retained in the segmented image, this did not seem to affect the overall performance of our classifier.

Fig. 5. Final image after segmentation.

We tried several different approaches before finalizing our segmentation algorithm. Our first approach was to convert the image from RGB to HSV coordinates and use the saturation value to threshold the image. However, a problem with this approach was that the saturation values for the images were very noisy, and this could not be remedied even with filtering. Secondly, we tried edge detection, but this technique did not perform very well because several of the frames were quite ornate and did not have clear edges. A third approach, which we refer to as the "wall thresholding" method (Figure 6), was to find all of the R, G, and B pixels whose values lie within a window around the average R, G, and B values. Because these are shades of gray, we assumed that these pixels are part of the wall, and thus classified them as background pixels. On most of the training images, this method performed beautifully and even got rid of the shadows on some of the paintings. However, it failed badly on a few images, because both lighting and noise tended to affect the R, G, and B values.

Fig. 6. Wall thresholding.
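A sketch of the wall thresholding idea; the window half-width of 20/255 is an assumed value, since the report does not state the window size:

```matlab
% "Wall thresholding": classify near-gray pixels close to the image's
% average color as background (reconstruction; window size assumed).
rgb  = im2double(img);
avgc = squeeze(mean(mean(rgb, 1), 2));   % average R, G, B values
w    = 20 / 255;                         % assumed window half-width
wall = abs(rgb(:,:,1) - avgc(1)) < w & ...
       abs(rgb(:,:,2) - avgc(2)) < w & ...
       abs(rgb(:,:,3) - avgc(3)) < w;    % pixels within the window -> wall
```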

III. Color Histogram Matching

Overview

We used a color histogram approach to solve the painting identification problem. The image is first quantized to 100 colors, and histograms are computed over four quadrants to provide spatial localization. Histogram intersection is used as the similarity metric. This method demonstrated the best performance of all the methods we considered.

Methods

Before settling on our final color histogram algorithm, we tried many variations. We initially tried comparing the 1-D histograms of the R, G, and B components, using mean squared error and correlation as performance metrics. This approach did not work well, however, because histogramming the R, G, and B components separately does not preserve information about the color as a whole. Since color is determined by an (R,G,B) triple, there are many different hues that have the same R value, for example, so separate histograms of R, G, and B do not accurately represent the color palette of an image.

One way to get around this issue would be to use a multidimensional histogram that counts the number of pixels that have each possible (R,G,B) triplet value. The R, G, and B values would have to be coarsely quantized so that the number of bins is computationally reasonable. An alternative option would be to quantize the number of colors in the image based on a color palette, and translate each (R,G,B) triple into a single index into this palette. We chose this second option.

In building a color palette, it is important to pick a good, representative set of colors. Ideally, such a palette would consist of the minimum number of colors needed to accurately represent a set of images. Our approach was to use the rgb2ind function to quantize each of the images in the training set to 256 colors using minimum variance quantization. We stored the combined colors from each image into one color map, and then rounded and sampled to remove duplicates. The resulting set of 100 colors is shown in Figure 7.

Fig. 7. Color palette.
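The palette construction can be sketched as follows. The variable files and the rounding resolution are assumptions, and the final sampling down to 100 colors is only indicated in a comment:

```matlab
% Build a combined palette from all training images (reconstruction).
maps = [];
for k = 1:numel(files)               % files: cell array of image names (assumed)
    [~, map] = rgb2ind(imread(files{k}), 256);   % minimum variance quantization
    maps = [maps; map];              %#ok<AGROW> accumulate color maps
end
% Round to merge near-duplicate colors, then deduplicate; the authors then
% sampled this set down to the 100 colors shown in Figure 7.
palette = unique(round(maps * 20) / 20, 'rows');
```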

Another important factor to consider when using histograms for image comparison is spatial localization. A color histogram gives no information about the spatial distribution of colors in the image, so a method that relies on color histograms for identification would likely benefit from incorporating spatial information. Gavrielides et al. and Cinque et al. [1,2] describe methods that use spatial chromatic histograms. For each color i in the quantized image, a vector (h_i, b_i, σ_i) is defined, where h_i is the number of pixels in the image with color i, b_i is the coordinates of the centroid of color i, and σ_i is a measure of the spread of the pixels having color i about this centroid. The metric used for comparing a query image with a training set image is histogram intersection, scaled by centroid and spread information. Since a vector has to be computed for each color in the quantized image, this method can be computationally slow. Also, since we are dealing with histograms over the segmented regions of the paintings, and since these regions are not all the same size, the centroid and spread measurements would have to be computed relative to local coordinates of the mask and scaled appropriately so they can be meaningfully compared.

Since the goal of spatial chromatic histograms is simply to encode some spatial information about the colors, we decided to use a simpler method of dividing the mask area into four quadrants (Figure 8) and computing histograms over those quadrants. This method works better than computing histograms over the entire mask (no spatial localization), and it performed comparably to division into nonants (9 sections).

Fig. 8. Illustration of quadrants for the image in Figure 5.
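A sketch of the per-quadrant histogramming; splitting at the image center (rather than, say, the mask centroid) is our assumption, and seg, mask, and palette come from the earlier steps:

```matlab
% Quantize the segmented image against the palette, then histogram each
% quadrant separately (reconstruction; center split is assumed).
X = rgb2ind(seg, palette);           % 0-based palette indices
[nr, nc] = size(X);
rr = {1:floor(nr/2), floor(nr/2)+1:nr};
cc = {1:floor(nc/2), floor(nc/2)+1:nc};
n = size(palette, 1);
H = zeros(4, n);
q = 0;
for i = 1:2
    for j = 1:2
        q = q + 1;
        sub = X(rr{i}, cc{j});  msk = mask(rr{i}, cc{j});
        idx = double(sub(msk)) + 1;        % count segmented pixels only
        H(q, :) = accumarray(idx, 1, [n 1]).';
    end
end
H = H ./ repmat(sum(H, 2), 1, n);    % normalize each quadrant histogram
```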

We used histogram intersection [3] as our similarity metric. Given a test image histogram T and a model training image histogram M, the intersection of these histograms is defined as:

$$\sum_{j=1}^{m} \min(T_j, M_j)$$

where j is an index over the colors in the palette. The intersection gives the number of pixels in common between the model and test histograms. To form our final similarity metric, we normalized this intersection by the number of pixels in the model image. This value lies between 0 and 1, where 1 indicates a perfect match:

$$H(T, M) = \frac{\sum_{j=1}^{m} \min(T_j, M_j)}{\sum_{j=1}^{m} M_j}$$
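A direct transcription of this metric, assuming T and M are 1-by-m histograms (pixel counts) for one quadrant:

```matlab
% Normalized histogram intersection (transcription of the formula above).
% T is first rescaled so both histograms cover the same number of pixels,
% as described in the next paragraph.
T = T * (sum(M) / sum(T));
H_TM = sum(min(T, M)) / sum(M);   % lies in [0, 1]; 1 is a perfect match
% The final similarity is the mean of H_TM over the four quadrants.
```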

Since the segmented paintings have different sizes, we scaled the test image histogram to have the same number of pixels as the model image histogram. This metric weights all color bins equally, without any consideration of how perceptually similar the colors are. An alternative metric, the histogram quadratic distance [4], takes this weighting into consideration, but did not perform as well as histogram intersection. The formula is as follows:

$$d(x, y) = (x - y)^T A (x - y)$$

where $a_{ij} = 1 - d_{ij}/d_{\max}$, $d_{ij}$ is the Euclidean distance between colors i and j in the colormap, and $d_{\max}$ is the maximum distance between any two colors in the colormap.
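A sketch of this distance, where x and y are 1-by-m color histograms over the palette; the pairwise color distances are computed directly with a simple loop:

```matlab
% Histogram quadratic distance (reconstruction of the formula above).
n = size(palette, 1);
D = zeros(n);
for i = 1:n
    for j = 1:n
        D(i, j) = norm(palette(i, :) - palette(j, :));  % d_ij in RGB space
    end
end
A = 1 - D / max(D(:));            % a_ij = 1 - d_ij / d_max
d = (x - y) * A * (x - y).';      % quadratic-form distance
```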

Histogram intersection also performed better than mean squared error, correlation, the infinity norm, and a combination of histogram intersection and mean squared error.

Results and Discussion

Using histogram intersection with four quadrants and our 100-color palette, we obtained 98.5% accuracy. Table I shows the number of errors obtained with our final method, as well as the number of errors obtained with three other similarity metrics. Since we have six pictures of each of the 33 paintings, we divided the entire training set into six sets (each containing all 33 paintings). To evaluate our algorithm, we trained on each possible combination of five image sets and tested on the remaining image set.

TABLE I
Error comparison across similarity metrics for color histograms

Image Set                 | Histogram Intersection | MSE  | Correlation | Histogram Quadratic Distance
1                         | 1                      | 0    | 1           | 1
2                         | 1                      | 1    | 1           | 2
3                         | 0                      | 1    | 0           | 1
4                         | 0                      | 0    | 1           | 2
5                         | 1                      | 2    | 2           | 1
6                         | 0                      | 0    | 0           | 3
Total Errors (out of 198) | 3                      | 4    | 5           | 10
Percent Correct           | 98.5                   | 98.0 | 97.4        | 95.0

Even though the museum lighting is controlled, variations in lighting still exist across the painting area, and the color appears to vary somewhat depending on the distance and the angle at which the picture is taken. We tried a form of gray-world color balancing to see if this would help, but the results were actually worse.

IV. Feature-based Recognition

Overview

We also implemented a feature-based approach to recognition, based on the concept of SIFT features [5,6]. We identified scale-invariant features by finding local extrema in Difference-of-Gaussians (DoG) image pyramids, and built a local descriptor for each feature based on orientation histograms. We then experimented with several metrics for evaluating the similarity between the features of two images.

Methods Description

Successively higher levels in the Gaussian image pyramid were obtained by smoothing the Gaussian image (σ = 1.5) and then downsampling (bilinear interpolation) by a factor of 0.8. At each level, two Gaussian images (differing by a smoothing factor of √2) were produced and subtracted to obtain the DoG image. Extrema (peaks) were detected by identifying pixels that were greater than or less than all of their 8 neighbors. To reduce the number of noisy features, we only retained peaks whose magnitudes exceeded 60% of the maximum magnitude in the DoG image. In addition, peaks that were likely to form parts of edges were eliminated by excluding those pixels whose ratio of principal curvatures exceeded a threshold value.
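A sketch of one pyramid level; the number of levels and the 9x9 kernel support are assumed values, and ordfilt2 is used to find pixels that attain the local 3x3 maximum or minimum:

```matlab
% DoG pyramid and peak detection (reconstruction; kernel sizes assumed).
g1 = fspecial('gaussian', 9, 1.5);
g2 = fspecial('gaussian', 9, 1.5 * sqrt(2));  % sigmas differ by sqrt(2)
im = im2double(rgb2gray(img));
nLevels = 4;                                  % assumed pyramid depth
for lev = 1:nLevels
    dog = imfilter(im, g2, 'replicate') - imfilter(im, g1, 'replicate');
    mx  = ordfilt2(dog, 9, true(3));          % 3x3 neighborhood max
    mn  = ordfilt2(dog, 1, true(3));          % 3x3 neighborhood min
    peaks = (dog == mx | dog == mn) & ...
            abs(dog) > 0.6 * max(abs(dog(:)));  % keep strong extrema only
    % (edge-like peaks would additionally be removed here via the ratio
    % of principal curvatures)
    % smooth (sigma = 1.5), then downsample by 0.8 for the next level
    im = imresize(imfilter(im, g1, 'replicate'), 0.8, 'bilinear');
end
```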

For each feature surviving the above criteria, we constructed a local descriptor by building orientation histograms, as described in [6]. Within a 16x16 window around the feature, gradient magnitudes were weighted by a Gaussian window (σ = 7) and accumulated into 8-bin orientation histograms. To save computation time, the gradient magnitude and orientation images at all levels were pre-computed. Just as in [6], we made a different histogram for each of the sixteen 4x4-pixel regions within the 16x16 window. The values of each histogram bin were concatenated into a 128-entry vector, which was then normalized to unit length in order to increase the robustness against changes in illumination.
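A sketch of the descriptor for one feature at pixel (r, c), where gmag and gdir are the precomputed gradient magnitude and orientation images for the feature's pyramid level (our reconstruction, not the report's code):

```matlab
% 128-entry orientation-histogram descriptor (reconstruction).
w   = fspecial('gaussian', 16, 7);             % Gaussian weighting, sigma = 7
mag = gmag(r-8:r+7, c-8:c+7) .* w;             % weighted 16x16 patch
ori = gdir(r-8:r+7, c-8:c+7);                  % orientations in [-pi, pi)
bin = mod(floor((ori + pi) / (pi/4)), 8) + 1;  % 8 orientation bins
desc = zeros(1, 128);
k = 0;
for bi = 1:4                                   % 4x4 grid of 4x4-pixel cells
    for bj = 1:4
        rows = (bi-1)*4 + (1:4);  cols = (bj-1)*4 + (1:4);
        h = accumarray(reshape(bin(rows, cols), [], 1), ...
                       reshape(mag(rows, cols), [], 1), [8 1]);
        desc(k+1:k+8) = h;
        k = k + 8;
    end
end
desc = desc / max(norm(desc), eps);   % unit length (illumination robustness)
```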

To identify a new test image, we computed a pairwise similarity metric between the feature set belonging to the test image and the feature sets of each of the images in the training set. The training set image with the highest similarity determined the class (painting) of the test image. Since we were interested in recognizing entire images, rather than specific objects within an image, we only needed to be concerned with the overall similarity between the feature sets of two images, rather than explicitly finding the correspondences.

The similarity between two features f_1 and f_2 was defined as √2 − d(f_1, f_2), where d(f_1, f_2) is the Euclidean distance between the two descriptors. The nearest neighbor of a feature in Image 1 is defined as the feature in Image 2 having the highest similarity.

We tried several ways of defining the similarity between the sets of features from two different images, including the following (a sketch of metric (1) appears after the list):
(1) For each feature in Image 1, compute the similarity with its nearest neighbor in Image 2. Take the sum of these similarities.
(2) Same as (1), but sum over only the N percent highest similarity values. (N between 25 and 50 usually gave the best results.)
(3) For each feature in Image 1, find the ratio between the similarities of its nearest neighbor and its second-nearest neighbor. Then take the sum of these ratios.
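A sketch of metric (1), assuming F1 and F2 are matrices of unit-length descriptors with one row per feature:

```matlab
% Metric (1): sum of nearest-neighbor similarities (reconstruction).
S = 0;
for i = 1:size(F1, 1)
    diffs = F2 - repmat(F1(i, :), size(F2, 1), 1);
    d = sqrt(sum(diffs.^2, 2));    % Euclidean distances to all of F2
    S = S + (sqrt(2) - min(d));    % similarity to the nearest neighbor
end
```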

Results and Discussion

At best, this approach misclassified 10 of the 99 images in the original training set, worse than the best results using color histograms. In addition, its computation took about 10 seconds longer than the color histogram method. We therefore chose not to use the feature-detection approach in our final submission.

There are several reasons why this feature-based approach did not achieve perfect accuracy. One reason concerns the choice of optimal parameters for the algorithm. We tried a range of values for parameters such as (1) the distance between scales on the image pyramid (0.8 usually gave the best results) and (2) the threshold for the magnitude of peaks retained in the peak-detection step (60% ended up being a good value). However, we did not do a very fine-grained search for optimal parameters, and this may have affected the accuracy significantly. Indeed, we found that different sets of parameters had a large impact on the accuracy.

For the similarity metrics, options (1) and (2)performed significantly better than (3).


The lack of rotation invariance in our implementation may also have made a difference. Furthermore, for simplicity we chose not to implement several other optimizations described in Lowe's paper, such as fitting quadratic functions to localize the features in both image space and scale space with greater accuracy.

V. Conclusion

We successfully identified 98.5% of the images in our training set using a color histogram approach. Here is a beautiful reflection of our success, found at the Cantor Arts Center:

Fig. 9. idididididid.

VI. References

[1] Gavrielides, M.A., E. Sikudova, and I. Pitas. "Color-based descriptors for image fingerprinting." IEEE Transactions on Multimedia, 8(4):740-748, 2006.
[2] Cinque, L., G. Ciocca, S. Levialdi, A. Pellicano, and R. Schettini. "Color-based image retrieval using spatial-chromatic histograms." Image and Vision Computing, 19:979-986, 2001.
[3] Swain, M.J. and D.H. Ballard. "Color indexing." International Journal of Computer Vision, 7(1):11-32, 1991.
[4] Sawhney, H.S. and J.L. Hafner. "Efficient color histogram indexing." Proceedings of the IEEE International Conference on Image Processing (ICIP-94), Vol. 2, Nov. 1994, pp. 66-70.
[5] Lowe, D.G. "Object recognition from local scale-invariant features." International Conference on Computer Vision, Corfu, Greece, September 1999, pp. 1150-1157.
[6] Lowe, D.G. "Distinctive image features from scale-invariant keypoints." International Journal of Computer Vision, 60(2):91-110, 2004.

VII. Group Participation

Catie focused on coding the feature detection algorithm, and Mary & Michelle jointly concentrated on programming the color histogram approach. Everything else was divided equally.