
[IEEE Second International Workshop on Semantic Media Adaptation and Personalization (SMAP 2007) - Uxbridge, Middlesex, UK (2007.12.17-2007.12.18)]



Image Description using Scale-Space Edge Pixel Directions Histogram

António M. G. Pinheiro
Unidade de Detecção Remota
Universidade da Beira Interior

6200 Covilhã - Portugal
E-mail: [email protected]

Abstract

Edge direction histograms are widely used as image descriptors for image retrieval and recognition applications. Edges represent textures and are also representative of the image shapes. In this work a histogram of the edge pixel directions is defined for image description. The edges detected with the Canny algorithm are described at two different scales in four directions. At the lower scale the image is divided into 16 sub-images, and a descriptor with 64 bins results. At the higher scale no image division is done, because only the most important image features are present, and 4 bins result. A total of 68 bins is used to describe the image in scale-space. Images are compared using the distance between their histograms. The results are compared with those obtained using the low-scale histogram only. Improved classification using the Nearest Class Mean and Neural Networks is also presented. A higher level semantic annotation, based on this low-level descriptor resulting from the multiscale image analysis, is extracted.

Keywords: Image Description, Edge Description, Histogram, Image Classification, Scale-space.

1 Introduction

Multimedia access requires reliable description methods that allow efficient recognition and characterization of the information. Different models have been considered for image characterization and description. An example is the “Multimedia Content Description Interface” MPEG-7 standard. A set of low-level image descriptors is defined in MPEG-7. They describe and measure images in relation to their physical properties, like colour composition and structure, textures, shapes, etc. Considering a human interface, there is a need for a higher level approach that allows a semantic annotation of the multimedia information.

A possible scheme to derive a semantic annotation can be based on a general low-level description. Using classification techniques, some descriptors might allow the extraction of the multimedia semantic annotation. Many classification techniques can be found, like clustering techniques, Bayesian decision, neural networks, etc. [4].

Figure 1. Description Model (block diagram: an image database feeds the low-level image descriptors for Colour, Texture/Shape and Face description, which in turn feed the high-level image descriptors that produce the semantic description).

Figure 1 represents a possible scheme, where Colour, Texture/Shape and Faces are considered for low-level description. Those sets of descriptors should carry the main information to define images. Edge description seems to be close to image content description [8]. Edges are related with two of the most important image features: textures and shapes. Edge description combined with colour description provides very reliable image description based on low level features

Second International Workshop on Semantic Media Adaptation and Personalization

0-7695-3040-0/07 $25.00 © 2007 IEEEDOI 10.1109/SMAP.2007.25


[7, 11]. In the description model shown in figure 1, a Colour structure histogram descriptor similar to the one defined in the MPEG-7 standard [9, 2] might be a good choice for colour description. A face descriptor scheme based on the characteristic distances between eyes, nose and mouth would allow an automatic high level description of pictures/videos with faces, relating them with people's identification. Those descriptors can be used together with an edge pixel directions histogram, resulting in a robust and reliable image description. However, solutions integrating several low-level descriptors require a multimodal analysis that is not the aim of this work.

In this paper, a scale-space edge pixel directions histogram representing textures and shapes is defined. Histograms have been widely used for image analysis, recognition and characterization applications [6]. MPEG-7 also defines an Edge Histogram Descriptor (EHD) [9, 2]. In this work, edges are extracted with the Canny method [3] at two different scales. At the lower scale no edge thresholding is applied, and the resulting edges are mainly representative of textures. At the higher scale edges are selected by hysteresis thresholding and the main shapes of the images result [5].

This paper is organized as follows. The following section describes the scale-space edge pixel directions histogram extraction. In section 3 the scale-space edge pixel directions histograms are classified using different methods, and the ability of this descriptor to define different semantic concepts is tested. For that, 242 higher resolution images of the texture database available at [1] are used. Concluding remarks are given in section 4.

2 Scale-space Edge Pixel Directions Histogram

Edges are detected with the Canny algorithm [3] at two different scales. Before edge extraction, a linear diffusion of the image (Gaussian filtering) is applied in the Canny method. That results in a dependence on a scale t (σ = √(2t)) that is proportional to the inverse of the bandwidth of the Gaussian filter, given by G_σ(x, y) = (1/(2πσ²)) exp(−(x² + y²)/(2σ²)). The local maxima of the gradient of the filtered image are selected as possible edge points. Those points are selected using a hysteresis thresholding, considering the gradient magnitude.
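The scale-dependent smoothing step can be sketched as follows. This is an illustrative reconstruction using SciPy, not the author's code: at scale t the image is filtered with a Gaussian of σ = √(2t), and the gradient of the smoothed image gives the candidate edge points.

```python
import numpy as np
from scipy import ndimage

def smoothed_gradient(image, t):
    """Gradient magnitude and direction of the image filtered at scale t
    (Gaussian with sigma = sqrt(2 t), as in the Canny method used here)."""
    sigma = np.sqrt(2.0 * t)
    smoothed = ndimage.gaussian_filter(np.asarray(image, dtype=float), sigma)
    gy, gx = np.gradient(smoothed)   # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)
    direction = np.arctan2(gy, gx)   # gradient direction, in radians
    return magnitude, direction

# Lower scale used in the paper: t = 4
mag, ang = smoothed_gradient(np.random.rand(64, 64), t=4)
```

The non-maximum suppression and hysteresis stages of the full Canny method are omitted here; only the scale parameterization is illustrated.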

In the lower scale, no hysteresis thresholding is done (high threshold and low threshold are equal to zero) and all the edge points are used to define the descriptor. For this reason, the resulting edges represent the image textures.

In the higher scale, textures and noise tend to be removed [13]. Additionally, edges are selected by hysteresis thresholding, resulting in a selection of the edges of the main shapes of the image.

The image descriptor will be derived from the edge directions. The edge direction is perpendicular to the gradient direction, and it is straightforward to compute. Those edges will be described in four directions (figure 2(a)).
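The direction quantization can be sketched as follows. The exact mapping of angles to the four bins of figure 2(a) is an assumed convention, since the paper does not give bin boundaries.

```python
import numpy as np

def quantize_edge_direction(gradient_angle):
    """Map a gradient angle (radians) to one of 4 edge-direction bins.

    The edge direction is perpendicular to the gradient, and opposite
    directions are equivalent, so the angle is rotated by 90 degrees,
    folded into [0, 180), and quantized to bins centred on 0, 45, 90
    and 135 degrees (assumed convention).
    """
    edge_deg = (np.degrees(gradient_angle) + 90.0) % 180.0
    return int(np.round(edge_deg / 45.0)) % 4

# A horizontal gradient corresponds to a vertical (90 degree) edge: bin 2.
print(quantize_edge_direction(0.0))  # prints 2
```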

Figure 2. Edge pixel directions histogram computation: (a) the four edge directions; (b) the 4×4 division of the image into sub-images SubImg(0,0) to SubImg(3,3).

The histogram of the edge pixel directions counts the number of times the edge pixels have any of the four directions. In the lower scale the image is divided into 16 sub-images (4×4), and a descriptor with 64 bins (16×4) results. With this image division, the local influence of the edge pixel directions is measured (figure 2(b)). In the higher scale, no image division is done because only the most important image features will be present, and only 4 bins result. A total of 68 bins is used to describe the image based on the edge pixel directions of the two scales.
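The two-scale descriptor construction can be sketched as follows. This is an illustrative reconstruction; in particular, the normalization of the final 68-bin vector is not stated in the paper and is an assumption here.

```python
import numpy as np

def edge_direction_histogram(edge_mask, direction_bins, grid=4):
    """64 local bins: counts of the 4 quantized edge directions in each
    cell of a grid x grid (4x4) division of the image (figure 2(b))."""
    h, w = edge_mask.shape
    hist = np.zeros((grid, grid, 4))
    for i in range(grid):
        for j in range(grid):
            rows = slice(i * h // grid, (i + 1) * h // grid)
            cols = slice(j * w // grid, (j + 1) * w // grid)
            cell_edges = edge_mask[rows, cols]
            cell_dirs = direction_bins[rows, cols]
            for d in range(4):
                hist[i, j, d] = np.count_nonzero(cell_edges & (cell_dirs == d))
    return hist.ravel()

def scale_space_descriptor(low_mask, low_dirs, high_mask, high_dirs):
    """68-bin descriptor: 64 local lower-scale bins plus 4 global
    higher-scale bins. Normalization by the total count is an assumption."""
    low = edge_direction_histogram(low_mask, low_dirs)
    high = np.array([np.count_nonzero(high_mask & (high_dirs == d))
                     for d in range(4)], dtype=float)
    desc = np.concatenate([low, high])
    return desc / max(desc.sum(), 1.0)
```

The inputs are boolean edge maps and per-pixel direction-bin arrays, one pair per scale, as produced by the edge detection step described above.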

The scale t = 4 was used as the lower scale (σ = √(2t) is the Gaussian filter parameter) and the scale t = 8 (double scale) as the higher scale. In this paper, all image edges in the higher scale have been computed with the hysteresis high threshold parameter at 80% and the low threshold parameter at 20%. An example of the resulting edges can be seen in figure 3. The lower scale edge image 3(b) is rich in textures, although the main edges have a strong definition. The higher scale edge image 3(c) is composed of the edges of the main shapes; almost all the texture edges have been removed.

3 Image Classification using Scale-space Edge Pixel Directions Histogram

The reliability of a descriptor needs to be confirmed after its definition. In this work, a simple distance


Figure 3. Edges in the two scales used to compute the Scale-space Edge Pixel Directions Histogram: (a) original image; (b) low scale edges with no thresholding; (c) high scale edges after hysteresis thresholding.

computation is used to compare and compute a similarity measure between images. After this simple image comparison, the defined scale-space descriptor is also classified using the Nearest Class Mean method and Neural Networks. Those classification techniques result in a higher level semantic annotation.

All the experiments presented in this paper use the 242 high resolution images of the database of [1].

3.1 Image Comparison

A simple image comparison, based on the distance between the scale-space edge pixel directions histograms of two images, given by

d(Img1, Img2) = Σ_{i=0}^{67} |h1[i] − h2[i]|        (1)

results in a similarity measure between images, where h1 and h2 are the histograms representative of the two compared images. In general, the use of the scale-space description results in an improvement of the similarity measure when compared with the description with the low scale histogram only. As an example, figure 4 shows the results of image retrieval over the 242 database images when the image of figure 3 is the query, with the scale-space edge pixel directions histogram description. Figure 5 shows the same result considering the histogram in the low scale only. Several experiments using the described method have been made. In general, the robustness of the similarity measure is improved by using the scale-space description, and the number of negative matches is reduced. The distance between histograms also reflects, from the human perception perspective, a better measure of the difference between images.
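Equation (1) is a plain sum of absolute bin differences over the 68-bin descriptors; a minimal sketch:

```python
import numpy as np

def histogram_distance(h1, h2):
    """Equation (1): d(Img1, Img2) = sum over the 68 bins of |h1[i] - h2[i]|."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    return float(np.abs(h1 - h2).sum())

h = np.full(68, 1.0 / 68)        # a uniform 68-bin histogram
print(histogram_distance(h, h))  # identical histograms give 0.0
```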

3.2 Classification using the Nearest Class Mean

Improved classification can be achieved by using two or more training images to define a class. In the classification using the Nearest Class Mean method [12], the mean of the training images' histograms is computed. The proximity to that mean histogram establishes whether a given histogram belongs to an image class or not.
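The Nearest Class Mean step can be sketched as follows (the helper names are hypothetical; the distance is that of equation (1)):

```python
import numpy as np

def class_mean(training_histograms):
    """Mean of the training images' 68-bin histograms (one row each)."""
    return np.mean(np.asarray(training_histograms, dtype=float), axis=0)

def rank_by_class_mean(mean_hist, database_histograms):
    """Indices of the database histograms sorted by their distance
    (equation (1), sum of absolute bin differences) to the class mean."""
    dists = [np.abs(np.asarray(h, dtype=float) - mean_hist).sum()
             for h in database_histograms]
    return np.argsort(dists)
```

A retrieval then simply returns the database images in the order given by `rank_by_class_mean`.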

This classifying method was also used to test the reliability of the scale-space edge pixel directions histogram. In general this method gives better classification and retrieval results than using just one image histogram, as in the previous section. Instead of computing the similarity to a query image, the similarity to the mean histogram of the training images' histograms is computed. Using images 3(a) and 4(c), the false positive of figure 4(d) is suppressed. As an example, figure 6 shows an example of image retrieval using this classification method. Images 6(a), 6(g) and 6(i) were used as the training images. The results show eight positive matches out of twelve. This is a very good result considering that only the edge pixel directions histogram distance is used for the classification. The same set of experiments was carried out with histograms without the 4 bins extracted at the higher scale (histograms with the lower scale 64 bins only). In that case, the experiment of figure 6 results in five matches out of twelve (6(g), 6(h) and 6(k) are not retrieved in the first twelve). This is a typical situation across the different experiments made. The number of true positives that result from the retrieval is always larger when the scale-space descriptor is used. In general, using the scale-space 68-bin descriptor improves the classification and almost doubles the number of true positives, when compared with the use of the low scale 64-bin histogram.

In practical applications this classification method can be useful, because after a first query a user can select one or two of the retrieved images to improve a second query to the system.

3.3 Classification using a Neural Network

The previous method results in a higher semantic level of image classification, when compared with the simple comparison with an image. However, the method cannot be defined as a high level feature annotation method.

Figure 4. Example of Image Retrieval using the defined Scale-Space Edge Pixel Directions Histogram Descriptor (distances from the query to the six first retrieved images: 0.145, 0.539, 0.550, 0.555, 0.576, 0.597).

Figure 5. Example of Image Retrieval using the defined Edge Pixel Directions Histogram Descriptor without the higher scale information (distances from the query to the six first retrieved images: 0.156, 0.577, 0.598, 0.610, 0.611, 0.616).

In this section a feed-forward back-propagation neural network is used for high level feature annotation. The neural network is trained to detect semantic concepts and features of the images. The scale-space edge pixel directions histograms are used as the characteristic vectors representing the images. Those results are compared with the ones obtained using the histogram without the higher scale information. Previous work on neural network classification using the edge pixel directions histogram in one scale only can be found in [10].

Several experiments have been conducted to test the method. In general, a better classification results from using the scale-space description. In particular, the three images used for computing the mean histogram of figure 6 were used as the training set to detect urban scenes. Five extra images representing negative classification examples were also used for training. The testing database has 20 images that might be considered as having urban scenes. Besides the urban scenes present in the images of figure 6 (6(a), 6(b), 6(d), 6(e), 6(g), 6(h), 6(i) and 6(k)), twelve extra pictures representing the remaining urban scenes are shown in figure 7. Table 1 shows the best results using the scale-space descriptor and the descriptor with the low scale information only. For the neural network training that provides the best results, it shows the number of true positives and the respective false positives. For 19 and 18 true positive images, the figure numbers of the wrongly decided images are also shown. It was possible to classify 19 of the 20 images in the database as urban scenes in both cases. However, the number of false positives was too high in both cases. In the experiments where 18 images were classified as urban scenes, the number of false positives dropped to an acceptable value (37) when the scale-space description was used. However, if the low scale 64-bin descriptor (without the high scale information) was used, it was not possible to obtain acceptable values of false positives. This is representative of several experiments carried out during the testing of the described technique. The scale-space information is very valuable for the neural network decision process, reducing the number of false positives (see the Receiver Operating Curve (ROC) with the best results for the “urban” classification in figure 8).
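A minimal sketch of such a classifier is given below. The paper specifies neither the network architecture nor the training parameters, so everything about the network here (one hidden layer of 8 sigmoid units, mean-squared-error back-propagation, the learning rate) is an assumption, and the 68-bin descriptors are synthetic stand-ins for real image descriptors.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_descriptor(urban):
    """Synthetic 68-bin descriptor: 'urban' examples put extra mass in the
    first bins, non-urban in the last bins (illustration only)."""
    h = rng.random(68) * 0.1
    if urban:
        h[:10] += 1.0
    else:
        h[-10:] += 1.0
    return h / h.sum()

X = np.array([toy_descriptor(u) for u in [1] * 8 + [0] * 8])
y = np.array([1.0] * 8 + [0.0] * 8).reshape(-1, 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 8 units, trained by plain back-propagation of the
# mean squared error (architecture and learning rate are assumptions).
W1 = rng.normal(0.0, 0.5, (68, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 0.5, (8, 1));  b2 = np.zeros(1)
lr = 1.0
for _ in range(2000):
    a1 = sigmoid(X @ W1 + b1)
    out = sigmoid(a1 @ W2 + b2)
    d_out = (out - y) * out * (1.0 - out)      # output-layer delta
    d_hid = (d_out @ W2.T) * a1 * (1.0 - a1)   # hidden-layer delta
    W2 -= lr * (a1.T @ d_out); b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_hid);  b1 -= lr * d_hid.sum(axis=0)

out = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
predictions = (out > 0.5).astype(int).ravel()  # 1 = "urban", 0 = not
```

Thresholding the sigmoid output at 0.5 yields the binary "urban" decision; varying that threshold is what traces out a ROC curve such as the one in figure 8.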

4 Final Remarks and Future Work

The directions of the edges detected with the Canny algorithm are used for image description. Edges are


Figure 6. Example of Image Classification with the Nearest Class Mean using the defined Scale-space Edge Pixel Directions Histogram Descriptor (distances from the class mean to the twelve first retrieved images (a) to (l): 0.292, 0.311, 0.326, 0.354, 0.379, 0.381, 0.408, 0.415, 0.419, 0.423, 0.426, 0.433).

Figure 7. Remaining 12 images with urban scenes ((a) to (l)).


Table 1. Classification as “urban” images.

                  Scale-Space                    Low Scale Only
True Positives    False Pos.   Wrong Dec.       False Pos.   Wrong Dec.
19                70           6(b)             75           7(b)
18                37           7(d), 7(e)       75           7(a), 6(b)
17                35                            57

Figure 8. ROC of the urban scenes classification (Correct Detections, 15 to 20, versus False Positives, 40 to 80, for the Scale-Space and Low Scale only descriptors).

detected at two different scales, resulting in a scale-space descriptor where textures and main shapes are represented.

This work shows that describing edges at the two chosen scales improves the recognition provided by the descriptor, when compared with the description extracted at the low scale only. The use of the extra four bins of high scale information provides a general improvement of the classification with the tested methods.

The scale-space edge pixel directions histogram turns out to be a reliable descriptor that allows improved image retrieval based on similarity. A high level semantic annotation can be achieved using classification techniques. In this work, a neural network was used to achieve a high level semantic annotation. Several semantic concepts have been tested; it was decided to report only the results related with the “urban” semantic concept. Other semantic concepts, like flowers or plants, have also been tested, with reliable classification results.

As future work, new methods of classification will be used to improve the semantic annotation level. A colour descriptor will be added to the decision process and a multimodal classification will be researched. It is expected that a higher level of annotation reliability will be reached.

References

[1] MIT Media Lab, Vision and Modeling Group. http://vismod.media.mit.edu/pub/VisTex/.

[2] M. Bober. MPEG-7 visual shape descriptors. IEEE Transactions on Circuits and Systems for Video Technology, 11(6):716–719, June 2001.

[3] J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8(6):679–698, Nov. 1986.

[4] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley Interscience, 2nd edition, 2000.

[5] M. Ferreira, S. Kiranyaz, and M. Gabbouj. Multi-space edge detection and object extraction for image retrieval. In ICASSP 2006, Toulouse, France, May 2006.

[6] E. Hadjidemetriou, M. D. Grossberg, and S. K. Nayar. Multiresolution histograms and their use for recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(7):831–847, July 2004.

[7] Q. Iqbal and J. K. Aggarwal. Image retrieval via isotropic and anisotropic mappings. Pattern Recognition Journal, 35(12):2673–2686, December 2002.

[8] D. Lowe. Distinctive image features from scale-invariant keypoints. Intl. Journal of Computer Vision, 60(2):91–110, 2004.

[9] B. S. Manjunath, J.-R. Ohm, V. V. Vasudevan, and A. Yamada. Colour and texture descriptors. IEEE Transactions on Circuits and Systems for Video Technology, 11(6):703–715, June 2001.

[10] A. M. G. Pinheiro. Edge pixel histograms characterization with neural networks for an improved semantic description. In WIAMIS 2007, Santorini, Greece, June 2007.

[11] Y. Rubner, C. Tomasi, and L. J. Guibas. The earth mover's distance as a metric for image retrieval. Intl. Journal of Computer Vision, 40(2):99–121, 2000.

[12] L. G. Shapiro and G. C. Stockman. Computer Vision. Prentice Hall, Upper Saddle River, New Jersey, 2001.

[13] A. Witkin. Scale-space filtering. In Int. Joint Conf. Artificial Intelligence, pages 16–22, Karlsruhe, West Germany, 1983.
