

. RESEARCH PAPERS .

SCIENCE CHINA Information Sciences

June 2011 Vol. 54 No. 6: 1207–1217

doi: 10.1007/s11432-011-4263-2

© Science China Press and Springer-Verlag Berlin Heidelberg 2011 info.scichina.com www.springerlink.com

Efficient random saliency map detection

HUANG ZhiYong, HE FaZhi∗, CAI XianTao, ZOU ZhengQin, LIU Jing, LIANG MingMing & CHEN Xiao

Computer School, Wuhan University, Wuhan 430074, China

Received August 22, 2010; accepted June 6, 2011

Abstract  Most image retargeting algorithms rely heavily on valid saliency map detection to proceed. However, the inefficiency of high quality saliency map detection severely restricts the application of these image retargeting methods. In this paper, we propose a random algorithm for efficient context-aware saliency map detection. Our method is a multiple level saliency map detection algorithm that integrates multiple level coarse saliency maps into the resulting saliency map and selectively updates unreliable regions of the saliency map to refine detection results. Because of the randomized search, our method requires very little additional memory beyond that for the input image and result map, and does not need to build auxiliary data structures to accelerate the saliency map detection. We have implemented our algorithm on a GPU and demonstrated its performance on a variety of images and video sequences, in comparison with state-of-the-art image processing methods.

Keywords saliency map detection, image retargeting, random algorithm

Citation  Huang Z Y, He F Z, Cai X T, et al. Efficient random saliency map detection. Sci China Inf Sci, 2011, 54: 1207–1217, doi: 10.1007/s11432-011-4263-2

1 Introduction

In recent years, a large number of content-aware image and video retargeting methods [1–4] have been proposed. The goal of these methods is to change the aspect ratio and the resolution of the image or video to fit various target device displays, while preserving as much important content as possible and eliminating visible artifacts. One of the most challenging issues for these methods is how to detect high quality saliency maps quickly from input images or videos. Usually, pixel-wise saliency detection is computationally costly and requires a large amount of memory. Thus, certain saliency map detection methods [5, 6] only detect a coarse level saliency map. This causes a marked decrease in the quality of result images reproduced by image editing methods. Figure 1 confirms that retargeting methods produce better results from finer saliency maps.

Inspired by the works of Barnes [7] and Goferman [6], we propose a novel algorithm for rapidly detecting context-aware saliency maps of input images or videos. First, we use a random search algorithm to detect a coarse saliency map at each layer of the Gaussian image pyramid. Second, we refine each coarse saliency map to remove noise introduced by the random algorithm. Then, we merge the refined coarse saliency maps at different layers of the image pyramid. Finally, we adaptively update those pixels with unreliable

∗Corresponding author (email: [email protected])


Figure 1  Comparisons of saliency map results and retargeting results. (a) Input image with dimensions 476 × 704; (b) saliency map produced using the method of Itti et al.; (c) saliency map generated from the algorithm by Goferman et al., as produced by Stas Goferman; (d) saliency map detected using our random algorithm; (e) upper image is the result produced using the saliency map in (b), lower image is the result generated from the saliency map in (c); (f) upper image is the uniform scale result of the input image, and the lower image is the result produced using the saliency map in (d). The retargeting results of the input image were produced using the scale-and-stretch method except for the uniform scale result.

Figure 2  Comparison of saliency map results. (a) Input image with dimensions 742 × 495; (b) saliency map produced using the method of Itti et al.; (c) saliency map generated from the algorithm by Goferman et al., as produced by Goferman; (d) coarse saliency map detected with our random algorithm in a single layer; (e) refined saliency map obtained with our random algorithm using multiple layers of the Gaussian image pyramid. (b) and (c) were generated with a coarse scale and then uniformly scaled to size 742 × 495. (d) was generated with a single fine scale. (e) is the result of applying our random efficient saliency map detection method to the input image.

saliency. Our saliency map detection method is a random algorithm that does not need auxiliary data structures to accelerate the saliency detection and requires only very little additional memory beyond that for the input image and result map. Furthermore, it produces a fine level saliency map of input images or videos. This improves the results of image editing methods that rely heavily on the saliency map of the input image. Our method can easily be implemented and executed concurrently on a GPU, allowing it to produce saliency maps of image sequences practically in real time.

2 Related works

Recently, with the ever increasing need to alter the dimensions and aspect ratios of images and videos for different mobile devices, the subject of content-aware retargeting has regained academic attention, and a number of contributions have been published. However, a finer saliency map is a fundamental prerequisite of these image and video retargeting methods; Figure 1 shows that a finer saliency map can produce better results. High quality saliency map detection, however, is computationally costly and consumes a large amount of memory. The random saliency map detection algorithm presented in this paper can rapidly produce a high quality finer context-aware saliency map with lower memory requirements. Next, we review image retargeting and saliency map detection.


2.1 Image and video retargeting

Many content-aware image and video retargeting techniques [2–4, 7–12] have been proposed to adapt images and videos to target displays with different resolutions and aspect ratios. They often share a common structure: first, define a saliency map of the image, then apply content-aware operations that retain as much important visual information as possible and eliminate visible artifacts. These methods differ mainly with respect to the specific content-aware operations. For example, cropping methods [11, 13–15] search for a single window covering important content and retain it while completely discarding the rest.

Unlike homogeneous resizing, which is simply a linear mapping, recent works in non-photorealistic retargeting, seam carving, one-directional image warping, and omnidirectional image warping allow non-linear retargeting of images. These methods strive to redistribute the pixels of the entire image according to their importance values. The method in [11] allows local content scaling in both the horizontal and vertical dimensions even if the user changes only the width or height of the image.

Redistribution of pixels under patch-based coherence and completeness constraints was studied in [16]. These methods afford more flexibility for image editing operations, including image resizing, despite a much higher computational cost. The concepts from image retargeting have also been transferred to content-aware shape resizing and focus+context visualization of 3D models [17].

Almost all the image retargeting methods can be adapted to resize videos by addressing two problems: augmenting image importance models with motion information and resizing individual frames in a temporally coherent manner. The cropping-based retargeting methods achieve temporally coherent results by searching for a smooth cropping sequence. Retargeting methods involving local redistribution of pixels demand pixel-level temporal coherence, which is considerably more difficult.

2.2 Saliency map detection

A profound challenge in computer vision is the detection of the salient regions of an image. The numerous applications that make use of these regions have led to different definitions and interesting detection algorithms. Classic algorithms for saliency detection focus on identifying the fixation points that a human viewer would focus on at first glance. This type of saliency is important for understanding human attention as well as for specific applications such as automatic focusing. Other methods have concentrated on detecting a single dominant object in an image.

Itti et al. [5] presented a visual attention system that combines multiple scale image features into a single topographical saliency map, according to the behavior and neuronal architecture of the early primate visual system. These multiple scale image features comprise 6 intensity, 12 color and 24 orientation maps. To detect these multiple scale image features efficiently, Itti et al. compute only coarse maps at a coarse level. From these, a coarse saliency map is produced. Figure 2(b) depicts the saliency map generated using the algorithm proposed by Itti et al. This method differs from ours in that we compute a fine saliency map efficiently.

Goferman et al. [6] presented a context-aware saliency map that aims to detect the image regions representing the scene. They considered that a salient pixel in an image should have a unique appearance. They focused on the notion that one should not consider isolated pixels, but rather the surrounding patch, which gives an immediate context. Thus, a pixel $i$ is considered salient if the appearance of the patch $p_i$ centered at pixel $i$ is distinctive with respect to all other image patches.

Specifically, let $d_{color}(p_i, p_j)$ be the Euclidean distance between the vectorized patches $p_i$ and $p_j$ in CIE Lab color space, normalized to the range $[0, 1]$. Pixel $i$ is considered salient if $d_{color}(p_i, p_j)$ is high $\forall j$. Let $d_{position}(p_i, p_j)$ be the Euclidean distance between the positions of patches $p_i$ and $p_j$, normalized by the larger dimension of the image. Based on the observations above, Goferman et al. defined a dissimilarity measure between a pair of patches as

$$d(p_i, p_j) = \frac{d_{color}(p_i, p_j)}{1 + c \cdot d_{position}(p_i, p_j)}. \quad (1)$$
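To make eq. (1) concrete, here is a minimal Python sketch. This is an illustrative helper of our own, not code from the paper; it assumes the patches are already vectorized in CIE Lab space and pre-scaled so that the colour distance lies in $[0, 1]$, and the constant $c$ is exposed as a parameter (the default $c = 3$ is an assumed choice).

```python
import numpy as np

def patch_dissimilarity(patch_i, patch_j, pos_i, pos_j, image_dim, c=3.0):
    """Dissimilarity of eq. (1): colour distance damped by spatial distance.

    patch_i, patch_j : vectorized CIE Lab patches (1-D arrays), assumed
        pre-scaled so that the colour distance lies in [0, 1].
    pos_i, pos_j     : (x, y) centres of the two patches.
    image_dim        : larger image dimension, used to normalize d_position.
    c                : weighting constant (assumed default).
    """
    d_color = np.linalg.norm(np.asarray(patch_i, float) - np.asarray(patch_j, float))
    d_position = np.linalg.norm(np.asarray(pos_i, float) - np.asarray(pos_j, float)) / image_dim
    return d_color / (1.0 + c * d_position)
```

Note that either a small colour distance or a large spatial distance reduces $d$, so the pair counts as more similar under this measure.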


Thus, Goferman’s method considers the $K$ most similar patches; if the most similar patches are highly different from $p_i$, then clearly all image patches are highly different from $p_i$. Hence, for every patch $p_i$, they search for the $K$ most similar patches $\{q_k\}_{k=1}^{K}$ in the image, according to eq. (1), and compute the saliency of a pixel as

$$S = 1 - \exp\left\{-\frac{1}{K}\sum_{k=1}^{K} d(p_i, q_k)\right\}. \quad (2)$$

The method of Goferman et al. only detects a coarse saliency map with a width (or height) of 256 pixels. Moreover, it is computationally costly because it constructs a high dimensional vector tree to accelerate the nearest patch search. Brute-force search for the $K$ most similar patches, or accelerated search using a Kd-tree, is inefficient even on a coarse grid of about 256 pixels; searching for nearest patches at each pixel position at fine scale would be even more inefficient.

To compute the saliency of pixel $i$, we call the patch $p_i$, centered at position $i$, the source patch. We collect a set of sample patches different from the source patch to compute the saliency of pixel $i$. A sample patch has a higher chance of being similar to the source patch when it is located close to the source patch, and a lower chance when it is located far away. To compute the saliency of pixels, we do not search for all the nearest patches in the entire sample patch space of the input image. Instead, we randomly collect $2K$ patches from the input image, retain the closest $K$ patches to compute saliency, and discard the others. Our method needs only the $2K$ randomly collected patches and no other auxiliary data structures to accelerate the search for the $K$ nearest patches. Compared with the method of Goferman et al., our method is very fast and needs little additional memory.

Both Itti’s and Goferman’s methods can detect saliency maps of input images efficiently. However, they only detect a coarse saliency map, because fine saliency map detection would be computationally costly and consume a large amount of memory. Our method, on the other hand, can efficiently produce a fine scale saliency map, which is more suitable for image and video retargeting. Figure 1 illustrates the fine saliency map used with the scale-and-stretch image retargeting [11] algorithm, and the results show that a fine saliency map produces better retargeting results.
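The random scheme just described, which collects $2K$ candidate patches, keeps the $K$ most similar, and applies eq. (2), can be sketched as follows (an illustrative Python fragment under our own naming, operating on precomputed dissimilarities):

```python
import math

def random_saliency(candidate_dissims, K):
    """Saliency of one pixel from the dissimilarities of 2K randomly
    collected candidate patches (eq. (2)): keep the K most similar
    (smallest-d) candidates and discard the rest."""
    nearest = sorted(candidate_dissims)[:K]  # K most similar of the 2K samples
    return 1.0 - math.exp(-sum(nearest) / K)
```

A pixel whose nearest candidates are all very similar (small $d$) gets saliency near 0; a pixel dissimilar even to its closest candidates gets saliency near 1.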

3 Random saliency map detection

Our random context-aware saliency map detection method consists of four phases. The first phase computes multiple layer coarse saliency maps according to the Gaussian image pyramid layers of the input image. The second phase refines the coarse saliency maps to remove any noise introduced by the random algorithm. The third phase merges the multiple coarse saliency maps into a merged saliency map. The final phase adaptively updates those regions of the merged saliency map with unreliable saliency.

The first and second phases of our random saliency map detection algorithm can be used alone to produce coarse saliency maps with high efficiency. Alternatively, all four phases of our method can be used to generate a fine saliency map that considers multiple scale features.

We define a random saliency map (RSM) as a function $f : P \mapsto S$ of 2D coordinates, defined over all possible patch coordinates (locations of patch centers) in the input image, with $S$ the normalized saliency values. Given a patch coordinate $p$ in the input image $P$ and its corresponding saliency value $s$ in $[0, 1]$, $f(p) = s$. The values of $f$ are saliency values normalized to $[0, 1]$, stored in an array with dimensions equal to those of $P$.

3.1 Random saliency detection

Because the probability that a sample patch is similar to the source patch is larger when the sample patch is located close to the source patch, and smaller when it is far away, we randomly collect $2K$ sample patches, with more of them located close to the source patch and fewer located further away. We use the PatchMatch algorithm and the Monte Carlo method to draw the random sample patches.


Figure 3  Multiple layer saliency map results, corresponding to the layers in a Gaussian image pyramid of the input image. In this experiment, there are 5 layers in the Gaussian image pyramid. Images in column 1 (Layer 0) correspond to the coarsest layer in the image pyramid, except the color image, which is the input image. Images in column 5 (Layer 4) are the finest saliency maps, corresponding to the finest layer of the image pyramid. Columns 1 (Layer 0) to 5 (Layer 4) correspond to layers of the image pyramid from the coarsest to the finest, respectively. Images in the bottom row (row 4) are the direct random saliency maps for different layers of the image pyramid; they are coarse saliency maps. Images in the third row (row 3) are the refined coarse saliency maps produced from the coarse saliency maps in the bottom row. Images in the second row (row 2) are merged saliency maps of the current layer refined saliency map and the nearest coarse layer refined saliency map. Images in the upper row (row 1) are updated saliency maps, except for the color image. The updated saliency map is obtained from the merged saliency map after updating any unreliable saliency values. The upper rightmost saliency map is the output saliency map of our method.

3.1.1 PatchMatch sampling

Let $s = f(p)$. We compute $f(p)$ by testing a sequence of candidate offsets at an exponentially decreasing distance from $p$:

$$p_i = p + \omega \alpha^i R_i,$$

where $R_i$ is uniformly random in $[-1, 1] \times [-1, 1]$, $\omega$ is a large maximum search radius, and $\alpha$ is a fixed ratio between search window sizes. We examine patches for $i = 0, 1, 2, \ldots, 2K$ until the current search radius $\omega \alpha^i$ is less than one pixel. At that point, if $i$ is still less than $2K$, we reset $i = 0$ and continue until the number of tested candidate offsets reaches $2K$. In our implementation, $\omega$ is the maximum target image dimension and $\alpha = 0.75$.

For these $2K$ candidate patches, we compute the dissimilarity $d_i$ between each candidate patch and the patch at $p$ according to eq. (1). To compute the saliency at coordinate $p$, we retain the $K$ most similar patches among the $2K$ candidates and discard the rest.

Using eq. (2) with the $K$ most similar patches, we can compute the saliency at coordinate $p$. In our experiments, $K = 32$.
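The offset schedule above can be sketched as follows. This is an illustrative Python fragment, not the paper's code; the reset-and-continue behaviour once the radius drops below one pixel follows the description in the text.

```python
import random

def candidate_offsets(omega, alpha, num_candidates):
    """Generate candidate offsets p_i = p + omega * alpha**i * R_i, with
    R_i uniform in [-1, 1] x [-1, 1].  The exponent i resets to 0 once the
    search radius omega * alpha**i falls below one pixel, until
    num_candidates (= 2K) offsets have been produced."""
    offsets, i = [], 0
    while len(offsets) < num_candidates:
        radius = omega * alpha ** i
        if radius < 1.0:               # radius exhausted: restart the schedule
            i, radius = 0, omega
        offsets.append((radius * random.uniform(-1.0, 1.0),
                        radius * random.uniform(-1.0, 1.0)))
        i += 1
    return offsets
```

Because the radius shrinks geometrically, most of the $2K$ candidates fall close to the source patch, matching the sampling bias described above.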

The bottom row (row 4) in Figure 3 shows the saliency maps produced directly using our random algorithm for a single layer of the image pyramid.

3.1.2 Monte Carlo sampling

We also investigated a Monte Carlo sampling method to collect the $2K$ sample patches described in the previous section. Let $(r_0, \theta_0), (r_1, \theta_1), \ldots, (r_{2K}, \theta_{2K})$ be a set of polar coordinates, where radius $r_i$ is the distance from the sample patch at $(r_i, \theta_i)$ to the source patch, and polar angle $\theta_i$ is the angle from


Figure 4  Visualization of sampling 32 sample patches around point (0, 0). (a) PatchMatch sampling; (b) quadratic Monte Carlo sampling; (c) exponential Monte Carlo sampling.

the $x$-axis, uniformly distributed on $[0, 2\pi]$. We aim to generate a set of polar coordinate points whose distances to the source patch follow an exponential distribution. We use the exponential function $f(x) = \lambda e^{-\lambda x}$ to construct the probability density function (PDF):

$$p(x) = \frac{\lambda e^{-\lambda x}}{1 - e^{-\lambda}}.$$

With the probability density $p(x)$ defined over $[0, 1]$, we can generate random values for $r_i$ with density $p(x)$ from a set of uniform random numbers $\xi_i$, where $\xi_i \in [0, 1]$. To do this we need the cumulative distribution function $P(x)$,

$$P(x) = \int_0^x p(t)\,dt = \int_0^x \frac{\lambda e^{-\lambda t}}{1 - e^{-\lambda}}\,dt, \quad x \in [0, 1].$$

Then,

$$P^{-1}(x) = -\frac{1}{\lambda} \ln\left(1 - (1 - e^{-\lambda})x\right).$$

To obtain $r_i$, we simply transform $\xi_i$:

$$r_i = P^{-1}(\xi_i),$$

and then we can collect a set of sample patches with exponentially distributed distances to the source patch. Readers interested in a broader treatment of Monte Carlo techniques should consult one of the classic Monte Carlo texts [18–21].
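The inverse-transform step can be written directly from $P^{-1}$. In this small Python sketch the rate $\lambda$ is left as a parameter, since the text does not fix its value; $\lambda = 4$ below is purely an assumed example.

```python
import math
import random

def truncated_exp_radius(xi, lam):
    """r = P^{-1}(xi): inverse CDF of the exponential density truncated to
    [0, 1], so the sampled radii cluster near the source patch (xi in [0, 1])."""
    return -math.log(1.0 - (1.0 - math.exp(-lam)) * xi) / lam

def sample_radii(n, lam=4.0):
    """Draw n radii following the truncated-exponential distribution."""
    return [truncated_exp_radius(random.random(), lam) for _ in range(n)]
```

By construction $P^{-1}(0) = 0$ and $P^{-1}(1) = 1$, so every radius lies in $[0, 1]$, with the bulk of the samples near 0 (close to the source patch).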

In addition, we investigated the use of a quadratic function $f(x) = a(x-1)^2$ as the probability density function to collect sample patches. Figure 4 compares the different sampling methods. From this figure, it is clear that exponential Monte Carlo sampling is better than the other two methods.

3.2 Refining the coarse saliency maps

As shown in the images in the bottom row of Figure 3, there is a great deal of noise in the coarse saliency maps generated directly using the random single-layer detection method.

This noise is introduced by the $2K$ randomly selected patches used to compute saliency. We attempted to reduce the noise using both a bilateral filter and a mean value filter, but these filters are inefficient. Consequently, we refine the saliency selectively using the saliency of the 8-neighborhoods.


Because randomly selected candidate patches may be very dissimilar to the patch at position $p$, and moreover may be far away from the most similar patches in the image, such patches result in very small saliency values. As a result, the saliency in the coarse map will be less than the actual saliency at some pixel positions. Thus, we need to increase the saliency at any position where the saliency value is unreliable.

Intuitively, saliency smaller than the saliency of the surrounding 4-neighborhood is unreliable, so we refine it using the average saliency of the surrounding 4-neighborhood. The images in the third row (row 3) of Figure 3 are the refined saliency maps generated from the coarse saliency maps in the bottom row (row 4) of Figure 3. The figure shows that the refined saliency map is more accurate than the coarse saliency map.
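One way to realize this refinement step is the following vectorized sketch. It encodes our interpretation of the rule, namely that a pixel is unreliable when its saliency is below the average of its 4-neighbourhood; border pixels are left unchanged for brevity.

```python
import numpy as np

def refine_saliency(S):
    """Raise unreliable saliency values: wherever an interior pixel's
    saliency is below the average of its 4-neighbourhood, replace it
    with that average (interpretation of the rule in the text)."""
    R = S.copy()
    # Average of up, down, left and right neighbours for interior pixels.
    avg = (S[:-2, 1:-1] + S[2:, 1:-1] + S[1:-1, :-2] + S[1:-1, 2:]) / 4.0
    inner = S[1:-1, 1:-1]
    R[1:-1, 1:-1] = np.where(inner < avg, avg, inner)
    return R
```

Isolated low-saliency pixels (random-sampling noise) are pulled up toward their neighbours, while genuinely high saliency values are never reduced.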

3.3 Merging the refined saliency maps

The refined saliency map is still noisy. Moreover, to integrate multiple scale saliency maps into the result saliency map, we combine the refined saliency map of the nearest coarse layer with the refined saliency map of the current layer to produce a merged saliency map.

$$RSM^i_{merged} = \mathrm{merge}(RSM^i_{refined}, RSM^{i-1}_{refined}),$$

where $RSM^i_{merged}$ is the merged random saliency map in layer $i$, and $RSM^i_{refined}$ is the refined random saliency map in layer $i$. In Figure 3, images in the second row (row 2) represent the merged saliency maps in different layers, while images in the third row are the refined saliency maps in the different layers. $RSM^{i-1}_{refined}$ is the closest refined coarse saliency map to layer $i$. It has different dimensions from $RSM^i_{refined}$. Therefore, before merging these two refined saliency maps, we need to scale the dimensions of $RSM^{i-1}_{refined}$ to match those of $RSM^i_{refined}$.

Initially, we tried both weighted saliency and average saliency merging methods to merge the two refined saliency maps. Our results show that the weighted merging method achieves better results.

At coordinate $p$ in layer $i$, the refined saliency is $S^i_{r,p}$, normalized to $[0, 1]$. To compute the merged saliency at layer $i$, the refined saliency map at layer $i-1$ is uniformly scaled to have the same dimensions as layer $i$. We use $S^{i-1}_{r,p}$ to denote the saliency at position $p$ in the scaled refined saliency map $RSM^{i-1}_{refined}$. The merged saliency is then defined as

$$S^i_{m,p} = w^i_p S^i_{r,p} + w^{i-1}_p S^{i-1}_{r,p}, \quad (3)$$

$$w^i_p + w^{i-1}_p = 1.0, \quad (4)$$

$$\frac{w^i_p}{w^{i-1}_p} = \frac{S^i_{r,p}}{S^{i-1}_{r,p}}. \quad (5)$$

According to eqs. (4) and (5), using the known values $S^i_{r,p}$ and $S^{i-1}_{r,p}$, we can compute the weights $w^i_p$ and $w^{i-1}_p$. The merged saliency can then be derived from eq. (3). Images in the second row of Figure 3, representing merged saliency maps, clearly illustrate that a merged saliency map combines the features from two refined saliency maps in different layers. Thus, multiple scale features are considered in the merged saliency map.

3.4 Updating the merged saliency maps

Usually, the merged saliency maps are adequate for certain image editing methods. Occasionally, however, a finer saliency map is needed. In that case, we can enhance the merged saliency map using our updating method. As explained in subsection 3.2, we assume that the higher the saliency at position $p$, the greater its reliability.

So, we use the merged saliency map in the nearest coarse layer and the merged saliency map in the current layer of the image pyramid to update the current layer merged saliency map. In our experiment, we



Figure 5  Comparisons of processing time and memory requirements for processing an image with dimensions 800 × 600. RefineC depicts the processing time and memory requirements when executing only the first and second phases of our method on a CPU, while UpdateC depicts the processing time and memory requirements when executing all four phases of our method on a CPU. RefineG and UpdateG execute the phases defined for RefineC and UpdateC, respectively, on a GPU. (a) Processing time; (b) memory requirements.

Figure 6  Comparison of saliency map results. Images in the first column (a) are the input images, in the second column (b) the results generated using Itti's method, and in the third column (c) the saliency maps produced using our method.

select the larger saliency of the two merged saliency maps as the updated saliency value. Images in the upper row of Figure 3 represent updated saliency maps, except for the color one.
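The update itself is then a per-pixel maximum over the two merged maps (a one-line sketch; the coarse-layer map is assumed already rescaled to the current layer's dimensions):

```python
import numpy as np

def update_saliency(merged_cur, merged_coarse_scaled):
    """Final phase: per pixel, keep the larger of the current-layer merged
    saliency and the rescaled nearest-coarse-layer merged saliency, since
    higher saliency is treated as more reliable."""
    return np.maximum(merged_cur, merged_coarse_scaled)
```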


Figure 7  Additional examples. Images in the first and third columns ((a) and (c)) are input images. Images in the second and fourth columns ((b) and (d)) are the resulting saliency maps produced using our random saliency map detection method.

4 Results and discussion

We implemented our method in C++ under MS Windows and used it to generate a wide variety of saliency maps from input images. All results were produced on a PC with a Pentium(R) Dual-Core CPU E5300 at 2.60 GHz. Our parallel code was implemented in CUDA and executed on an NVIDIA GeForce 9800 GT graphics card.

As confirmed by the variety of images illustrating the saliency maps obtained using our method, our approach works well for various input images and can produce fine saliency maps. Figure 7 shows additional saliency map results generated from our implementation of the proposed algorithm.

Figure 1 shows that our saliency map results are superior to those of Itti and Goferman, and that better retargeting results can be obtained using our fine saliency map. It also shows that with a fine saliency map, the salient region of the input image is better preserved than with a coarse saliency map.

Figure 2 demonstrates that the saliency map produced by our algorithm is better than those generated by Goferman's context-aware saliency map detection method and Itti's approach.


Figure 3 shows the multiple layer saliency maps produced at different phases of our approach. The figure shows that our map is a multiple layer, multiple scale map. By considering multiple scale features, our method can produce better enhanced saliency maps. It is clear from Figure 3 that even in the finest layer our random saliency map detection method produces fine results that can be used directly in image editing approaches.

Figure 5 gives the processing time and memory requirements for detecting a saliency map from an image with dimensions 800 × 600. It shows that our method is more efficient than those of Itti and Goferman.

Figure 6 compares the saliency maps produced by Itti's method and our approach, while Figure 7 gives further examples of saliency maps generated using our method.

5 Conclusion and future work

In this paper, we presented a random context-aware saliency map detection method that can produce fine saliency maps efficiently. Our method differs from Itti's method and Goferman's context-aware method, which downsample to a coarse layer and compute only a coarse saliency map.

Our method can quickly generate coarse saliency maps for retargeting videos in real time. Besides, our method can also produce high quality fine scale saliency maps for image retargeting. The algorithm proposed in this paper consists of four phases. The first phase generates a coarse saliency map. The second phase refines the coarse saliency map to remove random noise. The third phase merges the coarse saliency maps to produce a multiple scale saliency map. The final phase enhances the merged saliency map to obtain a high quality fine saliency map.

There are a few disadvantages to our method. At a finer scale, the output of our method is very similar to results obtained from edge detection. Furthermore, it is difficult to capture a salient area of an image as a whole.

In future work, we aim to investigate how to transfer block saliency accurately from a coarse layer to a fine layer and to improve our random saliency map detection method.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 61070078), and the Fundamental Research Funds for the Central Universities.

References

1 Chen B, Sen P. Video carving. In: Short Papers Proceedings of Eurographics, Hersonissos, Greece, 2008
2 Wolf L, Guttmann M, Cohen-Or D. Non-homogeneous content-driven video-retargeting. In: Proceedings of the Eleventh IEEE International Conference on Computer Vision, Rio de Janeiro, 2007. 1–6
3 Liu L, Chen R, Wolf L, et al. Optimizing photo composition. Comput Graph Forum, 2010, 29: 469–478
4 Rubinstein M, Shamir A, Avidan S. Improved seam carving for video retargeting. ACM Trans Graph, 2008, 27: 1–9
5 Itti L, Koch C, Niebur E. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans Patt Anal Mach Intell, 1998, 20: 1254–1259
6 Goferman S, Zelnik-Manor L, Tal A. Context-aware saliency detection. In: IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, 2010. 2376–2383
7 Barnes C, Shechtman E, Finkelstein A, et al. PatchMatch: a randomized correspondence algorithm for structural image editing. ACM Trans Graph, 2009, 28: 1–11
8 Pritch Y, Kav-Venaki E, Peleg S. Shift-map image editing. In: Proceedings of the Twelfth IEEE International Conference on Computer Vision, Kyoto, 2009. 151–158
9 Wang Y, Fu H, Sorkine O, et al. Motion-aware temporal coherence for video resizing. In: ACM SIGGRAPH Asia 2009 Papers, ACM Press, 2009. 1–10
10 Avidan S, Shamir A. Seam carving for content-aware image resizing. ACM Trans Graph, 2007, 26: 1–8
11 Wang Y, Tai C, Sorkine O, et al. Optimized scale-and-stretch for image resizing. ACM Trans Graph, 2008, 27: 1–8
12 Rubinstein M, Shamir A, Avidan S. Multi-operator media retargeting. ACM Trans Graph, 2009, 28: 1–11
13 Chen B, Lee K, Huang W, et al. Capturing intention-based full-frame video stabilization. Comput Graph Forum, 2008, 27: 1805–1814
14 Liu H, Xie X, Ma W, et al. Automatic browsing of large pictures on mobile devices. In: Proceedings of the Eleventh ACM International Conference on Multimedia, Berkeley, CA, USA, 2003. 148–155
15 Santella A, Agrawala M, DeCarlo D, et al. Gaze-based interaction for semi-automatic photo cropping. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Montreal, Canada, 2006. 771–780
16 Cho T, Butman M, Avidan S, et al. The patch transform and its applications to image editing. In: IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, Alaska, 2008. 1–8
17 Wang Y, Lee T, Tai C. Focus+context visualization with distortion minimization. IEEE Trans Visual Comput Graph, 2008, 14: 1731–1738
18 Halton J. A retrospective and prospective survey of the Monte Carlo method. SIAM Review, 1970, 12: 1–63
19 Hammersley J, Handscomb D. Monte Carlo Methods. Abingdon, Oxfordshire, UK: Taylor & Francis, 1964
20 Metropolis N, Ulam S. The Monte Carlo method. J Am Stat Assoc, 1949, 44: 335–341
21 Yakowitz S. Computational Probability and Simulation. Reading, Massachusetts: Addison-Wesley, 1977