Research Article

Salient Region Detection via Feature Combination and Discriminative Classifier
Deming Kong,1 Liangliang Duan,2 Peiliang Wu,2 and Wenji Yang2
1School of Electrical Engineering, Yanshan University, Qinhuangdao 066004, China
2School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
Correspondence should be addressed to Liangliang Duan; [email protected]
Received 18 July 2015; Revised 12 November 2015; Accepted 24 November 2015
Academic Editor: Jian Guo Zhou
Copyright © 2015 Deming Kong et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
We introduce a novel approach to detect salient regions of an image via feature combination and a discriminative classifier. Our method, which is based on hierarchical image abstraction, uses logistic regression to map a regional feature vector to a saliency score. Four saliency cues are used in our approach: color contrast in a global context, center-boundary priors, spatially compact color distribution, and objectness, each serving as an atomic feature of a segmented region in the image. By mapping the four-dimensional regional feature to a fifteen-dimensional feature vector, we can linearly separate the salient regions from the cluttered background by finding an optimal linear combination of feature coefficients in the fifteen-dimensional feature space, and we finally fuse the saliency maps across multiple levels. Furthermore, we introduce the weighted salient image center into our saliency analysis task. Extensive experiments on two large benchmark datasets show that the proposed approach achieves the best performance over several state-of-the-art approaches.
1. Introduction
Humans have the ability to locate the most interesting region in a cluttered visual scene through selective visual attention. A task of computer vision is to simulate this human capability, and the related research has been carried out for many years. The study of human visual systems suggests that saliency is related to the rarity, uniqueness, and surprise of a scene. Saliency detection has recently gained much attention [1–23], as it has been brought into various applications, including image classification [24], object recognition [25], and content-aware image editing [26].
Existing salient region detection methods can be roughly classified into two categories: bottom-up, data-driven approaches and top-down, task-driven approaches. Bottom-up methods, which utilize low-level image features such as color, intensity, and texture, determine the contrast of image regions to their surroundings, while top-down methods make use of high-level knowledge about the "interesting" object. Most bottom-up models can be further divided into local and global schemes.
Inspired by the early work of Treisman and Gelade [27] and Koch and Ullman [28], Itti et al. [1] proposed a highly influential, biologically plausible saliency analysis method that defines image saliency using local center-surround operators across multiscale image features, including intensity, color, and orientation. Harel et al. [4] proposed a method that generates a saliency map by nonlinearly combining local uniqueness maps from different feature channels. Ma and Zhang [29] propose a novel approach that directly computes the center-surround color difference in a fixed neighborhood for each pixel and then uses a fuzzy growth model to extract the salient image region; they classify saliency into three levels: attended view, attended areas, and attended points. Liu et al. [19] propose a set of novel features, including center-surround histogram, multiscale contrast, and color spatial distribution, which are unified in a CRF learning framework to detect salient regions in images.
Later on, many saliency models were proposed which exploit various types of image features in a global scope for saliency detection. Hou and Zhang [3] propose a spectral residual method that relies on frequency-domain processing.
Hindawi Publishing Corporation
Mathematical Problems in Engineering
Volume 2015, Article ID 846895, 13 pages
http://dx.doi.org/10.1155/2015/846895
Zhai and Shah [2] define pixel-level saliency based on a pixel's contrast to all other pixels. To improve computational efficiency, they introduce the color histogram to analyze image saliency. Achanta et al. [5] propose a frequency-tuned method which achieves globally consistent results by defining saliency as the distance between a pixel and the overall mean image color. Cheng et al. [6] also utilize the color histogram and segmented regions to analyze image saliency, which enables the assignment of comparable saliency values across similar image regions.
High-level priors have been used to analyze image saliency in recent years. Judd et al. [30] train an SVM model using a combination of low-, middle-, and high-level image features, making their approach potentially suitable for specific high-level computer vision tasks; the concept of a center prior was considered in their approach. Shen and Wu [8] unify three higher-level priors, including a location prior, a semantic prior, and a color prior, in a low-rank matrix recovery framework. A shape prior is proposed by Jiang et al. [7]; concavity context is utilized by [31]. Wei et al. [32] turn to background priors to analyze image saliency, assuming that the image boundary is mostly background. Subsequently, many recent approaches use the boundary prior to guide saliency detection, such as GMR [11], SO [17], PDE [33], AMC [15], and DSR [34]; these methods obtain state-of-the-art performance on several publicly available datasets.
Recent studies indicate that a single saliency cue is far from comprehensive. Some methods, such as LC [2], FT [5], and HC [6], use only the contrast cue, and the generated saliency maps are disappointing; the contrast cue sometimes produces high saliency values for background regions, especially regions with complex structures. To alleviate these problems, some approaches, such as SF [18], PD [9], GC [12], PISA [21], PR [22], UFO [14], and HI [10], use multiple cues. Perazzi et al. [18] formulate saliency estimation using high-dimensional Gaussian filters by which region color and region position are, respectively, exploited to measure region uniqueness and distribution. Cheng et al. [12] and Tong et al. [22] also consider the color contrast cue and the color distribution cue when computing the saliency map. Margolin et al. [9] combine pattern distinctness, color uniqueness, and organization priors to generate the saliency result. Shi et al. [21] present a generic framework for saliency detection employing three terms: a color-based contrast term, a structure-based contrast term, and spatial priors. Jiang et al. [14] propose a novel algorithm integrating three saliency cues, namely, uniqueness, focusness, and objectness. Yan et al. [10] propose a multilayer approach to analyze image saliency; to determine the single-layer saliency cue, they exploit two useful cues, local contrast and a location heuristic, and then use a hierarchical inference framework to generate the final saliency map. The above-mentioned algorithms compute saliency maps from various cues and heuristically combine them to get the final results.
These methods can generate an ideal saliency map when dealing with simple images. When computing images with complex backgrounds, some methods, such as [9, 12, 18], can
only highlight part of the salient object. Though methods such as [10, 14, 22] can highlight the entire object uniformly, the background may be highlighted too. Thus, to differentiate real salient regions from high-contrast parts, more saliency cues, including low-level features and high-level priors, need to be integrated. To the best of our knowledge, there are few works that model the interaction between different saliency cues. Inspired by the work of [10, 23], we propose a feature combination strategy which can capture the interaction between different cues. Our main contributions lie in three aspects. Firstly, we introduce feature combination to model the interaction between different cues, which differs from most existing methods that generate saliency maps heuristically from various cues. Secondly, we formulate saliency estimation as a classification problem and learn a logistic classifier that directly maps a fifteen-dimensional feature vector to a saliency value. Thirdly, the use of smoothing and a weighted salient image center further improves the detection performance. The experimental results show that our method can generate a reasonable saliency map even when the image contains a complex background and the salient object has a color similar to the background.
The framework of the approach is presented in Figure 1. Our approach includes four main parts. The first is hierarchical image abstraction, which segments the image into homogeneous regions across several layers using efficient graph-based image segmentation [35]. Second, four saliency cues, including color contrast in a global context, center-boundary priors, spatially compact color distribution, and objectness, are used as atomic regional features; we then map the four-dimensional regional feature to a fifteen-dimensional feature vector which can capture the interaction between different features. Third, a logistic regression classifier is trained to map a fifteen-dimensional feature vector to a saliency value. Finally, we combine the saliency maps at different layers to obtain our saliency map. Figure 2 shows samples of saliency maps generated by state-of-the-art methods and by ours.
The remainder of this paper is organized as follows. The proposed model is introduced in Section 2. Section 3 presents experiments and results. The paper is summarized in Section 4.
2. The Proposed Approach
Our method can be divided into four main stages: hierarchical image abstraction, regional feature generation, training a logistic regression classifier, and multilayer saliency map integration and reinforcement. In the following, we describe the details of the proposed approach.
2.1. Hierarchical Image Abstraction. Given an image $I$, hierarchical image abstraction can be described as $H = \{H_1, H_2, \ldots, H_M\}$, where $M$ is the number of image pyramid layers and $H_m$ is the abstraction result of the $m$-th layer of the image pyramid, containing $n_m$ regions. We use image-level hierarchical image abstraction, which differs from region-level image abstraction [10, 23], as shown in Figure 1(b). The segmentation result of the first layer of the image is described
[Figure 1 layout: (a) input image; (b) image layers (first layer used for training); (c) learning: atomic features are mapped to a 15D feature, and a training set with ground truth is used to train the LR classifier $(\theta_1, \theta_2, \ldots, \theta_{15})$; (d) initial maps of different layers obtained by weighted feature combination; (e) final results.]
Figure 1: An overview of our weighted feature combination framework. We extract four image layers from the input and then train a logistic regression classifier using the four atomic features. Initial saliency maps of the four layers are obtained by weighted feature combination. Finally, we fuse the saliency maps of different layers to obtain the final saliency map.
Figure 2: Saliency maps (from left to right, top to bottom): input, G-Truth, IT [1], LC [2], SR [3], GB [4], FT [5], HC [6], RC [6], LR [8], PD [9], HI [10], GMR [11], GC [12], BMS [13], UFO [14], AMC [15], HDCT [16], SO [17], and ours.
as $H_1 = \{R_1^1, R_2^1, \ldots, R_{n_1}^1\}$, and the segmentation results of the other layers can be described in a similar way. Each superpixel $R_i^m$ is represented by a mean color $c_i^m$ (in CIELab) and a spatial position $p_i^m$ ($x$-coordinate and $y$-coordinate), which are defined as $c_i^m = \sum_{k \in R_i^m} c(k) / n_i^m$ and $p_i^m = \sum_{k \in R_i^m} p(k) / n_i^m$, where $k$ stands for a pixel in the region $R_i^m$, $c(k)$ represents the color vector of pixel $k$, $p(k)$ represents the coordinate vector of pixel $k$, and $n_i^m$ is the number of pixels in the segmented region $R_i^m$.
2.2. Regional Feature Generation
(1) Color Contrast Cue. Given the segmentation result of the $m$-th layer of the image pyramid, described as $H_m = \{R_1^m, R_2^m, \ldots, R_{n_m}^m\}$, the color contrast of a region $R_i^m$ can be formulated as follows:

$$S_i^{m(1)} = \sum_{j=1,\, j \neq i}^{n_m} w(R_i^m, R_j^m) \cdot d(R_i^m, R_j^m) \cdot n_j^m, \quad (1)$$

where $w(R_i^m, R_j^m)$ is the smooth term which considers the spatial distance between the two regions and $d(R_i^m, R_j^m)$ is the color distance between $R_i^m$ and $R_j^m$.
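As a concrete illustration, the per-region contrast sum in Eq. (1) can be sketched in a few lines. This is a minimal hypothetical version: the spatial smooth term $w(\cdot,\cdot)$ is set to 1 for brevity, and the function and argument names are ours, not from the paper.

```python
import math

def color_contrast(colors, sizes):
    """Global color contrast per region, a sketch of Eq. (1).

    colors: list of (L, a, b) mean region colors.
    sizes:  list of region pixel counts (the n_j weight in Eq. (1)).
    The spatial smooth term w(.,.) is dropped here (set to 1), so only
    color distance and region size contribute.
    """
    n = len(colors)
    scores = []
    for i in range(n):
        s = 0.0
        for j in range(n):
            if j == i:
                continue
            d = math.dist(colors[i], colors[j])  # CIELab color distance
            s += d * sizes[j]                    # weight by region size
        scores.append(s)
    return scores
```

With three regions where the middle one has a distinct color, the distinct region receives by far the highest contrast score, as intended by the cue.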
(2) Color Distribution Cue. Inspired by Liu et al. [19], we use the nonoverlapping region as the computing unit for region color distribution. First, all region colors are represented by Gaussian Mixture Models (GMMs) $\{w_q, \mu_q, \Sigma_q\}_{q=1}^{5}$, where $w_q$, $\mu_q$, and $\Sigma_q$ are the weight, the mean color, and the covariance matrix of the $q$-th Gaussian component. The probability of a region belonging to the $q$-th component is given by

$$p(q \mid R_i^m) = \frac{w_q\, N(R_i^m \mid \mu_q, \Sigma_q)}{\sum_{q} w_q\, N(R_i^m \mid \mu_q, \Sigma_q)}. \quad (2)$$
The number of Gaussian components is set to 5 in the subsequent experiments. We use the $k$-means algorithm to initialize the parameters of the GMMs and the EM algorithm to train
the GMMs. Referring to [19], the horizontal spatial variance of the $q$-th clustered component of the GMMs is defined as

$$D_x(q) = \frac{1}{\Lambda_q} \sum_{i=1}^{n_m} p(q \mid R_i^m) \cdot \left| p_{i(x)}^m - K_{(x)}(q) \right|^2,$$
$$K_{(x)}(q) = \frac{1}{\Lambda_q} \sum_{i=1}^{n_m} p(q \mid R_i^m) \cdot p_{i(x)}^m, \quad (3)$$

where $\Lambda_q = \sum_{i=1}^{n_m} p(q \mid R_i^m)$ and $p_{i(x)}^m$ is the $x$-coordinate of $R_i^m$. The vertical spatial variance $D_y(q)$ can be defined in the same way. Unlike Liu et al. [19], who use both variances to compute the saliency cue, we use only the horizontal spatial variance. The color distribution of region $R_i^m$ can be defined as

$$S_i^{m(2)} = \frac{1}{\exp\left(\sum_{q=1}^{5} p(q \mid R_i^m) \cdot D_x(q)\right)}. \quad (4)$$
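Given precomputed soft assignments $p(q \mid R_i^m)$ (e.g., from an EM-fitted GMM, which is omitted here), Eqs. (3)-(4) reduce to a short weighted-variance computation. The sketch below is a hypothetical minimal version assuming normalized region $x$-coordinates; the names are ours.

```python
import math

def color_distribution(probs, xs):
    """Spatial-variance color distribution cue, a sketch of Eqs. (3)-(4).

    probs: probs[i][q] = p(q | R_i), soft assignment of region i to GMM
           component q (assumed precomputed, e.g. by EM).
    xs:    x-coordinate of each region centre, normalised to [0, 1].
    """
    n = len(xs)
    Q = len(probs[0])
    # Horizontal spatial variance D_x(q) of every colour component, Eq. (3).
    Dx = []
    for q in range(Q):
        lam = sum(probs[i][q] for i in range(n))
        kx = sum(probs[i][q] * xs[i] for i in range(n)) / lam
        Dx.append(sum(probs[i][q] * (xs[i] - kx) ** 2 for i in range(n)) / lam)
    # Eq. (4): a spatially compact colour (small D_x) yields a high score.
    return [1.0 / math.exp(sum(probs[i][q] * Dx[q] for q in range(Q)))
            for i in range(n)]
```

Regions belonging to a spatially compact color component score close to 1, while regions of a widely spread component are suppressed.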
(3) Center-Boundary Prior Cue. Location is an important factor in saliency detection. Center and boundary are two priors widely used in previous saliency detection methods. Taking both priors into account, our center-boundary heuristic is defined as

$$S_i^{m(3)} = S_i^{m(\mathrm{cp})} \cdot S_i^{m(\mathrm{bp})}, \quad (5)$$

where $S_i^{m(\mathrm{cp})}$ is the center prior term, which measures the distance between the region $R_i^m$ and the image center. It is defined as $S_i^{m(\mathrm{cp})} = 1/(\lambda + \|p_i^m - c\|^2/2)$, where $c$ is the center of the image and is set to (0.5, 0.5); the parameter $\lambda$ controls the sensitivity of the center prior and is set to 1 in the experiments. $S_i^{m(\mathrm{bp})}$ is the boundary prior term, which measures the color distance between the region $R_i^m$ and the image boundary. Inspired by the approach proposed by Yang et al. [11], we define the background feature of region $R_i^m$ as

$$S_i^{m(\mathrm{bp})} = \log\left(\frac{B_i^{m(\mathrm{top})}}{n_t} \cdot \frac{B_i^{m(\mathrm{bottom})}}{n_b} \cdot \frac{B_i^{m(\mathrm{left})}}{n_l} \cdot \frac{B_i^{m(\mathrm{right})}}{n_r}\right), \quad (6)$$

where $B_i^{m(\mathrm{top})}$ is the sum of distances from region $R_i^m$ to the top boundary of the image, which differs from Yang et al. [11], and $n_t$ is the number of regions that intersect the top image boundary. We use a simple approach to compute $B_i^{m(\mathrm{top})}$, given by $B_i^{m(\mathrm{top})} = \sum_{j=1}^{n_t} \|c_i^m - c_j^m\|$. $B_i^{m(\mathrm{bottom})}$, $B_i^{m(\mathrm{left})}$, and $B_i^{m(\mathrm{right})}$ can be computed in a similar way.
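The combination in Eq. (5) can be sketched as follows. This is an illustrative simplification under stated assumptions: positions are normalized to [0, 1], each side's boundary term is the per-side mean color distance inside the log (matching the $B/n$ factors of Eq. (6)), and all names are hypothetical.

```python
import math

def center_boundary_prior(pos, colors, boundary_ids, lam=1.0):
    """Center-boundary prior, a sketch of Eqs. (5)-(6).

    pos:          region centres, coordinates normalised to [0, 1].
    colors:       mean CIELab colour per region.
    boundary_ids: four lists (top, bottom, left, right) of indices of
                  regions touching that image side.
    Combines a centre prior (closer to (0.5, 0.5) is more salient) with
    a boundary prior (far in colour from all four borders).
    """
    scores = []
    for i, (x, y) in enumerate(pos):
        # Centre prior term of Eq. (5).
        cp = 1.0 / (lam + ((x - 0.5) ** 2 + (y - 0.5) ** 2) / 2)
        # Boundary prior term: log of a product = sum of per-side logs.
        bp = 0.0
        for side in boundary_ids:
            b = sum(math.dist(colors[i], colors[j]) for j in side)
            bp += math.log(max(b / len(side), 1e-9))  # guard log(0)
        scores.append(cp * bp)
    return scores
```

A centrally located region whose color differs from all four borders gets a large positive score, while border-colored regions are pushed down.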
(4) Objectness Cue. Recently, a generic objectness measure was proposed to quantify how likely an image window is to contain an object of any class; the measure is based on low-level image cues. As our goal is a saliency map for the whole image, we first transfer the objectness values from the bounding boxes to the pixel level and then obtain a region-level objectness measure; for more details, please refer to UFO [14]. For each region, the region-level objectness is $S_i^{m(4)} = \sum_{k \in R_i^m} O(k) / n_i^m$, where $O(k)$ is the objectness value of pixel $k$.
(5) Cues Smoothing. We thus obtain four saliency cues, which are normalized to the range [0, 1] using minimum–maximum normalization. Although the four saliency cues can be computed efficiently, at least two problems remain: firstly, some regions with similar properties may receive very different saliency values, and, secondly, some adjacent regions may be assigned very different saliency values. To reduce the noisy saliency results caused by these issues, we use two smoothing procedures to refine the saliency value of each region.
$K$-Means Clustering Based Smoothing. Given the segmentation result of the $m$-th layer of the image pyramid, described as $H_m = \{R_1^m, R_2^m, \ldots, R_{n_m}^m\}$, we first apply the $k$-means clustering algorithm to divide the segmented regions of each layer into clusters. Referring to [36], we can then define an objective function, sometimes called a distortion measure, given by

$$J = \sum_{i=1}^{n_m} \sum_{j=1}^{K} r_{ij} \left\| c_i^m - \mu_j \right\|, \quad (7)$$

which we can easily solve for $\mu_j$ to give

$$\mu_j = \frac{\sum_{i=1}^{n_m} r_{ij} \cdot c_i^m}{\sum_{i=1}^{n_m} r_{ij}}. \quad (8)$$

Here $r_{ij}$ are binary indicator variables: if a region is assigned to cluster $j$, then $r_{ij} = 1$, and $r_{ik} = 0$ for $k \neq j$. This is known as the 1-of-$K$ coding scheme. The two phases of reassigning data points to clusters and recomputing the cluster means are repeated in turn until there is no further change in the assignments. We then obtain the number of regions in each cluster, $\mathrm{cl}(j) = \sum_{i=1}^{n_m} r_{ij}$. We replace the saliency value of each region with the weighted average of the saliency values of the regions in the same cluster (measured by $L^*a^*b^*$ distance). The saliency value of each region $R_i^m$ can be refined by

$$S_i^{m(\mathrm{tmp})} = \alpha \cdot S_i^{m(\mathrm{tmp})} + (1 - \alpha) \sum_{j=1,\, j \neq i}^{\mathrm{cl}(z)} \frac{d(R_i^m, R_j^m)}{T}\, S_j^{m(\mathrm{tmp})}, \quad (9)$$

where tmp can be replaced by 1, 2, 3, and 4. The parameter $\alpha$ controls the importance of the color space smoothing term; in our experiments we set $\alpha = 0.5$. $T = \sum_{j=1,\, j \neq i}^{\mathrm{cl}(z)} d(R_i^m, R_j^m)$ is the sum of distances between region $R_i^m$ and the other regions in cluster $z$, and $d(R_i^m, R_j^m) = \exp(-\|c_i^m - c_j^m\| / 2\sigma^2)$ is the color distance between regions $R_i^m$ and $R_j^m$; we set $\sigma^2 = 10$ in the experiments.
Spatial Based Smoothing. We assume that two adjacent regions are likely to have similar saliency values. Therefore, we
propose a spatial based approach to refine saliency between adjacent regions; the procedure is very similar to color space smoothing. We replace the saliency value of each region with the weighted average of the saliency values of its neighbors.
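The cluster-based refinement of Eq. (9) can be sketched as below; the same skeleton applies to the spatial variant by swapping the peer set from "same cluster" to "adjacent regions". The function name and list-based interface are ours, not the paper's.

```python
import math

def smooth_cue(values, colors, labels, alpha=0.5, sigma2=10.0):
    """Cluster-based cue smoothing, a sketch of Eq. (9).

    values: raw cue value per region (any of the four cues).
    colors: mean CIELab colour per region.
    labels: k-means cluster label per region.
    Each region's value is blended with a colour-weighted average of the
    values of the other regions in its cluster.
    """
    out = []
    for i, v in enumerate(values):
        peers = [j for j in range(len(values))
                 if j != i and labels[j] == labels[i]]
        if not peers:                 # singleton cluster: nothing to blend
            out.append(v)
            continue
        # d(R_i, R_j) = exp(-||c_i - c_j|| / 2*sigma^2), as in Eq. (9).
        w = [math.exp(-math.dist(colors[i], colors[j]) / (2 * sigma2))
             for j in peers]
        t = sum(w)                    # the normaliser T of Eq. (9)
        avg = sum(wj * values[j] for wj, j in zip(w, peers)) / t
        out.append(alpha * v + (1 - alpha) * avg)
    return out
```

Two same-cluster regions with identical colors but opposite raw values are pulled toward their common mean, which is exactly the noise-suppression behavior the smoothing step targets.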
(6) Regional Feature. After completing the above steps, we obtain four atomic features ($S_i^{m(1)}$, $S_i^{m(2)}$, $S_i^{m(3)}$, and $S_i^{m(4)}$) for each segmented region: color contrast, center-boundary prior, color distribution, and objectness. In order to capture the interaction between the four different features, a novel feature is generated by mapping the four-dimensional regional feature to a fifteen-dimensional feature vector. There are four kinds of combinations: single terms, double terms, triple terms, and a quadruple term. For the single terms, we use the four atomic features ($x_1$–$x_4$ in the vector). For the double terms, there are six elements, the products of any two atomic features ($x_5$–$x_{10}$): $S_i^{m(1)} \cdot S_i^{m(2)}$, $S_i^{m(1)} \cdot S_i^{m(3)}$, $S_i^{m(1)} \cdot S_i^{m(4)}$, $S_i^{m(2)} \cdot S_i^{m(3)}$, $S_i^{m(2)} \cdot S_i^{m(4)}$, and $S_i^{m(3)} \cdot S_i^{m(4)}$. For the triple terms, the new features are the products of three different atomic features, forming $x_{11}$–$x_{14}$ in the new vector: $S_i^{m(1)} \cdot S_i^{m(2)} \cdot S_i^{m(3)}$, $S_i^{m(1)} \cdot S_i^{m(2)} \cdot S_i^{m(4)}$, $S_i^{m(1)} \cdot S_i^{m(3)} \cdot S_i^{m(4)}$, and $S_i^{m(2)} \cdot S_i^{m(3)} \cdot S_i^{m(4)}$. For the quadruple term, there is only one element, the product of the four atomic features: the last feature $x_{15}$ is $S_i^{m(1)} \cdot S_i^{m(2)} \cdot S_i^{m(3)} \cdot S_i^{m(4)}$. Finally, we obtain a novel fifteen-dimensional feature vector.
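The 4D-to-15D mapping enumerated above is exactly the set of all nonempty products of the four cues, which can be generated compactly (the function name is ours):

```python
from itertools import combinations

def expand_features(cues):
    """Map the four atomic cues to the 15-D feature vector described
    above: the four singles (x1-x4), the six pairwise products (x5-x10),
    the four triple products (x11-x14), and the quadruple product (x15).
    """
    assert len(cues) == 4
    vec = []
    for k in (1, 2, 3, 4):
        for combo in combinations(cues, k):
            prod = 1.0
            for c in combo:
                prod *= c
            vec.append(prod)
    return vec  # 4 + 6 + 4 + 1 = 15 entries
```

Because each cue lies in [0, 1] after normalization, every product term also lies in [0, 1], and a product is large only when all of its factors agree, which is how the mapping encodes cue interaction.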
2.3. Learning Framework for Saliency Estimation. The logistic function is useful because it can take an input with any value from negative to positive infinity, whereas the output always takes values between zero and one [37]. We take full advantage of this property; thus our saliency estimation can be formulated as a probability framework. Let us assume that

$$P(y = 1 \mid x; \theta) = h_\theta(x),$$
$$P(y = 0 \mid x; \theta) = 1 - h_\theta(x), \quad (10)$$

where $h_\theta(x) = g(\theta^T x) = 1/(1 + e^{-\theta^T x})$ is our hypothesis. Here $g(z) = 1/(1 + e^{-z})$ is called the logistic function or the sigmoid function. Notice that $g(z)$ tends towards 1 as $z \to +\infty$ and towards 0 as $z \to -\infty$. Hence our hypothesis is always bounded between 0 and 1, and a higher value indicates that the region is more likely to belong to a salient object.

The parameter $\theta$ is what we want to learn from the data. We use the first layer of the image pyramid for training, given the segmentation result of the first layer described as $H_1 = \{R_1^1, R_2^1, \ldots, R_{n_1}^1\}$. A segmented region is considered positive if the number of pixels belonging to the salient object exceeds 90% of the number of pixels in the region, and its saliency value is set to 1. Conversely, a region is considered negative if the number of pixels belonging to the salient object is under 10% of the number of pixels in the region, and its saliency value is set to 0. As aforementioned, each segmented region is described by a fifteen-dimensional vector $\mathbf{x}$. We learn a logistic regression classifier $\theta$ from the training data $X = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N\}$ and the saliency values $Y = \{y_1, y_2, \ldots, y_N\}$. Once the parameter $\theta$
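The paper does not specify which solver fits $\theta$; any standard logistic regression fitter would do. A minimal sketch using plain stochastic gradient ascent on the log-likelihood (names and hyperparameters are our assumptions):

```python
import math

def sigmoid(z):
    # Clamp to avoid overflow in exp for extreme inputs.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by stochastic gradient ascent, a sketch of
    the learning step in Section 2.3.
    X: list of feature vectors (15-D in the paper); y: 0/1 labels.
    """
    theta = [0.0] * (len(X[0]) + 1)           # +1 for a bias term
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            xb = [1.0] + list(xi)
            err = yi - sigmoid(sum(t * v for t, v in zip(theta, xb)))
            theta = [t + lr * err * v for t, v in zip(theta, xb)]
    return theta

def saliency(theta, x):
    """Eq. (10): map a region's feature vector to a saliency score."""
    xb = [1.0] + list(x)
    return sigmoid(sum(t * v for t, v in zip(theta, xb)))
```

On a toy 1D separable set, the learned classifier assigns scores above 0.5 to positive-side features and below 0.5 to negative-side ones, which is the behavior the saliency estimator relies on.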
2.4. Multilayer Saliency Map Integration and Reinforcement
2.4.1. Multilayer Saliency Map Integration. We combine the image pyramid, a multiscale representation of the image, to suppress background regions. Similar to [1], the fused saliency map is obtained by adjusting the per-layer saliency maps to the same scale and adding them point by point. The fusion strategy is given by $S_{\mathrm{Fusion}}(I) = \bigoplus_{m=1}^{M} \mathrm{sal}(I^m)$, where $I$ is the input image, $I^m$ is the $m$-th layer of the image pyramid, and $\mathrm{sal}(I^m)$ is the saliency detection result of the $m$-th layer.

2.4.2. Reinforcement of Salient Region. A salient object is usually concentrated in a local region of the image, while the background has a high degree of dispersion. To use this property, we introduce the weighted salient image center into our saliency estimation, and the newly defined salient center is

$$(c_x, c_y) = \frac{\sum_{i=1}^{N} p(v_i) \cdot S_{\mathrm{Fusion}}(v_i)}{\sum_{i=1}^{N} S_{\mathrm{Fusion}}(v_i)}, \quad (11)$$

where $N$ is the number of pixels in the image and $v_i$ is the $i$-th pixel. Hence, the final pixel-level saliency can be defined as

$$S_{\mathrm{Final}}(v \in I) = S_{\mathrm{Fusion}}(v) \cdot \exp\left(-\frac{D_v^2}{\sigma^2}\right), \quad (12)$$

where $v$ is a pixel in the image, $D_v$ is the Euclidean distance between the pixel $v$ and the weighted salient image center, and the parameter $\sigma^2$ is the smooth term which controls the strength of the spatial weight; we set $\sigma^2$ to 0.4.
3. Experiments and Results
To validate the proposed approach, we performed experiments on two publicly available datasets. (1) The first is the MSRA dataset [19], which contains 5000 images with pixel-level ground truth. We used the same training, validation, and test sets as Jiang et al. [23]: the training set contains 2500 images, the validation set contains 500 images, and the test set contains 2000 images. (2) The second is the ECSSD dataset, which contains 1000 images with multiple objects, making the detection task much more challenging; the pixel-level ground truth is provided by Yan et al. [10]. On the two datasets, we compare our method with 18 state-of-the-art saliency detection methods: IT [1], LC [2], SR [3], GB [4], FT [5], HC [6], RC [6], CB [7], LR [8], PD [9], HI [10], GMR [11], GC [12], BMS [13], UFO [14], AMC [15], HDCT [16], and SO [17]. Three parameters of graph-based image segmentation [35] are used in the algorithm; we set the conservative parameters sigma = 0.5 and min = 10 for all images. The third parameter $K$ is set to different
Figure 3: Experimental results on the MSRA dataset. (a) and (b) are precision-recall curves of all approaches, obtained using a fixed threshold. The histogram (c) (precision, recall, and $F_\beta$) is obtained using adaptive thresholding.
values according to the image layer: we set $K = 100$ for the first layer, $K = 75$ for the second layer, and $K = 50$ for the other layers. To evaluate these methods, we either run our own implementations or use the results from the original authors.
3.1. Evaluation Methods. Following [5, 6, 8], we evaluate the performance of our method by measuring its precision and recall rate. Precision measures the percentage of salient pixels correctly assigned, while recall measures the percentage of the salient object detected. To study the performance of saliency detection approaches, we use two kinds of objective comparison measures from previous studies.

Firstly, the saliency map is segmented by a fixed threshold. Given a threshold $T \in [0, 255]$, pixels whose saliency values are lower than $T$ are marked as background; otherwise, the pixels are marked as foreground. Varying $T$ from 0 to 255 produces a sequence of
Figure 4: Saliency maps (from left to right, top to bottom): input, G-Truth, FT [5], HC [6], RC [6], LR [8], PD [9], GC [12], HI [10], GMR [11], BMS [13], UFO [14], AMC [15], HDCT [16], SO [17], and ours.
precision-recall pairs, and a precision-recall curve can be obtained.

Secondly, we follow [5, 6, 8] to segment a saliency map by adaptive thresholding. The image is first segmented by the mean-shift clustering algorithm. We then calculate the average saliency value of each nonoverlapping region, as well as an overall mean saliency value over the entire saliency map. The mean-shift segments whose saliency value is larger than twice the overall mean saliency value are marked as foreground, and the threshold is defined as
$$T_a = \frac{2}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} S_{\mathrm{Final}}(x, y), \quad (13)$$

where $W$ and $H$ are the width and height of the saliency map, respectively. In many applications, both high precision and a good recall rate are required. In addition to precision and recall, we thus estimate $F_\beta$, which is defined as

$$F_\beta = \frac{(1 + \beta^2) \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\beta^2 \cdot \mathrm{Precision} + \mathrm{Recall}}, \quad (14)$$

where we set $\beta^2 = 0.3$ as suggested in [5, 6, 18].
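Both evaluation quantities are one-liners; the sketch below (function names are ours) makes the precision-weighting of $\beta^2 = 0.3$ explicit.

```python
def adaptive_threshold(smap):
    """Eq. (13): twice the mean saliency of the whole map."""
    h, w = len(smap), len(smap[0])
    return 2.0 * sum(sum(row) for row in smap) / (w * h)

def f_beta(precision, recall, beta2=0.3):
    """Eq. (14) with beta^2 = 0.3, favouring precision as in [5, 6, 18]."""
    if precision == 0 and recall == 0:
        return 0.0
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)
```

Note that with $\beta^2 < 1$, swapping a high precision for a high recall lowers $F_\beta$: for example, a (precision, recall) pair of (0.8, 0.5) scores higher than (0.5, 0.8).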
3.2. Performance on MSRA Dataset. We report both quantitative and qualitative comparisons of our method with 18 state-of-the-art saliency detection approaches on the MSRA dataset.
Quantitative Comparison. Figures 3(a) and 3(b) show the precision-recall curves of all the algorithms on the MSRA-5000 dataset. As observed from Figure 3, the curve of our method is consistently higher than the others on this dataset. Besides, we compare the performance of the various methods using adaptive thresholding. Each value of our precision-recall-$F$ triple (0.8524, 0.7794, and 0.8343) ranks first among the 18 state-of-the-art methods.
Qualitative Comparison. The visual comparison is given in Figure 4. To save space, we only consider the most recent thirteen models: FT [5], HC [6], RC [6], LR [8], PD [9], GC [12], HI [10], GMR [11], BMS [13], UFO [14], AMC [15], HDCT [16], and SO [17]. Our method produces the best detection results on these images. It is also worth pointing out that our method can handle challenging cases where the background is extremely cluttered.
Figure 5: Experimental results on the ECSSD dataset. (a) and (b) are precision-recall curves of all approaches, obtained using a fixed threshold. The histogram (c) (precision, recall, and $F_\beta$) is obtained using adaptive thresholding.
3.3. Performance on ECSSD Dataset. The ECSSD dataset is a more challenging dataset provided by Yan et al. [10]. As shown in Figure 5, our approach achieves the best precision-recall curve. We also evaluate average precision, recall, and $F_\beta$ using adaptive thresholding; our recall and $F_\beta$ values rank first among all the methods.

We also provide the visual comparison of different approaches in Figure 6, from which we see that our approach produces the best detection results on these images and can highlight the entire salient object uniformly. We only consider the most recent thirteen models: FT [5], HC [6], RC [6], LR [8], PD [9], GC [12], HI [10], GMR [11], BMS [13], UFO [14], AMC [15], HDCT [16], and SO [17].
3.4. Evaluation on Different Feature Combinations. To verify the effectiveness of the proposed feature combination method, we plot the corresponding $P$-$R$ curves and the histogram of four combination schemes on the ASD dataset.
Figure 6: Saliency maps (from left to right, top to bottom): input, G-Truth, FT [5], HC [6], RC [6], LR [8], PD [9], GC [12], HI [10], GMR [11], BMS [13], UFO [14], AMC [15], HDCT [16], SO [17], and ours.
Figure 7: Experimental results on the ASD1000 dataset. (a) shows the PR curves of the four schemes. The histogram (b) (precision, recall, and $F_\beta$) is obtained using adaptive thresholding.
Figure 8: Visual comparison of SLIC and superpixel segmentation results; from left to right: input image, SLIC segmentation result, and superpixel segmentation result.
Four logistic classifiers are trained to obtain the parameters of the four combination schemes. We use the learned parameters to detect the images and provide the quantitative comparison of the different combination schemes in Figure 7. As can be seen (Figure 7(b)), our approach gets better results as more features are used. Similar conclusions can be drawn from the $P$-$R$ curves of the different schemes (Figure 7(a)): Scheme 1, the four atomic features ($x_1$–$x_4$); Scheme 2, any combination of two atomic features plus Scheme 1 ($x_1$–$x_{10}$); Scheme 3, any combination of three atomic features plus Scheme 2 ($x_1$–$x_{14}$); and Scheme 4, the combination of all four atomic features plus Scheme 3 ($x_1$–$x_{15}$). The weighted combination of atomic features without any interaction terms gets the lowest $P$-$R$ curve over most of the range. The curve of Scheme 2 is very close to those of Schemes 3 and 4 when recall is below 0.9, and Schemes 3 and 4 yield almost identical curves.
3.5. Analysis of the Influencing Factors of Segmentation. Recently, low-level image segmentation methods have been widely used for saliency analysis. The SLIC [38] and superpixel [35] approaches are two efficient algorithms whose source code is publicly available. Because they consider different segmentation criteria, their segmentation results are quite different from each other, as shown in Figure 8. From the figure, we can see that the result of the SLIC method has more local compactness than that of the superpixel method. We also provide a visual comparison of the four saliency cues and the final saliency map produced by the two segmentation algorithms. Figure 9 shows that different segmentation algorithms produce different salient cues and final saliency maps. The superpixel approach can generate a high-quality saliency map, while the SLIC segmentation algorithm may highlight some nonsalient regions. Finally, we provide a quantitative comparison of the SLIC and superpixel segmentation algorithms. To verify the effectiveness of the two segmentation algorithms, we plot the corresponding precision-recall curves on the ASD dataset. As observed from Figure 10, the superpixel algorithm obtains better precision-recall curves than the SLIC clustering algorithm.
4. Conclusion
In this paper, a novel salient region detection approach based on feature combination and a discriminative classifier is presented. We use four saliency cues as the atomic features of each segmented region in the image. To capture the interaction among different features, a novel feature vector is generated by mapping the four-dimensional regional feature to a fifteen-dimensional feature vector. A logistic regression classifier is trained to map a regional feature to a saliency value. We further introduce multilayer saliency map integration and the weighted salient center for improvement. We evaluate the proposed approach on two publicly available datasets, and the experimental results show that our model generates high-quality saliency maps that uniformly highlight the entire salient object.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Mathematical Problems in Engineering 11
Figure 9: Comparison of salient features with different segmentation methods. (a) Input image. (b) Ground truth. (c) Saliency map generated by using the SLIC method. (d) Saliency map generated by using the superpixel method. (e) Color contrast based salient feature by using the SLIC method. (f) Color contrast based salient feature by using the superpixel method. (g) Color distribution based salient feature by using the SLIC method. (h) Color distribution based salient feature by using the superpixel method. (i) Objectness based salient feature by using the SLIC method. (j) Objectness based salient feature by using the superpixel method. (k) High prior based salient feature by using the SLIC method. (l) High prior based salient feature by using the superpixel method.
[Figure: precision-recall curves (precision vs. recall) for the SLIC algorithm and the superpixel algorithm]
Figure 10: Comparison of different segmentation methods on the ASD dataset.
Acknowledgments
This research is partly supported by the National Natural Science Foundation of China (nos. 61305113, 61501394, and 61462038), the Hebei Province Science and Technology Support Program, China (no. 13211801D), and the Specialized Research Foundation for the Doctoral Program of Higher Education by the Ministry of Education of PR China (no. 20131333110015).
References
[1] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1254-1259, 1998.
[2] Y. Zhai and M. Shah, "Visual attention detection in video sequences using spatiotemporal cues," in Proceedings of the 14th Annual ACM International Conference on Multimedia (MM '06), pp. 815-824, ACM, October 2006.
[3] X. Hou and L. Zhang, "Saliency detection: a spectral residual approach," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), pp. 1-8, Minneapolis, Minn, USA, June 2007.
[4] J. Harel, C. Koch, and P. Perona, "Graph-based visual saliency," in Proceedings of the Advances in Neural Information Processing Systems (NIPS '06), pp. 545-552, Vancouver, Canada, December 2006.
[5] R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk, "Frequency-tuned salient region detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '09), pp. 1597-1604, Miami, Fla, USA, June 2009.
[6] M.-M. Cheng, G.-X. Zhang, N. J. Mitra, X. Huang, and S.-M. Hu, "Global contrast based salient region detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11), pp. 409-416, IEEE, Providence, RI, USA, June 2011.
[7] H. Jiang, J. Wang, Z. Yuan, T. Liu, N. Zheng, and S. Li, "Automatic salient object segmentation based on context and shape prior," in Proceedings of the British Machine Vision Conference (BMVC '11), p. 7, Dundee, UK, August-September 2011.
[8] X. Shen and Y. Wu, "A unified approach to salient object detection via low rank matrix recovery," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 853-860, IEEE, Providence, RI, USA, June 2012.
[9] R. Margolin, A. Tal, and L. Zelnik-Manor, "What makes a patch distinct?" in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '13), pp. 1139-1146, IEEE, Portland, Ore, USA, June 2013.
[10] Q. Yan, L. Xu, J. Shi, and J. Jia, "Hierarchical saliency detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '13), pp. 1155-1162, IEEE, Portland, Ore, USA, June 2013.
[11] C. Yang, L. Zhang, H. Lu, X. Ruan, and M.-H. Yang, "Saliency detection via graph-based manifold ranking," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '13), pp. 3166-3173, Portland, Ore, USA, June 2013.
[12] M.-M. Cheng, J. Warrell, W.-Y. Lin, S. Zheng, V. Vineet, and N. Crook, "Efficient salient region detection with soft image abstraction," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV '13), pp. 1529-1536, Sydney, Australia, December 2013.
[13] J. Zhang and S. Sclaroff, "Saliency detection: a boolean map approach," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV '13), pp. 153-160, Sydney, Australia, December 2013.
[14] P. Jiang, H. Ling, J. Yu, and J. Peng, "Salient region detection by UFO: uniqueness, focusness and objectness," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV '13), pp. 1976-1983, IEEE, Sydney, Australia, December 2013.
[15] B. Jiang, L. Zhang, H. Lu, C. Yang, and M.-H. Yang, "Saliency detection via absorbing Markov chain," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV '13), pp. 1665-1672, IEEE, Sydney, Australia, December 2013.
[16] J. Kim, D. Han, Y.-W. Tai, and J. Kim, "Salient region detection via high-dimensional color transform," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 883-890, IEEE, Columbus, Ohio, USA, June 2014.
[17] W. Zhu, S. Liang, Y. Wei, and J. Sun, "Saliency optimization from robust background detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 2814-2821, IEEE, Columbus, Ohio, USA, June 2014.
[18] F. Perazzi, P. Krahenbuhl, Y. Pritch, and A. Hornung, "Saliency filters: contrast based filtering for salient region detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 733-740, IEEE, Providence, RI, USA, June 2012.
[19] T. Liu, Z. Yuan, J. Sun et al., "Learning to detect a salient object," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 2, pp. 353-367, 2011.
[20] S. Goferman, L. Zelnik-Manor, and A. Tal, "Context-aware saliency detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 10, pp. 1915-1926, 2012.
[21] K. Shi, K. Wang, J. Lu, and L. Lin, "PISA: pixelwise image saliency by aggregating complementary appearance contrast measures with spatial priors," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '13), pp. 2115-2122, Portland, Ore, USA, June 2013.
[22] N. Tong, H. Lu, L. Zhang, and X. Ruan, "Saliency detection with multi-scale superpixels," IEEE Signal Processing Letters, vol. 21, no. 9, pp. 1035-1039, 2014.
[23] H. Jiang, J. Wang, Z. Yuan, Y. Wu, N. Zheng, and S. Li, "Salient object detection: a discriminative regional feature integration approach," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '13), pp. 2083-2090, Portland, Ore, USA, June 2013.
[24] G. Sharma, F. Jurie, and C. Schmid, "Discriminative spatial saliency for image classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 3506-3513, IEEE, Providence, RI, USA, June 2012.
[25] U. Rutishauser, D. Walther, C. Koch, and P. Perona, "Is bottom-up attention useful for object recognition?" in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), vol. 2, pp. II-37-II-44, Washington, DC, USA, June 2004.
[26] H. Wu, Y.-S. Wang, K.-C. Feng, T.-T. Wong, T.-Y. Lee, and P.-A. Heng, "Resizing by symmetry-summarization," ACM Transactions on Graphics, vol. 29, no. 6, article 159, 2010.
[27] A. M. Treisman and G. Gelade, "A feature-integration theory of attention," Cognitive Psychology, vol. 12, no. 1, pp. 97-136, 1980.
[28] C. Koch and S. Ullman, "Shifts in selective visual attention: towards the underlying neural circuitry," Human Neurobiology, vol. 4, no. 4, pp. 219-227, 1985.
[29] Y.-F. Ma and H.-J. Zhang, "Contrast-based image attention analysis by using fuzzy growing," in Proceedings of the 11th Annual ACM International Conference on Multimedia (MM '03), pp. 374-381, Berkeley, Calif, USA, November 2003.
[30] T. Judd, K. Ehinger, F. Durand, and A. Torralba, "Learning to predict where humans look," in Proceedings of the 12th International Conference on Computer Vision (ICCV '09), pp. 2106-2113, 2009.
[31] Y. Lu, W. Zhang, H. Lu, and X. Xue, "Salient object detection using concavity context," in Proceedings of the IEEE International Conference on Computer Vision (ICCV '11), pp. 233-240, Barcelona, Spain, November 2011.
[32] Y. Wei, F. Wen, W. Zhu, and J. Sun, "Geodesic saliency using background priors," in Proceedings of the 12th European Conference on Computer Vision (ECCV '12), pp. 29-42, Florence, Italy, October 2012.
[33] R. Liu, J. Cao, Z. Lin, and S. Shan, "Adaptive partial differential equation learning for visual saliency detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 3866-3873, Columbus, Ohio, USA, June 2014.
[34] X. Li, H. Lu, L. Zhang, X. Ruan, and M.-H. Yang, "Saliency detection via dense and sparse reconstruction," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV '13), pp. 2976-2983, Sydney, Australia, December 2013.
[35] P. F. Felzenszwalb and D. P. Huttenlocher, "Efficient graph-based image segmentation," International Journal of Computer Vision, vol. 59, no. 2, pp. 167-181, 2004.
[36] C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
[37] D. W. Hosmer and S. Lemeshow, Applied Logistic Regression, John Wiley & Sons, 2004.
[38] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, "SLIC superpixels compared to state-of-the-art superpixel methods," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2274-2281, 2012.