
[IEEE 2014 IEEE International Symposium on Circuits and Systems (ISCAS) - Melbourne VIC, Australia (2014.6.1-2014.6.5)] 2014 IEEE International Symposium on Circuits and Systems (ISCAS)



Cosegmentation from Similar Backgrounds

Fanman Meng, Hongliang Li
School of Electronic Engineering
University of Electronic Science and Technology of China, Chengdu, China
Email: [email protected], [email protected]

King Ngi Ngan
Department of Electronic Engineering, The Chinese University of Hong Kong
Sha Tin, Hong Kong
Email: [email protected]

Bing Zeng, Nini Rao
School of Electronic Engineering
University of Electronic Science and Technology of China, Chengdu, China
Email: [email protected], [email protected]

Abstract—In many applications, such as video coding and model training, common objects must be extracted from a group of images. Co-segmentation is a new and efficient method for this task. In realistic applications, we observe that the images usually contain similar backgrounds (we call this similar scene co-segmentation), such as city landmark images collected from the web or key frames sampled from a video. However, existing co-segmentation methods have paid little attention to similar scene co-segmentation and may produce insufficiently accurate segments. In this paper, we propose an active contours based co-segmentation model to extract foregrounds from similar backgrounds. We combine the background consistency constraint with the foreground consistency constraint to form the energy function, and minimize the model using the level-set method and the calculus of variations. We also speed up the model with a hierarchical structure and the superpixel technique. We test the method on both image and video datasets. The results show that the proposed model obtains larger IOU values than state-of-the-art co-segmentation methods.

I. INTRODUCTION

Many image and video applications, such as video coding and image classification, need to extract the common objects from a group of images. In these applications, a group of images containing an object of interest is first collected from the web or a video. Then, image segmentation is performed on the image group to locate the object regions. Co-segmentation is a new and efficient method to solve this problem: it extracts the regions contained in all images as the common objects.

In the past few years, many co-segmentation models [1] have been proposed, such as MRF co-segmentation [2], active contours co-segmentation [3], clustering based co-segmentation [4], [5], heat diffusion co-segmentation [6], [7] and random walker co-segmentation [8]. The classical co-segmentation model usually introduces a foreground consistency constraint into the segmentation model, and forms the following energy function:

    E_i = E_i^s + \sum_{(i,j)} d(F_i, F_j)    (1)

where E_i is the energy of the current i-th image; E_i^s is the single image segmentation term (Single Term), which describes the segment smoothness and the distinction between the foreground and the background in each single image; and \sum_{(i,j)} d(F_i, F_j) is the multiple foreground consistency term (Multiple Term), which captures the foreground similarities among the multiple images. Assuming the images come from different backgrounds, only the common objects contain similar regions. The segmentation that labels the common objects as the foreground thus yields the minimum value of (1), so co-segmentation can be achieved by minimizing the energy in (1). However, in many applications, realistic image groups usually come from the same scene, such as images taken in a meeting room, at a scenic spot, or from a video clip. These image groups contain similar backgrounds, which invalidates the assumption behind (1) and leads to unsuccessful co-segmentation results.

Introducing a foreground prior into the co-segmentation model [9]–[14] is an efficient way to handle similar scene co-segmentation. In this approach, a prior generated by either user scribbles or a saliency map is used to distinguish the foreground from the similar background. Because of the interference of the similar backgrounds, designing such a foreground prior without manual input is still challenging. Moreover, this approach ignores the specific characteristic of similar scene co-segmentation that the backgrounds are similar across the images. Note that, assuming most regions near the image border are background, there is little interference from foreground regions when modeling the background information, which means that deriving a background model from the image group is relatively easier than deriving a foreground model. Meanwhile, the foregrounds can also be extracted using the background information instead of the foreground information.

In this paper, we propose a new similar scene co-segmentation method that simultaneously considers the foreground consistency and the background consistency. We also use a weaker and simpler object prior, i.e., a fixed window, to provide the initial object information for each image. Our model consists of three aspects: the multiple foreground consistency, the multiple background consistency and the single image segmentation. We combine the three constraints in the active contours framework, and propose a new model to cosegment common objects from the same background scene. We also use the superpixel technique and hierarchical segmentation to accelerate the proposed method. The experimental results demonstrate that the proposed method improves the cosegmentation accuracy compared with several state-of-the-art co-segmentation methods.

978-1-4799-3432-4/14/$31.00 ©2014 IEEE


II. THE PROPOSED METHOD

A. The Basic Model

We intend to segment a set of common objects ℱ = {F_1, F_2, …, F_n} from a group of images ℐ = {I_1, I_2, …, I_n}. The images contain similar backgrounds, denoted ℬ = {B_1, B_2, …, B_n}. We start the co-segmentation from a set of fixed initial curves 𝒞 = {C_1, C_2, …, C_n}, where the regions inside and outside each curve serve as the initial foreground and background. In our model, the initial curve C_i of a given image I_i is a rectangle at distance δ from the image border.
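The fixed-window prior above amounts to a binary rectangle mask. The sketch below is a hypothetical illustration (the function name is ours; the paper only specifies a rectangle at distance δ from the border, with δ = 0.2 · W given later in Sec. III-A):

```python
import numpy as np

def initial_curve_mask(height, width, ratio=0.2):
    """Binary mask of the region inside the initial rectangular curve C_i.

    The rectangle lies at distance delta = ratio * W from the image
    border, where W = min(width, height); pixels inside the rectangle
    form the initial foreground omega_1, pixels outside the initial
    background omega_0.
    """
    delta = int(ratio * min(height, width))
    mask = np.zeros((height, width), dtype=bool)
    mask[delta:height - delta, delta:width - delta] = True
    return mask
```

The same mask doubles as the sign pattern of the initial level-set function used later in the optimization.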

Based on the initial curves 𝒞, our model extracts the common object by searching for the curve C_i^* in I_i that lies exactly on the object boundary. Given any curve C_i in I_i, we evaluate it by the foreground and background consistencies, which are scored by a fitness energy E_{C_i}; a good curve yields a small value of E_{C_i}. We then achieve the co-segmentation by searching for the C_i^* that minimizes E_{C_i}, i.e.,

    C_i^* = \arg\min_{C_i} E_{C_i},  i = 1, …, n    (2)

The key to the model is an efficient evaluation of the consistencies. Here, we evaluate E_{C_i} by two basic terms: the multiple foreground consistency term E_f^i and the multiple background consistency term E_b^i. The first evaluates the consistency between the current interior region and the other foreground regions; it ensures that a good curve C_i exactly covers the common object of interest. The second measures the consistency between the exterior regions across the image group; a good curve encloses an exterior region that is consistent with the other backgrounds. Apart from these multiple image terms, we also consider a single image term E_s^i to describe the distinction between the foreground and the background within each single image; a good curve ought to divide the image into two distinctly different regions. Hence, the basic components of the energy E_{C_i} can be represented by:

    E_{C_i} = \lambda_f E_f^i + \lambda_b E_b^i + \lambda_s E_s^i    (3)

where λ_f, λ_b and λ_s are scale parameters that balance the terms. We next describe the three terms in (3) in detail.

1) The Multiple Foreground Consistency Term: Given any curve C_i, the image I_i is divided into two regions: the regions inside and outside the curve (denoted ω_1^i and ω_0^i), which describe the foreground and the background. Each region ω is represented by a feature descriptor vector g(ω), such as a color histogram.

The multiple foreground consistency term E_f^i measures the similarities between the interior region ω_1^i and the other foreground regions ω_1^j, j ≠ i. It is based on a measurement S(ω, ω′) of the consistency between any pair of regions (ω, ω′) from an image pair (I, I′) (ω ∈ I and ω′ ∈ I′). In our model, we define S(ω, ω′) as

    S(\omega, \omega') = \int_{I(x,y) \in \omega} f(I(x,y), g(\omega'))\, dx\, dy    (4)

where I(x, y) is the pixel at position (x, y) in image I, I(x, y) ∈ ω indicates a pixel in ω, g(ω′) is the feature vector of region ω′, and f(I(x, y), g(ω′)) denotes the consistency between the pixel I(x, y) and the region ω′ represented by the region feature g(ω′) (Pixel to Region Similarity). The final consistency between ω and ω′ is the integral of the consistencies between all pixels inside ω and the region ω′ (Region to Region Similarity).
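As a concrete instance of the structure of (4), the sketch below back-projects each pixel onto the other region's normalized histogram and sums the responses. This is only an assumed stand-in for f: the paper actually uses the pixel-to-region measure of [3], and the grayscale histogram here replaces the color descriptor; both function names are ours.

```python
import numpy as np

def region_histogram(gray, mask, bins=16):
    # g(omega): normalized intensity histogram of the region (a stand-in
    # for the color histogram descriptor used in the paper).
    h, _ = np.histogram(gray[mask], bins=bins, range=(0, 256))
    return h / max(h.sum(), 1)

def region_similarity(gray, mask, hist_other, bins=16):
    # S(omega, omega'): sum over pixels in omega of the pixel-to-region
    # similarity f(I(x, y), g(omega')), realized here as back-projection
    # of each pixel's intensity onto omega''s normalized histogram.
    idx = np.clip((gray[mask].astype(int) * bins) // 256, 0, bins - 1)
    return hist_other[idx].sum()
```

Two regions with similar intensity distributions score high; disjoint distributions score near zero, which is exactly the behavior the consistency terms rely on.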

Based on the pairwise region consistency in (4), we define E_f^i for a curve C_i in I_i by summing the similarities between the interior region ω_1^i and the other foreground regions ω_1^j, j = 1, …, n, j ≠ i, which can be represented as

    E_f^i = \sum_{j=1, j \neq i}^{n} S(\omega_1^i, \omega_1^j) = \sum_{j=1, j \neq i}^{n} \int_{I_i(x,y) \in \omega_1^i} f(I_i(x,y), g(\omega_1^j))\, dx\, dy    (5)

E_f^i takes a large value when ω_1^i is consistent with most of the other foregrounds; otherwise, a small value of E_f^i is obtained for a curve whose interior region is inconsistent.

2) The Multiple Background Consistency Term: The multiple background consistency term introduces background priors into the segmentation of the current image. Our idea is that regions that are similar across the exterior regions should be rewarded as background. In other words, an exterior region ω_0^i that is consistent with the other exterior regions is rewarded with a large value; otherwise, a small value is assigned to an inconsistent exterior region as a penalty. We define the multiple background consistency term by the consistency measurement (4) between ω_0^i and ω_0^j, j ≠ i, which can be represented as

    E_b^i = \sum_{j=1, j \neq i}^{n} S(\omega_0^i, \omega_0^j) = \sum_{j=1, j \neq i}^{n} \int_{I_i(x,y) \in \omega_0^i} f(I_i(x,y), g(\omega_0^j))\, dx\, dy    (6)

where ω_0^j denotes the region outside C_j in image I_j. An exterior region consistent with the other backgrounds is thus rewarded as background, while labelling an inconsistent exterior region as background is penalized.

3) The Single Image Segmentation Term: The above two terms are based on multiple images. The single image segmentation term only concerns the current image I_i, and intends to make the region inside the curve significantly different from the initial background prior, while the background stays consistent with that prior. We formulate the term from the consistency between the pixels I_i(x, y) in ω_0^i and the exterior region ω_0^i itself:

    E_s^i = \int_{I_i(x,y) \in \omega_0^i} f(I_i(x,y), g(\omega_0^i))\, dx\, dy    (7)

Formula (7) yields a large value of E_s^i only if the pixels consistent with the initial exterior region are labelled as background, while the inconsistent pixels are classified as foreground.

4) The Final Energy: Based on (5), (6) and (7), the final energy function in (3) is defined as

    E_{C_i} = \mu \cdot Area(\omega_1^i) + \nu \cdot Length(C_i)
            - \lambda_f \sum_{j=1}^{n} \int_{I_i(x,y) \in \omega_1^i} f(I_i(x,y), g(\omega_1^j))\, dx\, dy
            - \lambda_b \sum_{j=1, j \neq i}^{n} \int_{I_i(x,y) \in \omega_0^i} f(I_i(x,y), g(\omega_0^j))\, dx\, dy
            - \lambda_s \int_{I_i(x,y) \in \omega_0^i} f(I_i(x,y), g(\omega_0^i))\, dx\, dy    (8)

Here, we add the two terms Area(ω_1^i) and Length(C_i), the area of the region inside C_i and the length of the curve C_i. They are the intrinsic regularization terms used in [3], [15].
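A discrete evaluation of (8) for one candidate mask can be sketched as follows. This is an assumed illustration, not the paper's implementation: f is again the histogram back-projection stand-in, the curve length is approximated by counting label changes, the default λ_f = 5 is an assumed mid-range value from the [4, 7] interval given in Sec. III-A, and the function name is ours.

```python
import numpy as np

def curve_energy(gray_i, mask_i, fg_hists, bg_hists, hist_bg_i,
                 mu=0.01, nu=0.001, lam_f=5.0, lam_b=3.0, lam_s=1.0,
                 bins=16):
    """Discrete evaluation of the energy in Eq. (8) for one image.

    mask_i     : boolean foreground mask (region inside C_i).
    fg_hists   : descriptors g(omega_1^j) of the other foregrounds.
    bg_hists   : descriptors g(omega_0^j) of the other backgrounds.
    hist_bg_i  : the initial background descriptor g(omega_0^i).
    """
    def backproject(pix, hist):
        # stand-in for the pixel-to-region similarity f of [3]
        idx = np.clip((pix.astype(int) * bins) // 256, 0, bins - 1)
        return hist[idx].sum()

    fg_pix = gray_i[mask_i]
    bg_pix = gray_i[~mask_i]
    area = mask_i.sum()
    # crude Length(C_i): count horizontal/vertical label changes
    m = mask_i.astype(int)
    length = np.abs(np.diff(m, axis=0)).sum() + np.abs(np.diff(m, axis=1)).sum()

    e = mu * area + nu * length
    e -= lam_f * sum(backproject(fg_pix, h) for h in fg_hists)
    e -= lam_b * sum(backproject(bg_pix, h) for h in bg_hists)
    e -= lam_s * backproject(bg_pix, hist_bg_i)
    return e
```

Under this scoring, a mask that matches the consistent foregrounds and backgrounds produces a lower energy than a mismatched one, which is what the minimization in (2) exploits.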

B. The Optimization

Each image I_i, i = 1, …, n, is associated with a minimization of E_{C_i}, so there is a set of n minimization problems. In this paper, we minimize these energies simultaneously by dynamically updating the foreground and background priors in an iterative process. In each iteration, we evolve each curve C_i while treating the segments of the other images as fixed regions. The new curves then serve as the new foreground and background priors for the next iteration, until convergence.

To minimize the energy in (8), we first use a level set function φ_i to represent the curve and rewrite the energy in (8) as E_i(φ_i). The new representation involves the functions δ(x) (the one-dimensional Dirac function) and H(x) (the Heaviside function): the curve C_i is represented by δ(φ_i(x, y)), and the regions inside and outside the curve (ω_1^i and ω_0^i) by H(φ_i(x, y)) and 1 − H(φ_i(x, y)), respectively. The search for C_i then becomes finding the φ_i that minimizes E_i(φ_i).

To minimize E_i(φ_i), we adopt the strategy of [3]: we keep g(ω_1^j), g(ω_0^j) and g(ω_0^i) fixed so that f(I_i(x, y), g(ω_1^j)), f(I_i(x, y), g(ω_0^j)) and f(I_i(x, y), g(ω_0^i)) are independent of φ_i. The Euler-Lagrange equation of E_i(φ_i) is then

    \frac{\partial \phi_i}{\partial t} = \delta(\phi_i) \Big( -\mu + \nu \cdot \mathrm{div}\Big(\frac{\nabla \phi_i}{|\nabla \phi_i|}\Big) + \lambda_f \sum_{j=1}^{n} f(I_i, g(\omega_1^j)) - \lambda_b \sum_{j=1, j \neq i}^{n} f(I_i, g(\omega_0^j)) - \lambda_s f(I_i, g(\omega_0^i)) \Big)    (9)

where t ≥ 0 is an artificial time in φ_i(t, x, y). Setting κ = div(∇φ_i/|∇φ_i|), the curvature of C_i, and Δt = 1, the discrete evolution form of (9), which iteratively updates φ_i, is

    \phi_i^{N+1}(x, y) = \phi_i^{N}(x, y) + \delta(\phi_i) \Big( -\mu + \nu \cdot \kappa(x, y) + \lambda_f \sum_{j=1}^{n} f(I_i(x,y), g(\omega_1^j)) - \lambda_b \sum_{j=1, j \neq i}^{n} f(I_i(x,y), g(\omega_0^j)) - \lambda_s f(I_i(x,y), g(\omega_0^i)) \Big)    (10)
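One update of (10) can be sketched numerically as below. This is a minimal assumed implementation: it uses the standard smoothed Dirac function and a central-difference curvature (as in Chan-Vese style schemes), and bundles the three data terms of (10) into a precomputed per-pixel `force` array; all function names are ours.

```python
import numpy as np

def dirac_eps(phi, eps=1.0):
    # Smoothed one-dimensional Dirac function delta_eps(phi).
    return (eps / np.pi) / (eps**2 + phi**2)

def curvature(phi):
    # kappa = div(grad(phi) / |grad(phi)|), via central differences.
    py, px = np.gradient(phi)
    norm = np.sqrt(px**2 + py**2) + 1e-8
    nyy, _ = np.gradient(py / norm)
    _, nxx = np.gradient(px / norm)
    return nxx + nyy

def evolve_step(phi, force, mu=0.01, nu=0.001):
    """One discrete update of Eq. (10) with time step 1.

    `force` holds, per pixel, the aggregated data terms
    lam_f * sum_j f(I_i, g(omega_1^j)) - lam_b * sum_j f(I_i, g(omega_0^j))
    - lam_s * f(I_i, g(omega_0^i)), precomputed with the priors fixed.
    """
    return phi + dirac_eps(phi) * (-mu + nu * curvature(phi) + force)
```

A strongly positive force pushes φ_i upward near the zero level set (the region grows); with zero force only the −μ area penalty acts and the region shrinks.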

C. Algorithm Speedup

We speed up the algorithm with two techniques: hierarchical curve evolution and superpixel based curve evolution. Hierarchical curve evolution is based on the observation that the curve evolution of an image is preserved when the evolution is performed on a smaller scale of the image. Motivated by this observation, we speed up the curve evolution with a hierarchical structure of the images: we start from small images, where convergence to the object boundary is quick, and then propagate the segmentation from the small images to the large images. In superpixel based curve evolution, we operate on superpixels instead of pixels, which dramatically reduces the number of elements and results in a low computational cost. We use the SLIC superpixel generation method [16] with 1000 superpixels for each image and hierarchy level.
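The coarse-to-fine part of the speedup can be sketched with a two-layer pyramid. This numpy-only sketch shows only the resolution hierarchy (the superpixel step is omitted); `evolve` is a placeholder for the level-set evolution of Eq. (10), the iteration counts follow Sec. III-A, and all function names are ours.

```python
import numpy as np

def downscale(img, factor=2):
    # Average-pool the image by `factor` (sides assumed divisible).
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upscale(phi, factor=2):
    # Propagate a coarse level-set function to the finer grid.
    return np.repeat(np.repeat(phi, factor, axis=0), factor, axis=1)

def coarse_to_fine(image, evolve, iters=(100, 30)):
    """Two-layer hierarchical curve evolution in the spirit of Sec. II-C.

    `evolve(img, phi, n)` runs n level-set updates and returns the new
    phi; most iterations run on the half-scale image, and only a few
    refinement steps run at full resolution.
    """
    coarse = downscale(image)
    phi = np.ones_like(coarse)           # initial level set (all inside)
    phi = evolve(coarse, phi, iters[0])  # cheap evolution at scale 0.5
    phi = upscale(phi)                   # propagate to scale 1
    return evolve(image, phi, iters[1])  # short refinement at full scale
```

The design point is simply that each half-scale iteration touches a quarter of the pixels, so spending 100 iterations there and only 30 at full resolution is far cheaper than 130 full-resolution iterations.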

III. THE EXPERIMENTAL RESULTS

In this section, we verify the proposed method on three datasets: the 17 category flower dataset and the ICoseg dataset for images, and the UCF sports action dataset for video.

A. Parameter Setting

For the initial curve, we set δ = 0.2 · W, where W is the minimum of the width and height of the image. In (8), we set μ = 0.01, ν = 0.001, λ_b = 3 and λ_s = 1. We only adjust the parameter λ_f, which balances the interior and exterior forces, within the range [4, 7] for different image groups. We use the method in [3] to measure the similarity between a pixel p and a region ω. In the hierarchical structure, we use two layers with scales ψ_k ∈ {0.5, 1}. The evolution stops after a fixed number of iterations: 100 for ψ_1 = 0.5 and 30 for ψ_2 = 1. We also apply a post-processing step to normalize the obtained results, since they may contain several very small foregrounds [3]; we use the graph-cut algorithm for this post-processing. In our experiments, we report both the original and the normalized results, denoted Ours and Ours-G, respectively.

B. Our Co-segmentation Results

We first show the results on the image groups. The results on the image dataset (ICoseg) and the video dataset (UCF) are shown in Fig. 1 and Fig. 2, respectively. The original images contain similar backgrounds in both the image and video groups, such as the indoor stadium background in the group Diving of the UCF dataset. The proposed method extracts the foreground from the similar scene, which mainly benefits from the background consistency constraint.

We next verify the proposed method by objective evaluation. The intersection-over-union (IOU) ratio is used as the evaluation metric; it is defined as the ratio of the area of the intersection between the segment and the ground truth to the area of their union, i.e., (segment ∩ ground-truth)/(segment ∪ ground-truth). Given a group of images, the average IOU value over all images is used



Fig. 1. The cosegmentation results by the proposed method on ICoseg dataset.

Fig. 2. The cosegmentation results of the proposed method on UCF Sports Action Dataset.

as the objective result of the image group. A good cosegmentation result corresponds to a large IOU value, while a bad result yields a small one. The IOU values of the proposed method and several state-of-the-art co-segmentation methods on the flower, UCF and ICoseg datasets are shown in Tables I, II and III, respectively. The proposed method achieves larger IOU values than the existing co-segmentation methods.
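The IOU metric can be computed directly from binary masks; a minimal sketch (the value returned for two empty masks is our own convention, not specified by the paper):

```python
import numpy as np

def iou(segment, groundtruth):
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(segment, groundtruth).sum()
    union = np.logical_or(segment, groundtruth).sum()
    return inter / union if union else 1.0
```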

IV. CONCLUSION

In this paper, we propose a co-segmentation model to extract common objects from a same scene. Three segmentation terms, namely the multiple foreground consistency, the multiple background consistency and the single image segmentation, are combined in the active contours framework to achieve better co-segmentation performance. The method is verified on both image and video datasets. The experimental results show the improvement of the co-segmentation on these datasets in terms of IOU values.

ACKNOWLEDGMENT

This work was partially supported by NSFC (No. 61271289), the National High Technology Research and Development Program of China (863 Program, No. 2012AA011503), the Ph.D. Programs Foundation of the Ministry of Education of China (No. 20110185110002) and the Fundamental Research Funds for the Central Universities (No. E022050205).

TABLE I. THE IOU RESULTS ON 17 CATEGORY FLOWER DATASET.

Method     [4]      [5]      [6]      [7]      Ours     Ours-G
IOU value  0.2176   0.2321   0.4309   0.2265   0.6355   0.6867

TABLE II. THE IOU RESULTS ON UCF DATASET.

Method     [4]      [5]      [6]      [7]      Ours     Ours-G
IOU value  0.1264   0.1072   0.1916   0.1558   0.3701   0.4201

TABLE III. THE IOU RESULTS ON ICOSEG DATASET.

Method     [4]      [5]      [6]      [7]      Ours     Ours-G
IOU value  0.463    0.46     0.406    0.363    0.681    0.710

REFERENCES

[1] Fanman Meng, Hongliang Li, King Ngi Ngan, Liaoyuan Zeng, and Qingbo Wu. Feature adaptive co-segmentation by complexity awareness. IEEE Transactions on Image Processing, 22(12):4809–4824, Dec. 2013.

[2] Carsten Rother, Vladimir Kolmogorov, Tom Minka, and Andrew Blake. Cosegmentation of image pairs by histogram matching - incorporating a global constraint into MRFs. In IEEE Conference on Computer Vision and Pattern Recognition, pages 993–1000, New York, USA, June 2006.

[3] Fanman Meng, Hongliang Li, Guanghui Liu, and King Ngi Ngan. Image cosegmentation by incorporating color reward strategy and active contour model. IEEE Transactions on Cybernetics, 43(2):725–737, April 2013.

[4] Armand Joulin, Francis Bach, and Jean Ponce. Discriminative clustering for image co-segmentation. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1943–1950, San Francisco, CA, June 2010.

[5] Armand Joulin, Francis Bach, and Jean Ponce. Multi-class cosegmentation. In IEEE Conference on Computer Vision and Pattern Recognition, pages 542–549, Providence, RI, June 2012.

[6] Gunhee Kim, Eric P. Xing, Li Fei-Fei, and Takeo Kanade. Distributed cosegmentation via submodular optimization on anisotropic diffusion. In International Conference on Computer Vision, pages 169–176, Barcelona, Nov. 2011.

[7] Gunhee Kim and Eric P. Xing. On multiple foreground cosegmentation. In IEEE Conference on Computer Vision and Pattern Recognition, pages 837–844, Providence, RI, June 2012.

[8] Maxwell Collins, Jia Xu, Leo Grady, and Vikas Singh. Random walks for multi-image cosegmentation: Quasiconvexity results and GPU-based solutions. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1656–1663, Providence, RI, June 2012.

[9] Fanman Meng, Hongliang Li, Guanghui Liu, and King Ngi Ngan. Object co-segmentation based on shortest path algorithm and saliency model. IEEE Transactions on Multimedia, 14(5):1429–1441, Oct. 2012.

[10] Hongliang Li, Fanman Meng, and King Ngi Ngan. Co-salient object detection from multiple images. IEEE Transactions on Multimedia, 15(8):1896–1909, Dec. 2013.

[11] Kai-Yueh Chang, Tyng-Luh Liu, and Shang-Hong Lai. From co-saliency to co-segmentation: An efficient and fully unsupervised energy minimization model. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2129–2136, Providence, RI, June 2011.

[12] Hongliang Li and King N. Ngan. Saliency model based face segmentation in head-and-shoulder video sequences. Journal of Visual Communication and Image Representation, 19(5):320–333, 2008.

[13] Jose Rubio, Joan Serrat, Antonio López, and Nikos Paragios. Unsupervised co-segmentation through region matching. In IEEE Conference on Computer Vision and Pattern Recognition, pages 749–756, Providence, RI, June 2012.

[14] Hongliang Li and King Ngi Ngan. A co-saliency model of image pairs. IEEE Transactions on Image Processing, 20(12):3365–3375, Dec. 2011.

[15] T. F. Chan and L. A. Vese. Active contours without edges. IEEE Transactions on Image Processing, 10(2):266–277, 2001.

[16] Radhakrishna Achanta, Appu Shaji, Kevin Smith, Aurelien Lucchi, Pascal Fua, and Sabine Süsstrunk. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11):2274–2282, May 2012.
