SEGMENTATION METHODS FOR MULTIPLE BODY PARTS


PROJECT IN LIEU OF THESIS

Presented for the Master of Science Degree
The University of Tennessee, Knoxville

Sitapa Rujikietgumjorn
July 31, 2008

ACKNOWLEDGMENT

First of all, I would like to thank Dr. Mongi Abidi, my advisor, who gave me the opportunity to work on this project, which allowed me to gain much research experience in this field. His valuable advice and guidance have always been very helpful. Secondly, I would like to thank Mr. Sreenivas Rangan, a Ph.D. student at the IRIS lab, for his guidance and support toward this project. Special thanks also go to Mr. Justin Acuff for his help with capturing the data images. Lastly, I would like to thank Dr. Andreas Koschan and Dr. Seddik Djouadi for being part of my graduate committee.

ABSTRACT

This project is a review of segmentation methods for multiple body parts. Segmentation plays an important role in computer vision, especially for human tracking. There is a demand for extracting the human body from an image, and it is even more useful to be able to extract specific body parts such as the head, arms, or legs. These segmented outputs can be applied to other applications such as part tracking or gesture analysis. Multiple methods were tested and reviewed for segmentation of a human body at various distances. The project started with a literature review on segmentation methods. Then, an experiment on segmentation was developed using MATLAB. The goal of this experiment was to determine the pros and cons of each method and to find a suitable method for segmenting multiple body parts.


Contents

1 INTRODUCTION
  1.1 Motivations and Objectives
  1.2 Applications
  1.3 Contributions

2 IMAGE ACQUISITIONS

3 THEORIES AND METHODS FOR SEGMENTATION
  3.1 Thresholding
  3.2 Edge Detection
  3.3 Gray-Scale Morphological Method
  3.4 Watershed Segmentation
  3.5 Color-Based Segmentation
  3.6 K-mean Clustering
  3.7 Normalized Cut Segmentation
  3.8 Motion Segmentation

4 EXPERIMENTAL RESULTS AND DISCUSSIONS
  4.1 Overview
  4.2 Software
  4.3 Segmentation Results
  4.4 Comparison of Segmentation Methods
  4.5 Issues and Limitations

5 CONCLUSIONS

6 FUTURE WORKS

7 APPENDIX

REFERENCES


List of Tables

4.1 Comparison of Segmentation Methods
4.2 MATLAB processing time of Normalized Cut segmentation and K-mean clustering for 10 and 20 segments


List of Figures

1.1 Segmenting human body into excessive segmented body parts
1.2 Application: Gesture analysis developed by Chen Wu [1] (image from [1])
2.1 Location for image sequences
2.2 Images taken in various distances. The longest distance is approximately 110 yards and the shortest distance is approximately 5 yards
2.3 Equipment used is Canon EOS 40D camera and Canon EF 600mm f/4L lens
2.4 21 image sequences in distance from 105 yards down to 5 yards for data set 1
2.5 21 image sequences in distance from 105 yards down to 5 yards for data set 2
2.6 21 image sequences in distance from 105 yards down to 5 yards for data set 3
3.1 Segmentation example: (a) Color-based segmentation result by Lucchese et al. [2] (b) Shape-based segmentation result by Tae-O-Sot et al. [3]
4.1 (a) Original image (b) Gray scale image (c) Histogram
4.2 Single thresholding with threshold value equals to 80 (a) Gray scale image (b) Threshold image (c) Apply Morphological operation to remove noise and unwanted segments (d) Masking the result with the original image
4.3 Adaptive thresholding (a) Gray scale image (b) Adaptive threshold value using Otsu's method (c) Apply Morphological operation to remove noise and unwanted segments (d) Masking the result with the original image
4.4 Edge detection
4.5 Gray-scale Morphology
4.6 Color-Based Segmentation Using the L*a*b* Color Space
4.7 K-mean Clustering in 5 segments. (a) Original image (b) to (f) Each segment
4.8 Watershed segmentation
4.9 Normalized Cut Segmentation
4.10 GUI for tested segmentation methods
4.11 Result comparison (a) Original image (b) Gray-scale image (c) Thresholding 5 segments (d) Gray-scale Morphology (e) Watershed segmentation (f) Color-Based Segmentation Using the L*a*b* Color Space for 5 segments (g) K-mean Clustering in 5 segments (h) Normalized Cut in 5 segments
4.12 K-mean clustering result for image sequences set 1 at a distance of 105, 55, and 5 yards (a) Original images (b) Result of 10 colors (c) Result of 20 colors
4.13 Normalized Cut segmentation result for image sequences set 1-3 at a distance of 105, 55, and 5 yards (a) Original images (b) Result of 10 segments (c) Result of 20 segments
4.14 Some good results on Normalized Cut segmentation (a)(b) Body parts at 85 yards (c)(d) Shirt pattern at 40 yards (e)(f) Face detail and shirt pattern at 15 yards (g)(h) Face detail at 5 yards
4.15 Face segmentations at a distance of 5 yards (a) Original image (b) K-mean clustering (c) Normalized Cut segmentation
4.16 (a) A sample surveillance video of two people walking toward a camera (b) Result video after frame differencing and morphological operations
4.17 Selected 5 frames from the result video after frame differencing and morphological operations
4.18 Average MATLAB processing time of Normalized Cut segmentation and K-mean clustering for 10 and 20 segments
4.19 (a) Hole in the body from frame differencing resulting from aperture problem (b) Result after applying 10 segments normalized cut (c) Result after applying 20 segments normalized cut
6.1 Application: Tracking the hand movement for motion analysis by Machline et al. [4]
7.1 (a) Results of data set 1 for K-mean clustering (b) Results of data set 1 for Normalized Cut segmentation
7.2 (a) Results of data set 2 for Normalized Cut segmentation (b) Results of data set 3 for Normalized Cut segmentation


1 INTRODUCTION

Object segmentation is an important focus in computer vision. It is a way to interpret the details of an image. For example, in a video surveillance system, an image is segmented in order to identify an object or a human body. Segmentation is the process of partitioning an image into multiple regions, with the intent of extracting the object from the background. Since digital cameras have become widely used in surveillance systems, the need to extract and interpret an object from a frame has gained more interest.

1.1 Motivations and Objectives

Many surveillance systems are mainly used with humans as the object of interest, so it is important and useful to segment the human body from an image. This project focuses only on segmenting the human body into parts, as can be seen in figure 1.1. In general, the project aims at reviewing segmentation methods in order to find a good method for body part segmentation. Various segmentation techniques were tested to find a good body segmentation result.

Figure 1.1: Segmenting a human body into finely divided (over-segmented) body parts

1.2 Applications

This project focuses on segmenting a human body in an image into various parts. The result can be applied to many useful applications. One of them is part recognition, in which the segmented parts are classified into human body parts such as legs, arms, torso, and head. Once the parts are recognized, they can be analyzed for gesture types. For instance, the positions of the body parts can be interpreted as sitting, standing, or lying. Chen Wu et al. [1] developed a gesture analysis project that segments the human body into parts, as can be seen in figure 1.2. Fitted ellipses were then used for further analysis of the gesture. Segmentation is also useful in tracking, especially for tracking body parts. In a surveillance system, it might be better to track only the human head to obtain more detail. In this case, the camera can zoom in on the head once it knows the position of the head part.


Figure 1.2: Application: Gesture analysis developed by Chen Wu [1] (image from [1])

1.3 Contributions

The main contributions of this project include:

- Study and review various segmentation methods

- Discuss the pros and cons of each method

- Choose two of the methods for comparison and test them on a data set to find the better method, which is then used to analyze further image and video data

- Construct an experiment on body segmentation using MATLAB

- Find a good segmentation method for body part segmentation


2 IMAGE ACQUISITIONS

The project aims to test the segmentation methods on human images taken at different distances, ranging from 5 to about 100 yards. For this reason, the soccer field at the University of Tennessee, shown in figure 2.1(a), is a suitable location. It has a parking lot, which was used as the place for shooting the image sequences. Each image sequence was taken along the red line shown in figure 2.1(b). The longest distance is approximately 110 yards and the shortest distance is approximately 5 yards, as shown in figure 2.2. The white lines in the parking lot serve as distance markings.

Figure 2.1: Location for image sequences (a) Soccer field at the University of Tennessee, marked with a red circle (b) The image sequence taken along the red line at the parking lot next to the soccer field (Images from Yahoo Maps http://maps.yahoo.com)

Figure 2.2: Images taken at various distances. The longest distance is approximately 110 yards and the shortest distance is approximately 5 yards

The equipment used for taking the high-quality images is a Canon EOS 40D camera and a Canon EF 600mm f/4L lens, as shown in figure 2.3. The Canon EOS 40D has a resolution of about 10 million pixels, with an image size of 3888 x 2592. The Canon EF 600mm f/4L lens (without image stabilization (IS)) has a focal length of 600mm and is commonly used for long-distance photography. Figures 2.4 to 2.6 show the three data sets. Each set contains a sequence of 21 images taken at distances from 105 yards down to 5 yards. The first image in a sequence covers the full human body and the last image mainly contains the face. The distance between two consecutive images is about 5 yards.

Figure 2.3: Equipment used is Canon EOS 40D camera and Canon EF 600mm f/4L lens


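Figure 2.4: Sequence of 21 images at distances from 105 yards down to 5 yards for data set 1

Figure 2.5: Sequence of 21 images at distances from 105 yards down to 5 yards for data set 2

Figure 2.6: Sequence of 21 images at distances from 105 yards down to 5 yards for data set 3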

3 THEORIES AND METHODS FOR SEGMENTATION

Image segmentation is a common process in image analysis, especially in the field of vision and tracking. Many algorithms and methods have been developed for image segmentation. Common approaches are intensity-based, color-based, and shape-based segmentation. Figure 3.1 shows some results from color-based and shape-based segmentation. This project focuses on methods that are based on intensity and color. Details of the selected methods used in the experiment are presented below.

Figure 3.1: Segmentation example: (a) Color-based segmentation result by Lucchese et al. [2] (b) Shape-based segmentation result by Tae-O-Sot et al. [3]

3.1 Thresholding

Thresholding is one of the simplest segmentation methods. It extracts the object from the background by grouping the intensity values according to a threshold value. For single thresholding, the result image is given by

$$
g(x, y) =
\begin{cases}
1, & \text{if } f(x, y) > T \\
0, & \text{if } f(x, y) \le T
\end{cases}
$$

where $T$ is the threshold value.

Similarly, multiple thresholding with two thresholds $T_1 < T_2$ is given by

$$
g(x, y) =
\begin{cases}
a, & \text{if } f(x, y) > T_2 \\
b, & \text{if } T_1 < f(x, y) \le T_2 \\
c, & \text{if } f(x, y) \le T_1
\end{cases}
$$

where $a$, $b$, $c$ are three distinct intensity values.
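As a rough illustration of the two thresholding formulas, the following Python sketch applies them to a tiny gray-scale image stored as nested lists. The function names and the default output levels a=255, b=128, c=0 are assumptions made for this example only; the experiment itself was carried out in MATLAB.

```python
def single_threshold(image, t):
    """Return a binary image: 1 where f(x, y) > t, otherwise 0."""
    return [[1 if pixel > t else 0 for pixel in row] for row in image]


def double_threshold(image, t1, t2, a=255, b=128, c=0):
    """Map each pixel to a, b, or c using two thresholds with t1 < t2."""
    if not t1 < t2:
        raise ValueError("t1 must be smaller than t2")
    result = []
    for row in image:
        out_row = []
        for pixel in row:
            if pixel > t2:
                out_row.append(a)      # f(x, y) > T2
            elif pixel > t1:
                out_row.append(b)      # T1 < f(x, y) <= T2
            else:
                out_row.append(c)      # f(x, y) <= T1
        result.append(out_row)
    return result


# Illustrative 2 x 3 image with intensities in the range 0..255.
image = [
    [10, 80, 200],
    [81, 120, 40],
]
binary = single_threshold(image, 80)
levels = double_threshold(image, 50, 150)
```

Note that a pixel exactly equal to the threshold falls into the lower class, matching the "less than or equal" branch of the formulas.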

Histogram Based Thresholding

Histogram based thresholding, or adaptive thresholding, replaces the manually chosen threshold with a value derived from the image itself. One attractive approach is Otsu's method, which computes the threshold from the image histogram by choosing the value that maximizes the between-class variance of the two resulting pixel classes.
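A minimal sketch of Otsu's method is given below, assuming 8-bit integer intensities supplied as a flat list. It is a plain histogram search written for illustration, not the MATLAB routine used in the experiment, and the function name `otsu_threshold` is hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold T that maximizes the between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(levels):
        weight_bg += hist[t]            # number of pixels with value <= t
        sum_bg += t * hist[t]
        if weight_bg == 0:
            continue                    # no background pixels yet
        weight_fg = total - weight_bg   # number of pixels with value > t
        if weight_fg == 0:
            break                       # all pixels are already background
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


# Two well-separated intensity groups: dark background and bright object.
pixels = [10, 12, 11, 13, 200, 202, 201, 199]
t = otsu_threshold(pixels)
mask = [1 if p > t else 0 for p in pixels]
```

When several thresholds give the same variance, this sketch keeps the smallest one; other implementations may break such ties differently.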


3.2 Edge Detection

The edge-based approach is a commonly used segmentation method that is based on the abrupt intensity changes found at edges. A variety of edge detection operators have been developed. The resulting edge image, however, requires additional processing to yield a segmentation, since the pixels must still be grouped into regions according to the detected edges [5]. The edge detection operators used in the experiment are Prewitt, Roberts, Sobel, Canny, zero-crossing, Laplacian, and LoG (Laplacian of Gaussian).
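To make the idea of detecting abrupt intensity changes concrete, here is a small sketch of one of the listed operators, the Sobel operator, in plain Python. The threshold value, the choice to leave border pixels unmarked, and the function name are assumptions for this example; the output is only an edge map, not yet a segmentation.

```python
def sobel_edges(image, threshold):
    """Mark interior pixels whose Sobel gradient magnitude exceeds threshold."""
    gx_kernel = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # horizontal changes
    gy_kernel = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]    # vertical changes
    rows, cols = len(image), len(image[0])
    edges = [[0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = gy = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    value = image[y + dy][x + dx]
                    gx += gx_kernel[dy + 1][dx + 1] * value
                    gy += gy_kernel[dy + 1][dx + 1] * value
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges[y][x] = 1
    return edges


# A vertical step edge between a dark left half and a bright right half.
image = [[0, 0, 100, 100] for _ in range(4)]
edges = sobel_edges(image, 200)
```

In this example the two interior columns next to the step are marked, while the border pixels stay at 0 because the 3 x 3 kernel does not fit there.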

3.3 Gray-Scale Morphological Method

The basic morphological operations of dilation, erosion, opening, and closing can be extended to gray-scale images. The concept is similar to that of the binary morphological operations.

Morphological Gradient

Subtracting the erosion of an image from its dilation gives the morphological gradient of the image:

$$
g = (f \oplus b) - (f \ominus b)
$$

where $f \oplus b$ is the dilation of $f$ by a flat structuring element $b$ at any location $(x, y)$, that is,

$$
[f \oplus b](x, y) = \max_{(s,t) \in b} \{ f(x - s, y - t) \}
$$

and $f \ominus b$ is the erosion of $f$ by a flat structuring element $b$ at any location $(x, y)$, that is,

$$
[f \ominus b](x, y) = \min_{(s,t) \in b} \{ f(x + s, y + t) \}
$$
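The gradient can be sketched in a few lines of Python, assuming a flat square structuring element of odd size and windows that are simply clipped at the image border (both are choices made for this illustration). For a symmetric flat element, dilation is a local maximum and erosion a local minimum.

```python
def _local_filter(image, size, reduce_fn):
    """Apply max or min over a size x size window, clipped at the borders."""
    half = size // 2
    rows, cols = len(image), len(image[0])
    out = []
    for y in range(rows):
        out_row = []
        for x in range(cols):
            window = [
                image[j][i]
                for j in range(max(0, y - half), min(rows, y + half + 1))
                for i in range(max(0, x - half), min(cols, x + half + 1))
            ]
            out_row.append(reduce_fn(window))
        out.append(out_row)
    return out


def dilate(image, size=3):
    """Gray-scale dilation by a flat square structuring element."""
    return _local_filter(image, size, max)


def erode(image, size=3):
    """Gray-scale erosion by a flat square structuring element."""
    return _local_filter(image, size, min)


def morphological_gradient(image, size=3):
    """Return g = dilation - erosion, which is large near intensity edges."""
    dilated, eroded = dilate(image, size), erode(image, size)
    return [[d - e for d, e in zip(d_row, e_row)]
            for d_row, e_row in zip(dilated, eroded)]


# A step edge: dark region (10) on the left, bright region (90) on the right.
image = [[10, 10, 10, 90, 90, 90] for _ in range(3)]
gradient = morphological_gradient(image)
```

The gradient is zero inside the flat regions and equals the step height (80) in the two columns adjacent to the edge, which is why thresholding it produces a thick edge band.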

3.4 Watershed Segmentation

One issue in segmentation is separating objects that are connected. Watershed segmentation is usually applied in this case. The watershed transformation finds "catchment basins" and "watershed ridge lines" in an image by treating it as a surface where light pixels are high and dark pixels are low.

Algorithm

The following steps are used in the algorithm.

1. Use the gradient magnitude as the segmentation function.

2. Compute foreground markers.

3. Compute background markers.

4. Modify the segmentation function so that it only has minima at the foreground and background marker locations.

5. Compute the watershed transform.

This method uses markers so that the result is not oversegmented, which happens when the watershed transformation is used alone [5]. Saban [6] uses the watershed method to segment regions in video sequences. The result shows that this method is promising, as it is able to segment the useful region in the video without using frame-to-frame segmentation.

3.5 Color-Based Segmentation

Color is very meaningful information for vision. Humans use color to differentiate and separate objects. Therefore, it is an interesting approach to perform segmentation using the color information from a chosen color space.

L*a*b* Color Space Segmentation

Humans can visually distinguish different colors, and the L*a*b* color space can be used to quantify these visual differences. The L*a*b* space consists of:

- L*: luminosity or brightness

- a*: chromaticity layer indicating where the color falls along the red-green axis

- b*: chromaticity layer indicating where the color falls along the blue-yellow axis

The a* and b* layers contain the color information. The difference between two colors can be calculated as the Euclidean distance between them.

Algorithm

The following steps are used in the algorithm.

1. For each segment color, define a sample region and compute its average color in a*b* space

2. Use these average colors as color markers to classify each pixel

3. Classify each pixel using the nearest neighbor rule

• Find the Euclidean distance between the pixel and each color marker

• Assign the pixel to the color marker with the smallest distance
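The nearest neighbor rule in the steps above can be sketched as follows. The sketch assumes the pixels have already been converted to (a*, b*) pairs (the conversion from RGB is not shown), and the marker names and values are made up for illustration rather than taken from the experiment.

```python
import math


def classify_pixels(ab_pixels, color_markers):
    """Label each (a*, b*) pixel with the name of its nearest color marker."""
    labels = []
    for a, b in ab_pixels:
        best_name, best_dist = None, math.inf
        for name, (marker_a, marker_b) in color_markers.items():
            # Euclidean distance in the a*b* plane.
            dist = math.hypot(a - marker_a, b - marker_b)
            if dist < best_dist:
                best_name, best_dist = name, dist
        labels.append(best_name)
    return labels


# Hypothetical color markers: average (a*, b*) of manually chosen regions.
color_markers = {
    "tan": (12.0, 20.0),
    "green": (-30.0, 25.0),
    "blue": (5.0, -35.0),
}
ab_pixels = [(10.0, 18.0), (-28.0, 30.0), (0.0, -30.0)]
labels = classify_pixels(ab_pixels, color_markers)
```

Because L* is ignored, two pixels of the same hue but different brightness receive the same label, which is the main reason for working in the a*b* plane.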

3.6 K-mean Clustering

Clustering is a way to separate groups of objects. K-mean clustering is a simple unsupervised learning algorithm that partitions the objects into a specified number of clusters. In the experiment, the colors are transformed to the L*a*b* color space and the pixels are then grouped by K-mean clustering into regions of similar color. Other color spaces can also be used instead of L*a*b*. Sural [7] proposed an image segmentation approach that transforms the image to the HSV color space and groups the pixels using K-mean clustering.


MATLAB K-mean Algorithm (function kmeans)

The MATLAB function kmeans uses a two-phase iterative algorithm to minimize the sum of point-to-centroid distances.

1. Batch updates: each iteration consists of reassigning points to their nearest cluster centroid, all at once, followed by recalculation of the cluster centroids.

2. Online updates: points are individually reassigned if doing so reduces the sum of distances, and cluster centroids are recomputed after each reassignment. Each iteration during this second phase consists of one pass through all the points.

kmeans can converge to a local optimum, which in this case is a partition of points in which moving any single point to a different cluster increases the total sum of distances.
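The batch-update phase can be illustrated with the short Python sketch below. It covers only the first phase (the online phase is omitted), uses squared Euclidean distance, and takes the initial centroids as an argument; these are simplifications for this example and do not reproduce the MATLAB implementation.

```python
def kmeans_batch(points, centroids, max_iter=100):
    """Batch-update k-means: reassign all points, then recompute centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: every point moves to its nearest centroid at once.
        new_labels = []
        for point in points:
            distances = [
                sum((p - c) ** 2 for p, c in zip(point, centroid))
                for centroid in centroids
            ]
            new_labels.append(distances.index(min(distances)))
        if new_labels == labels:
            break  # assignments are stable, so the batch phase has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its member points.
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return labels, centroids


# Two obvious groups of 2-D points (for pixels these would be a*, b* values).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans_batch(points, [points[0], points[3]])
```

The result depends on the initial centroids, which is one way the local-optimum behavior described above shows up in practice.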

3.7 Normalized Cut Segmentation

Normalized Cut segmentation, proposed by Timothee Cour, Florence Benezit, and Jianbo Shi [8], is based on graph theory. Each pixel is treated as a node of a graph, and segmentation is posed as a graph partitioning problem. The optimal grouping is obtained by solving an eigenvalue problem.

Grouping Algorithm [8]

$G = (V, E)$ is a weighted undirected graph that represents a set of points in an arbitrary space. The nodes are the points in the space, and an edge is formed between every pair of nodes. Let

$$
d(i) = \sum_j w(i, j)
$$

and let $D$ be an $N \times N$ diagonal matrix with $d$ on its diagonal and $W$ be an $N \times N$ symmetric matrix with $W(i, j) = w(i, j)$.

1. Given an image, set up a weighted graph $G = (V, E)$ and set the weight on the edge connecting two nodes to be a measure of the similarity between the two nodes.

2. Solve $(D - W)x = \lambda D x$ for the eigenvectors with the smallest eigenvalues.

3. Use the eigenvector with the second smallest eigenvalue to bipartition the graph.

4. Decide if the current partition should be subdivided, and recursively repartition the segmented parts if necessary.
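Steps 2 and 3 can be illustrated on a tiny graph with the sketch below. It is not the code of [8]: it assumes a small dense weight matrix with positive node degrees, finds the second smallest generalized eigenvector by power iteration with deflation on the equivalent symmetric problem, and splits the nodes at zero (the splitting point is a choice made for this example).

```python
import math


def ncut_bipartition(weights, iterations=500):
    """Bipartition a small graph with the second smallest eigenvector of
    (D - W) x = lambda D x, computed by power iteration with deflation."""
    n = len(weights)
    degree = [sum(row) for row in weights]           # d(i) = sum_j w(i, j)
    inv_sqrt = [1.0 / math.sqrt(d) for d in degree]

    # With z = D^(1/2) x the problem becomes L_sym z = lambda z, where
    # L_sym = I - D^(-1/2) W D^(-1/2). Its smallest eigenvalues are the
    # largest eigenvalues of M = 2I - L_sym = I + D^(-1/2) W D^(-1/2).
    m = [[(1.0 if i == j else 0.0) + inv_sqrt[i] * weights[i][j] * inv_sqrt[j]
          for j in range(n)] for i in range(n)]

    # Trivial eigenvector for lambda = 0: z0 = D^(1/2) 1, normalized.
    norm0 = math.sqrt(sum(degree))
    z0 = [math.sqrt(d) / norm0 for d in degree]

    z = [float(i + 1) for i in range(n)]             # arbitrary start vector
    for _ in range(iterations):
        proj = sum(a * b for a, b in zip(z, z0))
        z = [a - proj * b for a, b in zip(z, z0)]    # remove the z0 component
        z = [sum(m[i][j] * z[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(a * a for a in z))
        z = [a / norm for a in z]

    x = [z[i] * inv_sqrt[i] for i in range(n)]       # back to x = D^(-1/2) z
    return [1 if value > 0 else 0 for value in x]


# Two tightly connected triangles joined by one weak edge (weight 0.1).
weights = [
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]
labels = ncut_bipartition(weights)
```

The sketch separates the two triangles; which side receives label 1 is arbitrary. For real images the graph has one node per pixel, so a sparse eigensolver is needed, which is a likely source of the long processing time noted later in the report.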

3.8 Motion Segmentation

Motion segmentation is a fundamental approach for analyzing image sequences. Many algorithms have been proposed for motion segmentation. Nevertheless, the most basic approach is temporal differencing.

Temporal differencing is a very simple method that takes the differences between two or three consecutive frames to detect a moving object. This method does not require background information. After the difference is calculated, a threshold is used to determine whether each pixel belongs to an object or to the background.


Temporal differencing adapts well to scene changes; however, it performs poorly at extracting all the pixels of the object, which sometimes produces a foreground aperture problem. For example, there may be a hole inside a moving object because the interior pixels of the object differ little between two consecutive frames. Nevertheless, the method can be improved by using connected component analysis to group the segmented pieces into a motion region. Differencing more than two frames also improves the result [9].
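A two-frame version of temporal differencing can be sketched as follows. The frames are small nested lists of gray levels, and the threshold of 50 is an arbitrary value chosen for the example.

```python
def frame_difference(previous, current, threshold):
    """Return a binary motion mask: 1 where |current - previous| > threshold."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(previous, current)
    ]


# A bright object three pixels wide moves one pixel to the right
# on a dark background (single-row frames for brevity).
previous = [[20, 200, 200, 200, 20, 20]]
current = [[20, 20, 200, 200, 200, 20]]
mask = frame_difference(previous, current, 50)
```

Only the trailing and leading edges of the object are marked. The two interior pixels are bright in both frames, so they are classified as background, which is the hole described above.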


4 EXPERIMENTAL RESULTS AND DISCUSSIONS

4.1 Overview

The project aims to review many segmentation methods and analyze them to find a suitable method for body part segmentation. The experiment applies various segmentation methods to an image to yield a segmented body, in particular one divided into many parts as shown in figure 1.1. Each body part may contain several segments. The pros and cons of each method are also compared and evaluated.

The experimental results are divided into three parts:

1. segmentation on a single long range image

2. segmentation on sequence images

3. segmentation on a surveillance video

The seven methods tested in Part 1 are thresholding, edge detection, gray-scale morphology, L*a*b* color space, K-mean clustering, Watershed, and Normalized Cut. Among these, two methods that give good results are chosen to be applied to the sequence images in Part 2. The better of the two methods is then chosen for Part 3. In Part 3, motion segmentation is first applied to the sample video to extract the object from the background. Then the chosen method is applied to each frame to obtain the segmented parts.

4.2 Software

The experiment is developed using MATLAB version R2007a. MATLAB is a good environment for exploring and testing algorithms under development, since basic image processing functions are already provided. As a result, development time is shorter than with C/C++. However, the code depends on the MATLAB software, and computation is slower than with an implementation in C/C++. MATLAB Help, which can be found at [10], is a good tutorial.

4.3 Segmentation Results

Part 1: A Single Long Range Image

The sample image is shown in figure 4.1(a), the gray scale image in figure 4.1(b), and its histogram in figure 4.1(c). No pre-processing operation is applied to the image, so that the segmentation method alone is tested. The original image size is 3888 x 2592. However, when a segmentation method is applied at this size, MATLAB shows the error message "OUT OF MEMORY". Thus, all images are resized to 800 x 533 to avoid this problem.

Figure 4.1: (a) Original image (b) Gray scale image (c) Histogram

Figure 4.2(b) shows the result of single thresholding. The threshold value, chosen manually according to the histogram, is 80. Morphological operations are applied to the threshold image to remove noise and unwanted segments. The image shown in figure 4.2(c) uses an opening operation to remove noise. The unwanted white plane in figure 4.2(c) can be removed with the MATLAB function imclearborder. The final result is then masked with the original image, as shown in figure 4.2(d).

Instead of choosing a threshold value manually, the value can be adapted to each image. Otsu's method is an algorithm that calculates an adaptive threshold value from the image histogram. Figure 4.3 shows the same operations as in figure 4.2 but uses the adaptive threshold value from Otsu's method. The thresholding results extract the shirt quite well but not other body parts such as the head and arms. The pants, whose dark color is similar to the background, also cannot be segmented from the background.

Another commonly used segmentation method is edge detection. Several masks were tested on the sample image. As can be seen from the results in figure 4.4, none of them yields a good result in separating the body edges from the background. Laplacian edge detection in figure 4.4(g) gives the worst result of all. In most of these results, the background area looks fuzzy. However, trying to remove those fuzzy lines in the background also removes some of the edge lines of the human body, which would affect the final segmentation result. Because of the unsatisfactory results, no further processing such as morphological operations or noise removal is employed.

Another method to detect edges is gray-scale morphology. The morphological gradient (figure 4.5(d)) is computed by subtracting the erosion image (figure 4.5(c)) from the dilation image (figure 4.5(b)). The result in figure 4.5(d) has stronger edges than the results of the previous edge detection. A threshold is then applied to this image, and noise and unwanted segments are removed by morphological operations, resulting in the final segmentation in figure 4.5(e). This method extracts the shirt quite well along with some body parts such as one arm, the feet, and part of the head, while the pants still cannot be extracted.

Since the original image is a color image, it is appealing to perform color segmentation. The L*a*b* color space is chosen instead of the original RGB color space. Figure 4.6 shows the result of L*a*b* color segmentation. Five colors, namely tan, white, green, blue, and gray, are manually chosen as color regions. The results show well-segmented color regions. As can be seen in figure 4.6(b), the tan color, which corresponds to skin, is well extracted. The dark pants are also segmented, as shown in figure 4.6(e). Moreover, the background is well segmented into two parts, trees and concrete, shown in figure 4.6(d) and figure 4.6(f).

Figure 4.2: Single thresholding with a threshold value of 80 (a) Gray scale image (b) Threshold image (c) Morphological operation applied to remove noise and unwanted segments (d) Result masked with the original image

Figure 4.3: Adaptive thresholding (a) Gray scale image (b) Adaptive threshold value using Otsu's method (c) Morphological operation applied to remove noise and unwanted segments (d) Result masked with the original image

Figure 4.4: Edge detection (a) Gray scale image (b) Prewitt (c) Roberts (d) Sobel (e) Canny (f) Zerocrossing (g) Laplacian (h) Laplacian of Gaussian (LoG)

Figure 4.5: Gray-scale Morphology (a) Gray scale image (b) Dilation of the original image (c) Erosion of the original image (d) Subtraction of (c) from (b) (e) Output image after morphological operation

Figure 4.6: Color-Based Segmentation Using the L*a*b* Color Space in 5 colors: (a) Original image (b) tan (c) white (d) green (e) blue (f) gray

The next method, K-mean clustering, also uses the L*a*b* color space to classify the color of each pixel, but there is no need to define the region color values as in the previous method, L*a*b* color segmentation. Instead, only the number of clusters has to be specified. In the experiment, five clusters are specified to segment the image into five regions, as shown in figure 4.7. The outputs are similar to those of the L*a*b* color space segmentation, but the colors are not divided as cleanly. For example, the trees are divided into two segments and the body skin is grouped with the gray pavement, as seen in figure 4.7.

Figure 4.7: K-mean Clustering in 5 segments. (a) Original image (b) to (f) Each segment

The next method is Watershed segmentation. The watershed ridge lines of the image are shown in figure 4.8(b). The foreground marker of the object is the white region in figure 4.8(c), which shows the ridge lines, foreground marker, and segmentation boundary on top of the original image. The final Watershed segmentation result is shown in figure 4.8(d); notice the clean region cut in the final result.

Figure 4.8: Watershed segmentation (a) Gray scale image (b) ridge lines (c) ridge lines, foreground marker, and boundary on the original image (d) Watershed segmentation result

The last method tested in the first part is Normalized Cut segmentation. The code for this method was adopted from MATLAB code developed by Timothee Cour, Florence Benezit, and Jianbo Shi under the GNU General Public License [11], which can be used for research purposes. Figure 4.9 shows the result of Normalized Cut segmentation with the number of clusters equal to 4, 5, and 8. The method appears to handle the complex background well. For four clusters, the whole body is segmented except the head. Figure 4.9(c), which has five clusters, yields a result similar to the 4-cluster image. For eight clusters, parts such as the head, shirt, and pants are better separated. However, the pants are blended with a small part of the background.

Figure 4.9: (a) Original image (b) Normalized Cut Segmentation with number of clusters n=4 (c) n=5 (d) n=8

For experimentation, all of the tested methods except Normalized Cut segmentation are demonstrated in a MATLAB graphical user interface (GUI), shown in figure 4.10. Normalized Cut segmentation is not included because of its long processing time, which is much longer than that of the other methods. This issue is discussed in a later section. The GUI has two sample images and six methods. Some parameters of each method can be specified, such as the threshold value or the number of clusters for K-mean clustering.

Figure 4.10: GUI for tested segmentation methods

4.4 Comparison of Segmentation Methods

In this section, the results are compared and discussed. Figure 4.11 shows the results from six methods. Edge detection is excluded because, as mentioned earlier, no further operation was applied to the edge detection results. Among these methods, two are chosen for further experiments with image sequences in the next part.


Gray-scale Morphology yields only one segment, as shown in figure 4.11(d). For Watershed segmentation, shown in figure 4.11(e), the number of segments cannot be specified. Thus, these two methods are not suitable for multi-part segmentation. In the other methods, the number of segments can be defined. Thresholding can segment five regions using four threshold values, as shown in figure 4.11(c). Five color values have to be defined for L*a*b* color space segmentation. For K-mean clustering and Normalized Cut, only the number of segments needs to be defined, which makes them more flexible than the other methods. Unlike thresholding and L*a*b* segmentation, they require no prior knowledge. With thresholding, the threshold value for a region needs to be known in order to get a good segmentation of that particular part. Similarly, particular color values need to be known for L*a*b* segmentation.

The original image has a complex background. Nevertheless, Gray-scale Morphology, Watershed segmentation, and Normalized Cut handle it well, which results in clean background segments. Moreover, these methods segment the object into cleanly cut regions. For example, the shirt segment produced by these methods is one simple shape. In comparison, the shirt segment in the other methods is either a complex shape or composed of complicated segments. In addition, L*a*b* color-based segmentation, K-mean clustering, and Normalized Cut are able to segment an object that has a color similar to the background. For instance, these three methods can segment the pants, whose dark color is similar to the background. On the other hand, Gray-scale Morphology and Watershed segmentation can only segment a major part such as the shirt, whose color differs dramatically from the background. The marker-based Watershed segmentation is a technique that limits oversegmentation [5], since the markers keep the result from being oversegmented.

The evaluations of the segmentation methods are summarized in table 4.1. Comparing all these methods, K-mean clustering and Normalized Cut segmentation are chosen as the two methods to be tested further with the image sequence sets in Part 2. The reason is that the number of segments can be specified without manually choosing values, as is required for thresholding or L*a*b* color-based segmentation. This experiment needs a method that can yield multiple segment regions within the object. Moreover, both methods can extract a foreground object that has a color similar to the background. Even though Normalized Cut is very slow (processing time is discussed in the Issues and Limitations section), it is robust to complex backgrounds and produces very clean segment regions.

Part 2: Image Sequences

From the previous part, K-mean clustering and Normalized Cut segmentation are chosen for further analysis with image sequences. Then, by comparing the two methods, the better one is chosen for a surveillance video in Part 3.

The results from K-mean clustering of image sequence set 1 at distances of 105, 55, and 5 yards are shown in figure 4.12. The results from the full sequence are shown in figure 7.1(a) in the Appendix. As can be seen, the background segments for both 10 and 20 colors are very complicated. Similarly, the body and face segments contain small, complex pieces. Because the result yields such complicated segments, no results from image sequence sets 2 and 3 are displayed here.
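To make concrete why K-mean clustering needs only the number of clusters, here is a minimal sketch of the standard assignment and update loop (Lloyd's algorithm) on scalar gray values. It is an assumption-laden illustration: the experiment clustered image colors in MATLAB, whereas this sketch uses plain Python, one-dimensional intensities, a deterministic evenly spaced initialization, and made-up pixel values.

```python
def kmeans_1d(values, k, iterations=20):
    """Lloyd's algorithm on scalar values; returns (centers, labels)."""
    lo, hi = min(values), max(values)
    # Deterministic start (an assumption): centers spread evenly over the range
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean of its members
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = sum(members) / len(members)
    return centers, labels

# Hypothetical gray levels forming three obvious groups
pixels = [12, 15, 11, 120, 125, 118, 240, 250, 245]
centers, labels = kmeans_1d(pixels, k=3)
```

Because each pixel is assigned independently of its neighbors, nothing in this procedure encourages spatially compact regions, which is consistent with the fragmented segments observed in figure 4.12.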

Normalized Cut segmentation results for image sequence sets 1-3 at distances of 105, 55, and 5 yards are displayed in figure 4.13. The results from the full sequences are shown in figures 7.1(b), 7.2(a), and 7.2(b) in the Appendix. At 105 yards, some body parts can be extracted, but their quality is not very good. In contrast, the images at 55 yards give more detail of the body parts. Comparing the three results at 55 yards, the lighting and face color do affect the result. If the lighting on the face is too low or the skin is too dark, the head is less likely to be detected than a brighter one. Comparing 10 and 20 segments at 5 yards, face details are better extracted with 20 segments; 10 segments are too few to capture the face details. However, if only major body parts are considered, 10 segments are good enough, as they yield results quite similar to those with 20 segments. From the full image sequences, it can be seen that the distance should be less than 20 yards to get some face details with 20-segment Normalized Cut.

Figure 4.11: Result comparison (a) Original image (b) Gray-scale image (c) Thresholding, 5 segments (d) Gray-scale Morphology (e) Watershed segmentation (f) Color-Based Segmentation Using the L*a*b* Color Space, 5 segments (g) K-mean Clustering, 5 segments (h) Normalized Cut, 5 segments

Figure 4.12: K-mean clustering result for image sequence set 1 at distances of 105, 55, and 5 yards (a) Original images (b) Result of 10 colors (c) Result of 20 colors

Figure 4.13: Normalized Cut segmentation result for image sequence sets 1-3 at distances of 105, 55, and 5 yards (a) Original images (b) Result of 10 segments (c) Result of 20 segments

| Methods | No. of Segments | Manually Chosen Value | Segment Similar Color | Robust to Complex Bg. | Nice Segment Cut | Processing Time |
|---|---|---|---|---|---|---|
| Thresholding | multiple | yes | poor | poor | poor | fast |
| Edge Detection | 1 | no | poor | poor | poor | fast |
| Gray-scale Morphology | 1 | no | poor | good | good | fast |
| Watershed | cannot be defined | no | poor | very good | good | slow |
| L*a*b* Color-Based | multiple | yes | good | poor | poor | ok |
| K-mean Clustering | multiple | no | good | poor | poor | slow |
| Normalized Cut | multiple | no | good | very good | good | very slow |

Table 4.1: Comparison of Segmentation Methods

Figure 4.14 shows some good results of Normalized Cut segmentation. It is able to segment the body parts, shirt pattern, and face details. The examples are body parts at 85 yards, shirt pattern at 40 yards, face detail and shirt pattern at 15 yards, and face detail at 5 yards.

Figure 4.15(b) shows the face segmentation of K-mean clustering at a 5-yard distance. Hair and face are segmented separately. In addition, the eyes, nose, and mouth can be segmented as well. Even though this method can segment the body parts and face detail, the segmentation result is too complicated for further operations or applications. In contrast, the face segmentation of Normalized Cut at the same distance, shown in figure 4.15(c), has clean segment regions. Although major parts like hair and face are cleanly segmented into small parts, some face details, notably one eye and the mouth, are missing. Given these points, Normalized Cut segmentation still seems to be better than K-mean clustering, as most major parts can be segmented into clean pieces. Thus, Normalized Cut segmentation is used in the next part for a surveillance video.

Figure 4.14: Some good results of Normalized Cut segmentation (a)(b) Body parts at 85 yards (c)(d) Shirt pattern at 40 yards (e)(f) Face detail and shirt pattern at 15 yards (g)(h) Face detail at 5 yards

Figure 4.15: Face segmentations at a distance of 5 yards (a) Original image (b) K-mean clustering (c) Normalized Cut segmentation

Part 3: A Surveillance Video

In this part, Normalized Cut segmentation is used as the segmentation method for a sample surveillance video. The sample video is from the IRIS lab. It is a 6-second video of two people walking toward the camera, as shown in figure 4.16(a). The video runs at 30 frames per second, and the frame size is 720 x 480. First, motion segmentation is applied to the video to extract the objects from the background, as shown in figure 4.16(b). The detailed steps are as follows:

• Frame differencing

• Thresholding

• Morphological operations to remove noise and fill holes in the body

– Opening: remove noise (function imopen)

– Closing: join unconnected parts and fill holes (function imclose)

– Fill image regions and holes (function imfill)

Then, each frame uses the Normalized Cut segmentation method to get the segmented parts.
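The motion segmentation steps listed above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the project used the MATLAB functions imopen, imclose, and imfill, whose structuring elements are not given in the text, so the sketch assumes a 3 x 3 square neighborhood, and the helper names, threshold, and toy frames are hypothetical.

```python
from collections import deque

def frame_difference(prev, curr, thresh):
    """Binary mask: 1 where the absolute intensity change exceeds thresh."""
    return [[1 if abs(c - p) > thresh else 0 for p, c in zip(pr, cr)]
            for pr, cr in zip(prev, curr)]

def _morph(mask, want_all):
    """3x3 erosion (want_all=True) or dilation (want_all=False)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Neighbours that fall outside the image are skipped
            nbrs = [mask[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = int(all(nbrs)) if want_all else int(any(nbrs))
    return out

def opening(mask):   # erosion then dilation: removes small specks
    return _morph(_morph(mask, True), False)

def closing(mask):   # dilation then erosion: bridges small gaps
    return _morph(_morph(mask, False), True)

def fill_holes(mask):
    """Set to 1 every background pixel not 4-connected to the image border."""
    h, w = len(mask), len(mask[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque((y, x) for y in range(h) for x in range(w)
                  if (y in (0, h - 1) or x in (0, w - 1)) and mask[y][x] == 0)
    for y, x in queue:
        outside[y][x] = True
    while queue:  # flood fill the background starting from the border
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w
                    and mask[ny][nx] == 0 and not outside[ny][nx]):
                outside[ny][nx] = True
                queue.append((ny, nx))
    return [[1 if mask[y][x] or not outside[y][x] else 0 for x in range(w)]
            for y in range(h)]

# Toy 15x15 frames: a bright 11x11 "body" appears between frames, except for
# a 3x3 interior patch that does not change (a hole), plus one noise pixel.
size = 15
prev = [[0] * size for _ in range(size)]
curr = [[0] * size for _ in range(size)]
for y in range(2, 13):
    for x in range(2, 13):
        curr[y][x] = 200
for y in range(6, 9):
    for x in range(6, 9):
        curr[y][x] = 0
curr[14][14] = 200

raw = frame_difference(prev, curr, thresh=30)
clean = fill_holes(closing(opening(raw)))
```

In this toy case the opening removes the isolated noise pixel, the closing alone does not close the 3 x 3 hole, and the final hole filling does, which suggests why the three operations are applied together.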

Figure 4.16: (a) A sample surveillance video of two people walking toward a camera (b) Result video after frame differencing and morphological operations

After frame differencing and morphological operations, the resulting frames are shown in figure 4.17(a). Then, 20-segment Normalized Cut is applied to the frames in figure 4.17(a). The results are shown in figure 4.17(b). However, the background, which is segmented into different regions, is unwanted. These regions are removed by masking the frames with the black-background frames, resulting in the frames in figure 4.17(c). Looking at the body part colors in consecutive frames in figure 4.17(c), the color is inconsistent due to the colormap assignment of MATLAB. For example, the right person's head is orange in the first frame but blue in the next frame.
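The masking step can be sketched as follows, assuming the segmentation output is a label image with labels starting at 1 and the motion segmentation output is a binary foreground mask of the same size. The function name, the label values, and the use of 0 for the removed background are assumptions made for this plain-Python illustration.

```python
def mask_background(label_frame, foreground_mask, background_label=0):
    """Keep segment labels inside the foreground mask; reset the rest."""
    return [[lab if keep else background_label
             for lab, keep in zip(label_row, mask_row)]
            for label_row, mask_row in zip(label_frame, foreground_mask)]

# Hypothetical 3x4 frame: segment labels 1..4, some of them in the background
labels = [[1, 1, 2, 2],
          [1, 3, 3, 2],
          [4, 3, 3, 4]]
# Binary mask from frame differencing (1 = moving object)
mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0]]
masked = mask_background(labels, mask)
```

Only the segments that overlap the moving object survive; every background segment collapses into a single label.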

4.5 Issues and Limitations

Processing time is one issue that needs to be mentioned. Most of the methods tested run quickly in MATLAB, except for K-mean clustering and Normalized Cut. The MATLAB processing times for these two methods are displayed in table 4.2.

The time used by each method on the 21 images is presented. Segmentation into 20 segments takes about twice as long as segmentation into 10 segments. More importantly, Normalized Cut is about 7 or 8 times slower than K-mean clustering. The processing time depends linearly on the image size and the number of segments, and most of the computation time is spent in the eigenvector computation [12].

Figure 4.17: (a) 5 frames from the result video after frame differencing and morphological operations (b) Results from (a) after applying 20-segment Normalized Cut (c) Results from (b) after removing the background by masking with the frames in (a)

| Image | Ncut, 10 segments (s) | K-mean, 10 colors (s) | Ncut, 20 segments (s) | K-mean, 20 colors (s) |
|---|---|---|---|---|
| 1 | 29.9 | 16.0 | 71.3 | 8.0 |
| 2 | 29.2 | 4.4 | 67.4 | 10.0 |
| 3 | 32.3 | 3.7 | 71.1 | 9.7 |
| 4 | 32.4 | 5.1 | 75.1 | 6.1 |
| 5 | 29.7 | 5.0 | 65.7 | 9.8 |
| 6 | 29.0 | 4.3 | 62.5 | 10.9 |
| 7 | 32.8 | 4.1 | 81.6 | 10.6 |
| 8 | 27.4 | 2.8 | 75.6 | 8.6 |
| 9 | 28.8 | 3.9 | 70.4 | 12.3 |
| 10 | 29.6 | 3.2 | 69.8 | 5.8 |
| 11 | 32.5 | 5.1 | 63.4 | 6.3 |
| 12 | 34.1 | 2.5 | 77.2 | 7.1 |
| 13 | 33.5 | 4.6 | 74.8 | 6.4 |
| 14 | 27.5 | 4.8 | 65.6 | 8.6 |
| 15 | 35.8 | 4.0 | 81.7 | 7.0 |
| 16 | 37.7 | 4.0 | 76.5 | 5.9 |
| 17 | 36.3 | 4.2 | 61.7 | 4.2 |
| 18 | 33.8 | 5.3 | 74.0 | 8.4 |
| 19 | 33.0 | 4.4 | 74.7 | 7.7 |
| 20 | 37.0 | 3.3 | 71.2 | 11.6 |
| 21 | 40.1 | 3.5 | 67.9 | 6.9 |
| Average | 30.6 | 4.5 | 68.2 | 7.9 |

Table 4.2: MATLAB processing time, in seconds, of Normalized Cut segmentation and K-mean clustering for 10 and 20 segments

Figure 4.18: Average MATLAB processing time of Normalized Cut segmentation and K-mean clustering for 10 and 20 segments

A significant issue for frame differencing is the aperture problem. When consecutive frames are differenced, the object sometimes has holes inside, as shown in figure 4.19(a). In that case, the holes affect the results, as shown in figures 4.19(b) and 4.19(c). Some small holes can be filled using morphological operations. However, filling the holes can also increase noise.
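One way such holes can arise is illustrated by the following plain-Python sketch on a single scanline. It assumes a uniformly bright object that moves by less than its own width between two frames, so its interior pixels keep the same intensity and only the leading and trailing edges register as motion. The intensities, object width, and threshold are made up for illustration.

```python
def frame_difference_1d(prev, curr, thresh):
    """Binary motion mask for one scanline."""
    return [1 if abs(c - p) > thresh else 0 for p, c in zip(prev, curr)]

# One scanline: a uniformly bright object 8 pixels wide moves 2 pixels right
prev = [0] * 2 + [200] * 8 + [0] * 6
curr = [0] * 4 + [200] * 8 + [0] * 4
motion = frame_difference_1d(prev, curr, thresh=30)
```

In this toy case the mask marks two pixels at each end of the object and nothing in between, which is the kind of interior gap that the morphological operations then try to fill.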

For Normalized Cut segmentation, the segmented parts are not consistent across frames. For example, the head can be segmented in one particular frame, but in the following frame it is too dark to be segmented, so the result contains no head segment. Later, the head can be segmented again when the image is good enough. The head segment is therefore missing in some frames, which makes the segmented parts inconsistent. Moreover, the color of a segmented part is also inconsistent due to the colormap assignment of MATLAB.

Figure 4.19: (a) Hole in the body from frame differencing resulting from the aperture problem (b) Result after applying 10-segment Normalized Cut (c) Result after applying 20-segment Normalized Cut

5 CONCLUSIONS

The segmentation methods have been discussed. In conclusion, Normalized Cut segmentation is the most promising of these methods for multiple body parts segmentation. It can segment a body into many parts whose cut regions are not too complicated for further application. This method is robust to background complexity and is also able to segment body parts that have a color similar to the background. For video segmentation, motion segmentation such as frame differencing should be applied before Normalized Cut segmentation to remove the background segments.

The number of segments does affect the segmented result. To get face detail, the number of segments should be high enough; in this case, at least 20 segments are recommended. However, if only the major body parts like the head, arms, or legs are of concern, 10 segments seem to be enough even at a long distance. The number of segments should be appropriate for the aim, because a higher number of segments results in a longer processing time.

The distance does affect the segmentation result. In the experiment, the distance to get good face detail should be less than 20 yards. However, this also depends on the size and quality of the image. If the image has a higher resolution than the images in the experiment, it is likely that this distance will be longer. Furthermore, this method appears to segment the body fairly well even at a long distance. In the experiment with 10 segments, it can segment major body parts within 100 yards.

This method, as developed in MATLAB, is not currently applicable to real-time processing because the processing is quite slow. In the experiment, an image of size 800 x 533 takes about 30 seconds for 10 segments and 68 seconds for 20 segments. In fact, even off-line video processing will be difficult. For example, a 10-second video at 30 frames per second will take approximately 2.5 hours to process with 10-segment Normalized Cut. For this reason, developing the algorithm in C/C++ is advised, as the processing time should decrease.
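The off-line estimate can be checked with a short calculation. The sketch below uses the rounded per-frame times quoted in the text and assumes every frame costs the same; the function name is hypothetical, and the 20-segment figure is an extrapolation that the text does not state.

```python
def total_processing_hours(video_seconds, fps, seconds_per_frame):
    """Hours needed to segment every frame, assuming a constant per-frame cost."""
    frames = video_seconds * fps
    return frames * seconds_per_frame / 3600

hours_10 = total_processing_hours(10, 30, 30)   # 10 segments, about 30 s per frame
hours_20 = total_processing_hours(10, 30, 68)   # 20 segments, about 68 s per frame
```

With 300 frames at about 30 seconds each, the 10-segment case comes to 9000 seconds, or 2.5 hours, matching the figure above; under the same assumptions the 20-segment case would take roughly 5.7 hours.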

6 FUTURE WORKS

This project focuses on segmenting the human body into many parts. These output data can be applied to many applications. One application is to group and classify the segmented parts into legs, arms, body, and head. The face can also be labeled with face components such as eyes, eyebrows, nose, and mouth. Once these components are recognized, they can be used to identify motion or gesture from the position of each component. For example, hand movement, as in figure 6.1, can be tracked for gesture and motion analysis [4]. Alternatively, the output can be applied to human detection and tracking in a surveillance system. Specific parts such as the face can be tracked, and the camera can zoom in to get more detail on the face.

Figure 6.1: Application: Tracking the hand movement for motion analysis by Machline et al. [4]

For Normalized Cut segmentation, the number of segments needs to be specified. Therefore, further research could implement an algorithm that automatically calculates the optimal number of segments, one that shows good detail of the segmented parts without over-segmenting them.

Another topic is segmentation in real-time processing. As mentioned earlier in the results, Normalized Cut segmentation requires a long processing time. The method is likely to be faster if implemented in C++ instead of MATLAB. Another suggestion is to use OpenCV (Open Source Computer Vision), which is a library for real-time computer vision.

7 APPENDIX

The appendix contains the results of the full image sequences for K-mean clustering and Normalized Cut segmentation.

Figure 7.1: (a) Results of data set 1 for K-mean clustering (b) Results of data set 1 for Normalized Cut segmentation

Figure 7.2: (a) Results of data set 2 for Normalized Cut segmentation (b) Results of data set 3 for Normalized Cut segmentation

References

[1] C. Wu and H. K. Aghajan, “Model-based image segmentation for multi-view human gesture analysis,” in ACIVS, 2007, pp. 310–321.

[2] L. Lucchese and S. Mitra, “Unsupervised segmentation of color images based on k-means clustering in the chromaticity plane,” Proceedings of the IEEE Workshop on Content-Based Access of Image and Video Libraries, 1999.

[3] S. Tae-O-Sot, S. Auethavekiat, and S. Jitapunkul, “Shape based segmentation by level set method for medical objects containing two regions,” IEEE International Conference on Image Processing, 2006.

[4] M. Machline, L. Zelnik-Manor, and M. Irani, “Multi-body segmentation: Revisiting motion consistency,” International Journal of Computer Vision, vol. 68, 2006.

[5] M. Sonka, V. Hlavac, and R. Boyle, Image Processing, Analysis, and Machine Vision, 2nd ed. PWS Publishing, 1998.

[6] M. A. E. Saban and B. S. Manjunath, “Video region segmentation by spatio-temporal watersheds,” Proceedings. 2003 International Conference on Image Processing. ICIP 2003., vol. 1, 2003.

[7] S. Sural, G. Qian, and S. Pramanik, “Segmentation and histogram generation using the hsv color space for image retrieval,” Proceedings. 2002 International Conference on Image Processing, vol. 2, 2002.

[8] J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, August 2000.

[9] W. Hu, T. Tan, L. Wang, and S. Maybank, “A survey on visual surveillance of object motion and behaviours,” IEEE Transactions on Systems, Man and Cybernetics - Part C: Applications and Reviews, vol. 34, no. 3, 2004.

[10] “Matlab help,” http://www.mathworks.com/access/helpdesk/help/techdoc/matlab.shtml.

[11] T. Cour, F. Benezit, and J. Shi, “Multiscale normalized cuts segmentation code,” http://www.seas.upenn.edu/~timothee/software/ncut_multiscale/ncut_multiscale.html.

[12] ——, “Spectral segmentation with multiscale graph decomposition,” IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2005.

[13] R. Gonzalez and R. Woods, Digital image processing, 2nd ed. Prentice-Hall, 2002.

[14] S. Phung, A. Bouzerdoum, and D. Chai, “Skin segmentation using color pixel classification: Analysis and comparison,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, 2005.
