
An Adept Segmentation Algorithm and Its Application to the Extraction of Local Regions Containing Fiducial Points

Erhan AliRiza Ince and Syed Amjad Ali

Eastern Mediterranean University, Electrical and Electronic Engineering, Famagusta, North Cyprus

[email protected], [email protected]

Abstract. Locating human fiducial points such as the eyes and mouth in a frontal head-and-shoulder image is an active research area for applications such as model based teleconferencing systems, model based low bit rate video transmission, and computer based identification and recognition systems. This paper proposes an adept and efficient rule based skin color region extraction algorithm using the normalized r-g color space. The given scheme extracts skin pixels using a simple quadratic polynomial model and applies additional color based rules to extract possible eye and lip regions. The algorithm refines the search for fiducial points by eliminating falsely extracted feature components using spatial and geometrical representations of facial components. The algorithm described herein has been implemented and tested on 311 images from the FERET database with varying lighting conditions, skin colors, orientations and tilts. Experimental results indicate that the proposed algorithm is quite robust and leads to good facial feature extraction.

1 Introduction

Extracting face regions containing fiducial points and recovering pose, two challenging problems in computer vision, have been widely explored by researchers. Many vision applications such as video telephony, face recognition, hybrid access control, feature tracking, model based low bit rate video transmission and MPEG-4 coding require feature extraction and pose recovery. Various methods exist for the detection of facial features, and a detailed literature survey of these techniques is available in [1-6]. One of the very first operations needed for facial feature detection is face localization. To achieve face localization, many approaches such as segmentation based on skin color [2,3], clustering [7], Principal Component Analysis [8], and neural nets [9] have been proposed. Once the face region is located, it can be made more evident by applying a region growing technique [10]. Facial features can then be extracted from the segmented face region by making use of image intensity, chromaticity values and the geometrical shape properties of the face.

It has been clearly demonstrated in [11] that recognition based on local facial components will outperform global face based approaches, since the global approaches are more sensitive to image variations caused by translations and facial rotations. Hence this paper proposes an efficient segmentation algorithm for locating and cropping local regions containing fiducial points. In order to reduce the search area in the input images, skin pixels are extracted using a quadratic polynomial model. In addition, to alleviate the influence of light brightness on the extraction of skin pixels, the proposed algorithm adopts the r-g chromatic color coordinates for color representation. In the r-g plane the skin pixels form a compact region, and the algorithm's computational cost is low in comparison to probabilistic and neural network based models. Since the objective is to design a real-time system, it is essential that the chosen algorithm is of low complexity.

Fig. 1. (a) Feature detection archetype: Input Image → Skin Segmentation → Eye Detection / Lip Detection → Output Image (b) Skin color cluster in (r − g) space

Unlike many published works in the literature, this paper adopts an approach in which the eye and mouth features are extracted independently of each other. As depicted in Fig. 1(a), this provides the extra advantage of being able to run the two extraction routines in parallel for better time management.
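A minimal sketch of this parallel arrangement is shown below; detect_eyes and detect_lips are hypothetical stand-ins for the extraction routines developed in Sections 4-7, not names used by the paper.

```python
# Run the two independent extraction routines concurrently (Fig. 1(a)).
# detect_eyes/detect_lips are hypothetical placeholders.
from concurrent.futures import ThreadPoolExecutor

def locate_features(image, face_mask, detect_eyes, detect_lips):
    # The eye and lip searches are independent, so they may run in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        eyes = pool.submit(detect_eyes, image, face_mask)
        lips = pool.submit(detect_lips, image, face_mask)
        return eyes.result(), lips.result()
```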

The paper is organized as follows: Section 2 provides details about the standard color FERET database and explains how the collection of 311 images belonging to 30 subjects was chosen. Section 3 introduces the rule based skin segmentation, and Sections 4 and 6 detail how to generate the feature and lip maps, respectively. Sections 5 and 7 give the rules for verifying the eye and mouth components. Section 8 presents simulation results, including the efficiency in extracting each individual feature. Finally, conclusions are drawn in Section 9.

2 The Standard Color FERET Database

The database used in the simulations is a subset of the color FERET database [12,13], which was specifically created to develop, test and evaluate face recognition algorithms. We randomly selected 30 subjects from the pool and accumulated 311 pictures in total, using only the FA, FB, QL, QR, RB and RC poses for each subject. FA and FB are frontal images, QL and QR are poses with the head turned about 22.5 degrees left and right respectively, and RB and RC are random images with the head turned about 15 degrees in either direction. The standard color FERET database contains images with multiple faces under various illumination conditions. The images come in two sizes: large (512×768) and small (256×384). The profile left/right (PL, PR) and half left/right (HL, HR) poses were intentionally left out, because no authorized user of a hybrid access system based on facial feature identification is expected to pose in front of the camera at an angle of more than 22.5 degrees.

3 Skin Segmentation

For humans, skin-tone information is a useful means for segmenting skin regions. The RGB, normalized r-g, YCbCr and HSV color spaces or their variations are frequently used in the literature [1,14] for skin segmentation. In this work, we use both the RGB and normalized r-g color spaces. The normalized red and green components are computed using the following relations:

r = R / (R + G + B)    (1)

g = G / (R + G + B)    (2)

Once the r-g components are obtained, a simple quadratic polynomial model [15] is used to determine the upper and lower thresholds of the skin region, as shown in Fig. 1(b).

f_upper(r) = −1.3067 r² + 1.0743 r + 0.1452    (3)

f_lower(r) = −0.7760 r² + 0.5601 r + 0.1766    (4)

Finally, skin segmentation is performed by jointly applying the following three rules:

S1. f_lower(r) < g < f_upper(r)
S2. R > G > B
S3. R − B > 10

to obtain a raw binary mask (BM):

BM = 1 if all segmentation rules S1, S2 and S3 are true, and 0 otherwise.
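A minimal sketch of rules S1-S3, assuming NumPy and an (H, W, 3) uint8 RGB input, might look as follows:

```python
# Sketch of the raw skin mask BM (rules S1-S3).
import numpy as np

def skin_mask(rgb):
    rgbf = rgb.astype(np.float64)
    R, G, B = rgbf[..., 0], rgbf[..., 1], rgbf[..., 2]
    s = R + G + B + 1e-6                              # avoid division by zero
    r, g = R / s, G / s                               # Eqs. (1)-(2)
    f_upper = -1.3067 * r**2 + 1.0743 * r + 0.1452    # Eq. (3)
    f_lower = -0.7760 * r**2 + 0.5601 * r + 0.1766    # Eq. (4)
    s1 = (f_lower < g) & (g < f_upper)                # rule S1
    s2 = (R > G) & (G > B)                            # rule S2
    s3 = (R - B) > 10                                 # rule S3
    return s1 & s2 & s3
```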

The binary mask is refined by first selecting the largest connected binary region in the image and then filling the holes inside that region. Lastly, the gaps (holes) connected to the background in the upper part of the binary image are closed (in a left or right rotated head, it is mostly the eye and eyebrow that create such regions). The outcome of each phase of the skin segmentation algorithm is shown in Fig. 2.
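A sketch of this refinement step, assuming SciPy's ndimage module for connected-component labeling and hole filling:

```python
# Keep the largest connected skin component, then fill its interior holes.
import numpy as np
from scipy import ndimage

def refine_mask(bm):
    labels, n = ndimage.label(bm)
    if n == 0:
        return bm                                     # no skin found
    sizes = ndimage.sum(bm, labels, range(1, n + 1))  # pixel count per label
    largest = labels == (np.argmax(sizes) + 1)
    return ndimage.binary_fill_holes(largest)
```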

Fig. 2. Left to right: (a) Original image (b) Binary mask (BM) (c) Largest connected binary mask with holes filled (d) Binary mask after closing the gaps

Fig. 3. Binary face mask with marked boundaries

In order to close the holes connected to the background in the upper part of the image, we first define toprow, bottomrow, leftcolumn and rightcolumn as shown in Fig. 3. For each column of the binary map, the processing is applied from toprow down to 45% of the height (hindex). The mechanism for closing the gaps can be explained with a simple example.

Suppose x = (1 1 1 1 0 0 1 1 1 0 0 0 1 1 0) contains the binary pixels of a selected column. Finding the starting and ending indices of the contiguous runs of 1's we get

y = ( 1 7 13 )
    ( 4 9 14 )

where the first row holds the starting indices and the second row the ending indices of the runs. Filling indices (5 6) and (10 11 12) with 1's, the modified column becomes x = (1 1 1 1 1 1 1 1 1 1 1 1 1 1 0). The number of columns processed on both the left and right sides of the image is chosen to be 30% of the width (windex).
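A sketch of this gap-closing step for a single binary column (0-based indexing, NumPy assumed):

```python
# Fill the zeros that lie between consecutive runs of 1's in one column.
import numpy as np

def close_column_gaps(col):
    col = col.copy()
    diff = np.diff(np.concatenate(([0], col, [0])))
    starts = np.flatnonzero(diff == 1)       # first index of each run of 1's
    ends = np.flatnonzero(diff == -1) - 1    # last index of each run of 1's
    for lo, hi in zip(ends[:-1] + 1, starts[1:]):
        col[lo:hi] = 1                       # fill the gap between two runs
    return col

# For x = [1 1 1 1 0 0 1 1 1 0 0 0 1 1 0] the runs are (0-3), (6-8) and
# (12-13), so positions 4-5 and 9-11 are filled, leaving the final 0 intact.
```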

Performing skin segmentation to obtain a binary face map has the advantage that the processing required to extract the fiducial points needs to be carried out only inside this map. Moreover, the eye and lip regions can be located independently of each other, so neither search depends on the performance of the other, and the two searches can be carried out in parallel to speed up the processing.

4 Feature Map Generation

Most approaches for eye and mouth detection are template based. In this paper, however, we locate the eyes and mouth directly, based on measurements derived from the r-g, RGB and YCbCr color space components of the images. In order to locate the possible eye region we first create a feature map (FM) using the equations and conditions below.

f_upper(r) = −1.3067 r² + 1.0743 r + 0.1452    (5)

f_lower(r) = −0.7760 r² + 0.5601 r + 0.1766    (6)

S1. f_lower(r) < g < f_upper(r)
S2. R > G > B
S3. R − G > 10
S4. 65 < Cb < 125 and 135 < Cr < 165

FM = 1 if all segmentation rules S1, S2, S3 and S4 are true, and 0 otherwise.
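A sketch of the feature map computation follows. Note that the paper does not specify which RGB-to-YCbCr conversion it uses, so the common full-range BT.601 (JPEG) formulas are assumed here:

```python
# Feature map FM (rules S1-S4); the Cb/Cr conversion is an assumption.
import numpy as np

def feature_map(rgb):
    rgbf = rgb.astype(np.float64)
    R, G, B = rgbf[..., 0], rgbf[..., 1], rgbf[..., 2]
    s = R + G + B + 1e-6
    r, g = R / s, G / s
    f_upper = -1.3067 * r**2 + 1.0743 * r + 0.1452      # Eq. (5)
    f_lower = -0.7760 * r**2 + 0.5601 * r + 0.1766      # Eq. (6)
    Cb = 128.0 - 0.168736 * R - 0.331264 * G + 0.5 * B  # assumed BT.601
    Cr = 128.0 + 0.5 * R - 0.418688 * G - 0.081312 * B  # assumed BT.601
    return ((f_lower < g) & (g < f_upper)               # S1
            & (R > G) & (G > B)                         # S2
            & ((R - G) > 10)                            # S3
            & (Cb > 65) & (Cb < 125)                    # S4
            & (Cr > 135) & (Cr < 165))
```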

Once the FM is obtained, it is complemented and the components touching the borders are cleared to obtain the composite feature map (CFM). Afterwards the CFM is masked by the binary face mask (BM) obtained in the previous section, and finally the eye region is extracted using:

eyereg = CFM(toprow + 0.19 · hindex : toprow + 0.52 · hindex)    (7)

where hindex represents the height of the binary skin segmented image (see Fig. 3). The steps described above for extracting the region containing the eyes are illustrated in Fig. 4; a code sketch follows the figure.

Fig. 4. Obtaining the region containing the eyes
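A sketch of these steps, assuming boolean maps and scikit-image's clear_border for removing border-touching components:

```python
# Complement FM, clear border-touching components, mask by BM, then crop
# the 19%-52% height band of Eq. (7).
from skimage.segmentation import clear_border

def eye_region(fm, bm, toprow, hindex):
    cfm = clear_border(~fm)                  # composite feature map (CFM)
    cfm = cfm & bm                           # restrict search to the face
    top = int(toprow + 0.19 * hindex)
    bot = int(toprow + 0.52 * hindex)
    return cfm[top:bot, :]                   # Eq. (7)
```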


5 Rules for Eye Component Verifications

Among all the extracted possible eye candidates, only the largest 5 components are kept; if fewer than 5 components remain, all of them are kept. Among the selected candidates there may be some false ones that need to be eliminated. To remove these falsely extracted components, the proposed algorithm performs component verification based on the set of rules discussed below.

A. If the height of a candidate is larger than a threshold value, that component is eliminated (refer to Fig. 5).

B. If the right vertical border of a component lies to the left of the vertical line at one-eighth of the image width, or its left vertical border lies to the right of the vertical line at seven-eighths of the image width, the component is eliminated (refer to Fig. 6a).

C. Knowing that the eyes in an image are always horizontally aligned (symmetric), we first detect the top, bottom, left and right boundaries of each possible eye component and then scan the bands to its left and right, within its vertical boundaries, for other components. If any component is found in the searched bands, the component under test is kept; if no other component is found, the candidate is eliminated (refer to Fig. 6b).

Fig. 5. Eliminating components with large height

Fig. 6. (a) Out of bounds elimination (b) Isolated component elimination

Fig. 7. Possible four component cases


D. If 4 components remain and they satisfy the symmetry property, the decision is based on two criteria: the horizontal distances d1 and d2 between the two symmetric pairs, and the vertical distance h between the pairs. If d1 and d2 are comparable but h is small, we choose the lower two components as eyes, since the upper components are eyebrows. If d1 is considerably greater than d2 and h is also relatively large, we choose the upper two components as eyes, as the lower components are due to the nostrils. If d1 and d2 are comparable but h is quite large, we again choose the upper two components as eyes, as the lower components are due to the lip corners (refer to Fig. 7).

E. If more than two components are aligned within a horizontal band, find the center of the face mask (half of windex) and choose the two components at minimum horizontal distance from the center.
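As a simplified illustration of rule E, the sketch below scores every remaining candidate by the horizontal distance of its centroid from the mid-line and keeps the closest two; the alignment tests of rules A-D are assumed to have been applied already:

```python
# Keep the two components whose centroids are nearest the vertical mid-line.
import numpy as np
from scipy import ndimage

def pick_central_pair(eye_map, windex):
    labels, n = ndimage.label(eye_map)
    centers = ndimage.center_of_mass(eye_map, labels, range(1, n + 1))
    mid = windex / 2.0
    order = sorted(range(n), key=lambda i: abs(centers[i][1] - mid))
    keep = [i + 1 for i in order[:2]]        # labels of the two winners
    return np.isin(labels, keep)
```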

6 Lip Map Generation

As discussed in [15], lip colors are distributed in the lower part of the crescent-shaped region formed by the skin colors in the r-g plane. Hence, as before, we can define another quadratic polynomial discriminant, l_r(r), for the extraction of lip pixels. The three polynomial discriminants are combined with RGB color space information to obtain the lip map (LM) as shown below:

f_upper(r) = −1.3067 r² + 1.0743 r + 0.1452    (8)

f_lower(r) = −0.7760 r² + 0.5601 r + 0.1966    (9)

l_r(r) = −0.7760 r² + 0.5601 r + 0.2563    (10)

LM = 1 if g > f_lower(r), g < l_r(r), R > 60, G > 30 and B > 30; 0 otherwise,

where R, G and B are the intensity values in the red, green and blue channels of the RGB color space.

Fig. 8. Extracting the mouth regions

The final step is to crop the lip map using equation (11) to obtain the mouth region:

MouthRegion = LM(bottomrow − ceil(0.60 · hindex) : bottomrow − 25)    (11)

The above described processing steps are depicted in Fig. 8.
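A sketch of the lip map and the Eq. (11) crop, again assuming an (H, W, 3) uint8 input and the face-mask boundaries of Fig. 3:

```python
# Lip map LM (Eqs. (9)-(10) plus the RGB floors) and the mouth-region crop.
import math
import numpy as np

def lip_map(rgb):
    rgbf = rgb.astype(np.float64)
    R, G, B = rgbf[..., 0], rgbf[..., 1], rgbf[..., 2]
    s = R + G + B + 1e-6
    r, g = R / s, G / s
    f_lower = -0.7760 * r**2 + 0.5601 * r + 0.1966    # Eq. (9)
    l_r = -0.7760 * r**2 + 0.5601 * r + 0.2563        # Eq. (10)
    return (g > f_lower) & (g < l_r) & (R > 60) & (G > 30) & (B > 30)

def mouth_region(lm, bottomrow, hindex):
    top = bottomrow - math.ceil(0.60 * hindex)        # Eq. (11)
    return lm[int(top):bottomrow - 25, :]
```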

7 Rules for Mouth Component Verifications

To remove falsely extracted mouth candidates, the following set of rules is applied.

A. If the right vertical border of a component lies to the left of the vertical line at 16% of the image width, or its left vertical border lies to the right of the vertical line at 80% of the image width, the component is eliminated.
B. Remove all components whose width-to-height ratio is below 1.8.
C. If the area (number of connected 1's) of a candidate is less than a fixed threshold value, remove the component.
D. If the area (number of connected 1's) of a candidate is greater than a second, larger fixed threshold value, also remove the component.
E. If more than 2 components remain, select the largest two.
F. Compute the ratio A · (w²/h) for each of the two remaining components, where A is the area, w the width and h the height of the component considered. Finally, select the component with the larger ratio.
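A sketch of rules E and F, using scikit-image region properties; rules A-D are assumed to have been applied beforehand:

```python
# Keep the two largest mouth candidates, then select the one with the
# larger score A * w^2 / h (rule F).
from skimage.measure import label, regionprops

def select_mouth(lm):
    props = sorted(regionprops(label(lm)),
                   key=lambda p: p.area, reverse=True)[:2]   # rule E
    def score(p):
        minr, minc, maxr, maxc = p.bbox
        w, h = maxc - minc, maxr - minr
        return p.area * w**2 / h                             # rule F
    return max(props, key=score) if props else None
```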

8 Simulation Results

We tested the proposed algorithm on 311 test images (30 subjects) randomly selected from the color FERET database. Four sample faces with features detected by the proposed algorithm are shown in Fig. 9 and Fig. 10. The experimental results show that the algorithm can robustly detect the local regions for people with varying skin tones.

Fig. 9. Marked eye features

Fig. 10. Marked mouth features

Table 1 summarizes the correct detection rates for the left eye, right eye and mouth regions. As can be seen from the results, all three rates are very promising.

Also, since this method does not extract the mouth corners based on the lip cut, the subject's mouth is not required to be closed.

Table 1. Performance in extracting component regions from the FERET database

Region                        Correct detection rate (%)
Region containing left eye    92.53
Region containing right eye   94.02
Region containing mouth       91.00

9 Conclusion

An efficient rule based local region extraction algorithm making use of quadratic polynomial discriminants, derived from the r-g chromatic coordinates together with RGB color space information, has been proposed. The algorithm eliminates false feature candidates using spatial and geometrical representations of facial components. The paper adopts an approach in which the feature and lip maps are generated independently, and hence provides the flexibility of running the two extraction routines in parallel for better time management. The preliminary simulation results in Table 1 imply that the proposed algorithm is quite effective. The authors believe that if the affine transform parameters were estimated and the transformations reversed, the correct extraction rates could be even higher. Finally, since the mouth region extraction is not based on determining the lip cut, there is no restriction on mouth mimics.


Acknowledgment

The work presented herein is an outcome of research carried out under Seed Money Project EN-05-02-01, granted by the Research Advisory Board of Eastern Mediterranean University.

References

1. Hsu, R.-L., Abdel-Mottaleb, M., Jain, A. K.: Face detection in color images, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 24, No. 5, pp. 696-706, May 2002.

2. Yang, M.-H., Kriegman, D. J., Ahuja, N.: Detecting faces in images: a survey, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 24, No. 1, pp. 34-58, January 2002.

3. Alattar, A. M., Rajala, S. A.: Facial Features Localization in Front View Head and Shoulders Images, IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 6, pp. 3557-3560, March 1999.

4. Hsu, R.-L., Abdel-Mottaleb, M., Jain, A. K.: Face detection in color images, Tech. Report MSU-CSE-01-7, Michigan State University, March 2001.

5. Ince, E. A., Kaymak, S., Celik, T.: Yüzsel Öznitelik Sezimi İçin Karma Bir Teknik (A Hybrid Technique for Facial Feature Detection), 13. IEEE Sinyal İşleme ve İletişim Uygulamaları Kurultayı, pp. 396-399, May 2005.

6. Hu, M., Worrall, S., Sadka, A. H., Kondoz, A. M.: Face Feature Detection and Model Design for 2D Scalable Model-Based Video Coding, International Conference on Visual Information Engineering, pp. 125-128, July 2003.

7. Sung, K., Poggio, T.: Example Based Learning for View-Based Human Face Detection, C.B.C.L. Paper No. 112, MIT, 1994.

8. Moghaddam, B., Pentland, A.: Face Recognition using View-Based and Modular Eigenspaces, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 1, pp. 23-38, January 1998.

9. Rowley, H., Baluja, S., Kanade, T.: Neural Network-Based Face Detection, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 20, No. 1, pp. 23-38, January 1998.

10. Adams, R., Bischof, L.: Seeded Region Growing, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 16, No. 6, pp. 641-647, June 1994.

11. Heisele, B., Ho, P., Wu, J., Poggio, T.: Face Recognition: Component-based versus Global Approaches, Computer Vision and Image Understanding, Vol. 91, pp. 6-21, February 2003.

12. Phillips, P. J., Wechsler, H., Huang, J., Rauss, P.: The FERET database and evaluation procedure for face recognition algorithms, Image and Vision Computing, Vol. 16, No. 5, pp. 295-306, 1998.

13. Phillips, P. J., Moon, H., Rizvi, S. A., Rauss, P. J.: The FERET Evaluation Methodology for Face Recognition Algorithms, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 22, pp. 1090-1104, 2000.

14. Terrillon, J.-C., Shirazi, M. N., Akamatsu, S.: Comparative performance of different skin chrominance models and chrominance spaces for the automatic detection of human faces in color images, Proc. IEEE Int. Conf. Automatic Face and Gesture Recognition, pp. 54-61, 2000.

15. Chiang, C.-C., Tai, W.-K., Yang, M.-T., Huang, Y.-T., Huang, C.-J.: A novel method for detecting lips, eyes and faces in real time, Real-Time Imaging, Vol. 9, pp. 277-287, 2003.