

A Binary Tree for Probability Learning in Eye Detection

Junwen Wu, Mohan M. Trivedi
Computer Vision and Robotics Research Lab
ECE Department, UC San Diego
La Jolla, CA 92093, USA
[email protected]

Abstract

Human eyes, as the most salient features in face images, are highly statistically structured: some features are highly dependent while others are nearly independent. In this paper, we propose to use a binary tree to model this statistical structure. By excluding the conditionally independent features, the nodes at lower levels contain increasingly dependent features, so the probability learning can be more accurate. The tree is built in a top-down fashion. Subsets with negligible conditional dependency are found by clustering with mutual information, and more dependent subsets are obtained by excluding one of the most conditionally independent subsets; these more dependent subsets are passed on to the corresponding subtrees. The procedure is repeated until all the features in the current node have sufficiently high dependency. Each subtree models the statistics of the subset from its parent node. We use a non-parametric probability model to learn the class conditional probabilities for both classes according to the binary tree; the whole feature distribution is learned in a recursive form. The Bayesian criterion is used thereafter, with its parameters learned experimentally from ROC curves. We apply the detector to images containing only faces as well as images with complex background. The experimental results give a high detection rate while keeping the false alarm rate low.

1. Introduction

Statistical modeling for object detection is a topic that has received considerable interest. There are two different ways to view the problem. One is inspired by the question of what makes a good feature [1]: efforts are made towards finding the best feature descriptor for detection, where best is in terms of the discriminant ability with respect to the negative samples. The other is better local statistics modeling [2, 3, 4, 5]. The motivation comes directly from the Bayesian criterion: given the true distribution function, the Bayesian criterion provides an optimal classification. The problem is how to find the statistical structure for better probability learning. In [3], a detection approach based on clustering is proposed; however, in that approach the clusters are regarded as independent, which is a problematic assumption. In [5], a restricted Bayesian network is proposed. In that paper, the concept of learning feature dependency is clearly stated, which explains why component-based detection and recognition algorithms can achieve good performance. However, a complicated learning procedure is involved, which is computationally expensive. In this paper, we present an alternative method using a binary tree representation. The feature dependency is studied in a structured way, in a top-down fashion. The tree structure separates the conditionally independent features into different subtrees, while more dependent features are kept in the same subtree. This actually enables us to study the local statistical substructure of an object in a coarse-to-fine fashion: each subtree explains the statistics of a certain part/component. The training computational cost is low. Specifically, we apply the algorithm to eye detection, where good performance is achieved.

The problem of detecting human eyes has attracted considerable interest in the computer vision community. Many efforts have been devoted to capturing the essential physical and emotional information from eyes. In intelligent vehicle systems, eye gaze and the motion of the eye pupil provide important information for fatigue analysis [6] [7] [8]. In face detection and recognition systems, eyes can provide the richest identity information [9]. Eye detection and tracking is also a very promising approach for focus analysis in an intelligent conference room [10]. In general, there are two main approaches to eye detection: image-based approaches [6] [7] [9] [10] and IR-camera-based approaches [8] [11]. Traditional image-based approaches take the images from an optical video camera as the input, and eyes are detected based on the intensity distribution. These approaches assume that eyes have a different appearance from other facial parts and the background. This type of approach does not require a special illumination setup; however, some predefined models (color, geometry, templates, etc.) are required, and these models are usually application dependent. For IR-camera-based approaches, the physiological properties of the eyes are


employed, using IR illumination: the pupils are brightened up via the red-eye effect. Good performance has been reported. The success of the IR approaches depends on the subjects, the distance to the camera, and the external illumination. In real-world settings the illumination is hard to control, especially in outdoor environments, which limits the applications of the IR-based approaches. In this paper, we propose to use our object detection framework to solve the eye detection problem.

We use a clustering method based on the pairwise mutual information between features to separate less dependent features into different subsets. The statistics of a feature subset are represented by the combination of a node and its subtrees (for a non-leaf node). The subtrees of one parent node model the statistics of two overlapping subsets, whose non-overlapping parts are the currently most conditionally independent subsets. The likelihood is estimated by a non-parametric method, based on the statistical structure represented in the tree. The tree is built in a top-down fashion and the object's distribution is learned bottom-up. Eyes are detected by the Bayesian criterion. Section 2 describes the details of the algorithm, including feature clustering, probability learning, and parameter estimation. In Section 3, the experimental evaluation is presented. Section 4 concludes the paper.

2. Constructing the Binary Tree

In the proposed algorithm, at each level the features are partitioned into subsets according to their statistical dependency. Suppose the current feature set is $X_S$, and we can segment it into subsets $X_{S_{l_1}}, X_{S_{l_2}}, \cdots, X_{S_{l_k}}$ according to their mutual dependency. If there are two subsets whose mean pairwise mutual information is less than a certain threshold $\delta$, the node is still not statistically compact enough, and further decomposition is needed. Without loss of generality, we denote these two subsets by $X_{S_{l_1}}$ and $X_{S_{l_2}}$. If $\delta$ is low enough, the dependency between $X_{S_{l_1}}$ and $X_{S_{l_2}}$ can be ignored; however, both subsets still have sufficient dependency with the remaining subsets. This means $X_{S_{l_1}}$ and $X_{S_{l_2}}$ can be considered conditionally independent given the union of the remaining subsets $\cup_{i \neq 1,2} X_{S_{l_i}}$. We construct the binary tree by keeping $\cup_{i \neq 1,2} X_{S_{l_i}}$ as the parent node and passing the subsets $\cup_{i \neq 1} X_{S_{l_i}}$ and $\cup_{i \neq 2} X_{S_{l_i}}$ on to the subtrees. The two subsets passed on to the subtrees are thus $X_S \setminus X_{S_{l_1}}$ and $X_S \setminus X_{S_{l_2}}$. The procedure is shown in Fig. 1.
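To make the construction concrete, the following is a minimal Python sketch of the top-down recursion. The names are illustrative, not the authors' code; in particular, `split_fn` stands in for the mutual-information clustering of Section 2.1.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Tuple

# split_fn returns the two least dependent subsets (A, B) found by the
# clustering of Sec. 2.1, or None when the feature set is already compact.
SplitFn = Callable[[FrozenSet[int]], Optional[Tuple[FrozenSet[int], FrozenSet[int]]]]

@dataclass
class Node:
    features: FrozenSet[int]              # feature indices kept at this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build_tree(feature_set: FrozenSet[int], split_fn: SplitFn) -> Node:
    split = split_fn(feature_set)
    if split is None:                     # sufficiently dependent: leaf node
        return Node(features=feature_set)
    a, b = split
    node = Node(features=feature_set - a - b)           # parent keeps C = S \ (A u B)
    node.left = build_tree(feature_set - a, split_fn)   # subtree receives S \ A
    node.right = build_tree(feature_set - b, split_fn)  # subtree receives S \ B
    return node
```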

The tree structure describes the mutual dependency between the features. We do not have a dependency model for the non-eye samples, due to their lack of statistical structure; it is reasonable to use the same dependency tree for the negative examples. In such a situation, fitting a negative sample into the model tells us its deviation from the positive pattern. In the following sections, we discuss the details of the tree construction and the likelihood learning.

Figure 1: Illustration of constructing the binary tree.

2.1 Finding the Subsets with the Least Dependency

Every class of objects presents some similar structures. The structure information confines the dependency between features: features from different substructures are less dependent. Mutual information is a metric for evaluating the mutual dependency between features. It is defined in terms of entropies:

$$I(X_1, X_2) = H(X_1) + H(X_2) - H(X_1, X_2); \tag{1}$$

where $H(X_i)$ is the entropy of the random variable $X_i$, $i = 1, 2$, and $H(X_1, X_2)$ is the joint entropy. They are defined as follows:

$$H(X_i) = -\sum P(X_i) \log P(X_i); \tag{2}$$

$$H(X_1, X_2) = -\sum\sum P(X_1, X_2) \log P(X_1, X_2). \tag{3}$$

Mutual information is actually the KL divergence between the joint probability mass function $P(X_1, X_2)$ and the product of the marginal probability mass functions $P(X_1)$ and $P(X_2)$:

$$I(X_1, X_2) = \sum_{X_1}\sum_{X_2} P(X_1, X_2) \log \frac{P(X_1, X_2)}{P(X_1)P(X_2)}. \tag{4}$$

This definition shows that mutual information measures the amount of information one random variable contains about another, i.e., their mutual dependency. Its maximal value is attained when there exists a bijective mapping between the values of $X_1$ and $X_2$; at the other extreme, $I(X_1, X_2)$ drops to zero if $X_1$ and $X_2$ are statistically independent.
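For illustration, the pairwise mutual information of equation (4) can be estimated from empirical histograms; a minimal numpy sketch follows (the bin count is an illustrative choice, not from the paper):

```python
import numpy as np

def mutual_information(x: np.ndarray, y: np.ndarray, bins: int = 16) -> float:
    """Estimate I(X, Y) from paired feature samples via empirical joint
    and marginal histograms (equation 4); returned in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                    # empirical joint pmf
    px = pxy.sum(axis=1, keepdims=True)          # marginal pmf of X
    py = pxy.sum(axis=0, keepdims=True)          # marginal pmf of Y
    nz = pxy > 0                                 # skip zero-probability cells
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))
```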


Motivated by this, we use a clustering scheme adapted from the traditional K-means clustering algorithm. Instead of the Euclidean distance, we use a decreasing function of the pairwise mutual information as the distance measure; specifically, the reciprocal of the mutual information (other decreasing functions could also be used). Traditional K-means, however, uses the centroids of the clusters in its update step. Since mutual information is only defined over the set $X_S$, centroids of the points become meaningless. Instead, for each cluster $X_i$, the update step uses the feature with the smallest mean distance, i.e., with the largest mean pairwise mutual information, to all the other features in $X_i$. Meanwhile, since the convergence property of traditional K-means holds only for numerical attributes, we need to modify the stopping criterion to ensure convergence: the clustering procedure stops when no points change cluster labels. This clustering method guarantees that the mean pairwise mutual information within subsets increases. As in K-means clustering, the number of clusters must be provided in advance; it is selected by iteratively increasing the number until the lowest mean pairwise mutual information between different subsets reaches the threshold $\delta$. The clustering procedure can be summarized as follows:

1. Estimate the mutual information for each pair of features in $X_S$ from their empirical distributions, using equation (4). Take the reciprocal of the mutual information as the distance between points:

$$D(X_i, X_j) = \frac{1}{I(X_i, X_j)}; \tag{5}$$

2. For $c = 3, \cdots, K$, perform clustering by the following steps:

(a) Randomly select $c$ points $X_s^c$ from $X_S$ as the seeds for $c$ clusters;

(b) Initialize the error $\epsilon$ with a large number;

(c) While $\epsilon > 0$:

i. Assign each point $X_i$ to the corresponding cluster according to the nearest-neighbor criterion, using $D(X_i, X_s^c)$:

$$\nu_i = \arg\min_c D(X_i, X_s^c); \tag{6}$$

ii. Count the number of points that change labels;

iii. For each cluster, find the point with the minimal mean distance to all the other points in the same cluster, and update the seed to this point;

(d) Calculate the mean pairwise mutual information between clusters;

(e) Find the clusters with the minimal between-cluster mean pairwise mutual information, $\min I_{c_1,c_2}$;

3. If $\min I_{c_1,c_2} < \delta$, stop;

4. If $c = K$ and $\min I_{c_1,c_2} > \delta$, the subset is compact enough, which means this is a leaf node.

Figure 2: Illustration of the tree structure decomposition. The first two levels (3 nodes) for the left eye are shown. For each node there are two images: the top one shows the clustering results and the bottom one highlights the two most independent subsets in blue and cyan, with the other features in grey. In the second level, the black region shows the feature subset excluded at the previous level.
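A compact sketch of the clustering loop above, assuming the pairwise distance matrix `D` of equation (5) has been precomputed (with a zero diagonal so each seed claims itself); this is illustrative, not the authors' implementation:

```python
import numpy as np

def mi_kmeans(D: np.ndarray, c: int, seed: int = 0) -> np.ndarray:
    """Cluster n features given D[i, j] = 1 / I(X_i, X_j) (equation 5).
    Medoid-style updates replace centroid updates; iteration stops when
    no point changes its cluster label."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    seeds = rng.choice(n, size=c, replace=False)
    labels = np.full(n, -1)
    while True:
        new_labels = np.argmin(D[:, seeds], axis=1)   # nearest-seed assignment
        if np.array_equal(new_labels, labels):        # stop: no label changed
            return labels
        labels = new_labels
        for k in range(c):                            # update each seed to the
            members = np.flatnonzero(labels == k)     # point with minimal mean
            if members.size:                          # within-cluster distance
                within = D[np.ix_(members, members)].mean(axis=1)
                seeds[k] = members[np.argmin(within)]
```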

This procedure gives us a partition of the current set of features according to their pairwise mutual information. If we can find subsets such that the mean pairwise mutual information between them is low enough, these subsets can be regarded as conditionally independent. For this purpose, the mean pairwise mutual information between every two subsets is examined. Suppose that for subset $X_{S_1}$ there is/are subset(s) $X_{S_m}$ which satisfy:

$$\frac{1}{|X_{S_1}||X_{S_m}|} \sum_{X_i \in X_{S_1}} \sum_{Y_j \in X_{S_m}} I(X_i, Y_j) < \delta, \quad m = 2, \cdots, M; \tag{7}$$

then $A = X_{S_1}$ and $B = \cup_{m=2}^{M} X_{S_m}$ are the two least dependent subsets we select. As introduced before, the union of all the other subsets, $C = \cup_{m \notin \{1, \cdots, M\}} X_{S_m}$, is kept as the current node. Its unions with the two conditionally independent subsets $A$ and $B$ are individually passed on to the subtrees. Fig. 2 shows the results of the tree structure decomposition at the first two levels (for the left eye).
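Given the cluster labels and the pairwise mutual information matrix, the selection of $A$, $B$, and $C$ by equation (7) might be sketched as follows (hypothetical helper, building on the sketches above):

```python
import numpy as np

def select_subsets(I: np.ndarray, labels: np.ndarray, delta: float):
    """Return (A, B, C) index sets per equation (7): A is one cluster, B the
    union of clusters whose mean pairwise MI with A falls below delta,
    and C everything else; None if no such pair exists (leaf node)."""
    clusters = [np.flatnonzero(labels == k) for k in np.unique(labels)]
    for i, A in enumerate(clusters):
        B_parts = [m for j, m in enumerate(clusters)
                   if j != i and I[np.ix_(A, m)].mean() < delta]
        if B_parts:
            B = np.concatenate(B_parts)
            rest = set(range(len(labels))) - set(A) - set(B)
            return frozenset(A), frozenset(B), frozenset(rest)
    return None
```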


2.2 Learning the Likelihoods

The binary tree groups the most dependent features together and divides the less relevant features into different subsets. After obtaining these dependency relationships, the distribution can be estimated accordingly. Using the notations $A$, $B$, $C$ as defined above, the joint probability of $X_S$ is:

$$P(X_S) = P(A, B, C) = P(A, B|C)\,P(C). \tag{8}$$

Ignoring the conditional dependency between sets $A$ and $B$ given $C$, we get:

$$P(X_S) = P(A|C)\,P(B|C)\,P(C) = \frac{P(A, C)\,P(B, C)}{P(C)}. \tag{9}$$

Therefore, to describe the probability of $X_S$, we need to estimate the joint probabilities of $A \cup C$, $B \cup C$, and $C$, where $C$ is the current non-leaf node. The procedure is repeated until the leaf nodes are reached; the probabilities of the feature subsets $A \cup C$ and $B \cup C$ are estimated by repeatedly applying equations (8) and (9). This recursion gives the final probability as follows (for both the eye class and the non-eye class):

$$P(X_S|\text{eye}) = \frac{\prod_{S_i \in \Omega_{leaf}} P(X_{S_i}, Parent[X_{S_i}]\,|\,\text{eye})}{\prod_{S_i \in \overline{\Omega}_{leaf}} P(X_{S_i}\,|\,\text{eye})}, \tag{10}$$

$$P(X_S|\text{non-eye}) = \frac{\prod_{S_i \in \Omega_{leaf}} P(X_{S_i}, Parent[X_{S_i}]\,|\,\text{non-eye})}{\prod_{S_i \in \overline{\Omega}_{leaf}} P(X_{S_i}\,|\,\text{non-eye})}, \tag{11}$$

where $\Omega_{leaf}$ is the set of leaf nodes, $\overline{\Omega}_{leaf}$ is the set of non-leaf nodes, and $Parent[X_{S_i}]$ is the parent node of node $X_{S_i}$.

Equations (10) and (11) indicate that only two kinds of joint probability need to be estimated (a sketch of the recursion follows the list):

• For feature sets represented by a leaf node, we need to estimate the joint probability of the union of the current node and its parent;

• For sets represented by a non-leaf node, we need to estimate the joint probability of the features in the current non-leaf node.
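In code, equations (8)-(11) amount to one recursion over the tree: multiply the joint probabilities at the leaves (each joined with its parent's features) and divide by the probabilities of the non-leaf nodes. A sketch, assuming the `Node` structure above and any joint density estimator `prob(features, x)`, such as the adaptive Parzen window of the next part of this subsection:

```python
def tree_likelihood(node, x, prob, parent_features=frozenset()):
    """Evaluate P(X_S | class) per equations (10)/(11).
    prob(features, x) estimates the joint probability of a feature subset."""
    if node.left is None and node.right is None:      # leaf: joint with parent
        return prob(node.features | parent_features, x)
    numerator = (tree_likelihood(node.left, x, prob, node.features) *
                 tree_likelihood(node.right, x, prob, node.features))
    return numerator / prob(node.features, x)         # divide by non-leaf term
```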

In both types of sets there is high dependency between the component variables, causing ill-conditioned covariance matrices; hence we refrain from directly using a Gaussian mixture to model the probability. Although a PCA-type subspace projection can transform the data into a lower-dimensional space with less dependent components, it is not clear whether the information lost in the projection is really redundant for classification. Therefore, we use a non-parametric method to estimate the probability instead. In the traditional Parzen-window estimation method, the probability is represented by:

$$P(X = x|\text{eye}) = \frac{1}{N}\sum_{i=1}^{N} \psi(x - x_i;\, \sigma^2 I), \quad x_i \in X_{eyeTrain}, \tag{12}$$

where $\psi(\cdot)$ is a normal function centered at $x_i$ with covariance matrix $\sigma^2 I$, and $X_{eyeTrain}$ is the set of eye training samples. Usually the kernel has a fixed shape, which assumes that the local structure is the same around every sample. In a high-dimensional space, however, this assumption can hardly hold, because the samples are sparse compared with the dimensionality. Furthermore, in the vicinity of a given point the true mass function is more likely to be concentrated along a few dominant directions, while there are not enough samples to capture this local structure. Considering this, we slightly adapt the estimation method by introducing a non-isotropic covariance matrix estimated from the samples, incorporating information from each sample's neighborhood. For each dimension $j$, we calculate the variance of the samples, denoted $\sigma_j$. Also, for every sample $x_i$, we find its $K$ nearest neighbors $x_{i_k}$, $k = 1, \cdots, K$; the distance between them along dimension $j$ is $d_j(i, i_k)$. The covariance matrix used in the kernel function related to sample $x_i$ is:

$$\Sigma_i = \mathrm{diag}(\sigma_{i1}^2, \cdots, \sigma_{iN}^2); \tag{13}$$

where

$$\sigma_{ij}^2 = \frac{\sum_{k=1}^{K} \mathcal{K}(x_j^{i_k}, x_j^i)\, d_j^2(i, i_k)}{\sum_{k=1}^{K} \mathcal{K}(x_j^{i_k}, x_j^i)}; \quad x^i = [x_1^i, x_2^i, \cdots, x_N^i];\ x^{i_k} = [x_1^{i_k}, x_2^{i_k}, \cdots, x_N^{i_k}]; \tag{14}$$

and $\mathcal{K}(x_j^{i_k}, x_j^i)$ is a kernel function that evaluates the influence of neighboring points. In our implementation, we use a Gaussian for $\mathcal{K}(x_j^i, x_j^{i_k})$:

$$\mathcal{K}(x_j^i, x_j^{i_k}) = \mathcal{N}(x_j^{i_k};\, x_j^i, \sigma_j).$$

This is a simplified version of the "manifold Parzen window" method introduced in [12], which uses a full covariance matrix where we use a diagonal matrix for simplicity. The kernel function for probability estimation related to sample $x_i$ is then:

$$\psi(x - x_i, \Sigma_i\,|\,\text{eye}) = \frac{1}{\sqrt{2\pi}\,\prod_{j=1}^{N}\sigma_{ij}} \exp\left(-\frac{1}{2}\sum_{j=1}^{N}\frac{(x_j - x_j^i)^2}{2\,\sigma_{ij}^2}\right). \tag{15}$$
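A numpy sketch of this adapted estimator under stated assumptions: the per-sample diagonal variances follow equation (14), the kernels use the standard diagonal-Gaussian normalization, and `K` and the small regularizers are illustrative choices:

```python
import numpy as np

def fit_variances(train: np.ndarray, K: int = 5) -> np.ndarray:
    """Per-sample, per-dimension variances sigma_ij^2 (equation 14),
    weighting squared neighbor distances by a Gaussian of scale sigma_j."""
    n, d = train.shape
    sigma_j = train.std(axis=0) + 1e-8                 # global per-dimension scale
    dist = np.linalg.norm(train[:, None, :] - train[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                     # exclude the sample itself
    var = np.empty((n, d))
    for i in range(n):
        nbrs = np.argsort(dist[i])[:K]                 # K nearest neighbors of x_i
        diff = train[nbrs] - train[i]                  # d_j(i, i_k), shape (K, d)
        w = np.exp(-0.5 * (diff / sigma_j) ** 2)       # Gaussian neighbor weights
        var[i] = (w * diff ** 2).sum(0) / (w.sum(0) + 1e-12)
    return var + 1e-12

def parzen_density(x: np.ndarray, train: np.ndarray, var: np.ndarray) -> float:
    """Equation (12) with the per-sample diagonal Gaussian kernels of (15)."""
    z2 = (x - train) ** 2 / var                        # squared standardized residuals
    log_k = -0.5 * (np.log(2 * np.pi * var).sum(1) + z2.sum(1))
    return float(np.exp(log_k).mean())
```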

From equations (8)-(15), we can get the class conditional probability $P(X = x|\text{eye})$. The probability for non-eye examples, $P(X = x|\text{non-eye})$, can be estimated similarly. From $P(X = x|\text{eye})$ and $P(X = x|\text{non-eye})$, Bayesian decision theory gives:

$$x = \begin{cases} \text{eye} & \text{if } \frac{P(X = x|\text{eye})}{P(X = x|\text{non-eye})} > \eta; \\ \text{non-eye} & \text{otherwise.} \end{cases}$$

However, it is hard to find a complete, representative training set of non-eye samples. There may exist non-eye testing samples whose probabilities estimated by the above procedure are small due to the lack of similar patterns in the training samples. Sometimes this inaccurate estimation causes $\frac{P(X = x|\text{eye})}{P(X = x|\text{non-eye})}$ to falsely exceed the threshold, increasing the false alarm rate. To avoid this, we also consider the absolute class conditional probability of being an eye. The decision criterion then becomes:

$$x = \begin{cases} \text{eye} & \text{if } \frac{P(x|\text{eye})}{P(x|\text{non-eye})} > \eta \text{ and } P(x|\text{eye}) > \eta'; \\ \text{non-eye} & \text{otherwise.} \end{cases}$$
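The two-threshold rule translates directly; a sketch, where `p_eye` and `p_noneye` are the class conditional density estimators above:

```python
def classify(x, p_eye, p_noneye, eta: float, eta_prime: float) -> str:
    """Likelihood-ratio test plus an absolute floor on P(x|eye), which
    suppresses false alarms caused by non-eye patterns that are poorly
    covered by the negative training set."""
    pe, pn = p_eye(x), p_noneye(x)
    if pn > 0 and pe / pn > eta and pe > eta_prime:
        return "eye"
    return "non-eye"
```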

2.3 Determining η and η′ from the ROC Curves

In the above equation, we need to provide two thresholds, $\eta$ and $\eta'$. From Bayesian decision theory, the optimal value for $\eta$ should be the ratio of the priors; however, for real data the priors are unknown. Therefore, we determine $\eta$ and $\eta'$ experimentally from the data, using ROC curves.

Fig. 3 gives some examples of the training and testing samples used. The top row in Fig. 3 shows some training samples for eyes and non-eyes; the bottom row shows some testing samples. We have 2011 positive examples and 1019 negative examples for training, and another 350 positive examples and 1307 negative examples as the testing set. The positive samples in both the training and testing sets are left eyes. Training and testing samples are drawn independently from different databases: training samples are from the FERET database [13] and testing samples are from the Caltech face database.

Figure 3: Training and testing examples. The top row shows training eye samples and non-eye samples respectively; the bottom row shows the corresponding testing examples. The size of the samples is 20 × 40.

Given the testing sets, we examine the ROC curves with one parameter fixed at different values. Assuming $\eta$ is fixed, changing $\eta'$ gives one ROC curve, so a set of ROC curves is generated with $\eta$ fixed at different values. $\eta$ is selected as the one with the best ROC characteristic, meaning that at a given false alarm rate it achieves the highest detection rate; we use the curve that gives the highest detection rate at a 0.1 false alarm rate as the best one. The range of $\eta'$ can also be determined from the best ROC curve. We then refine the best $\eta'$ in a similar way: we fix $\eta'$ at values around the one obtained in the previous step and determine the best ROC curve as before. Figures 4(a) and 4(b) give the ROC curves for determining $\eta$ and $\eta'$ individually from the above training and testing sets. According to the ROC curves, we set $\eta = 1$ and $\eta' = 3 \times 10^{-6}$.
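Each ROC curve in this procedure is traced by sweeping one threshold while the other is fixed; a generic sketch over detector scores (names are illustrative):

```python
import numpy as np

def roc_points(scores_eye, scores_noneye, thresholds):
    """For each threshold, report (false alarm rate, detection rate):
    one curve of the family used to pick eta and eta'."""
    scores_eye, scores_noneye = np.asarray(scores_eye), np.asarray(scores_noneye)
    return [(float(np.mean(scores_noneye > t)),    # false alarm rate
             float(np.mean(scores_eye > t)))       # detection rate
            for t in thresholds]
```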

3. Experimental Evaluation

In this section, we discuss the performance of the proposed algorithm, evaluating it from three aspects. In our implementation, to avoid the loss of localization ability that comes with a transformed-domain representation, the original image space is used; the features are therefore simply the pixel intensities.

3.1 Comparison with Two Intermediate Steps

First we compare our algorithm with two intermediate methods, using the same training and testing sets. In the first, we model the object probability without partitioning into parts. In the second, we partition the object into independent, non-overlapping subsets. The same clustering scheme, clustering by pairwise mutual information, is used; however, the dependency between different subsets is not considered, which means that for partition results $X_{S_1}, \cdots, X_{S_L}$, the probability is:

$$P(X_{S_1}, \cdots, X_{S_L}|\text{eye}) = \prod_{i=1}^{L} P(X_{S_i}|\text{eye}); \tag{16}$$

$$P(X_{S_1}, \cdots, X_{S_L}|\text{non-eye}) = \prod_{i=1}^{L} P(X_{S_i}|\text{non-eye}). \tag{17}$$

This essentially shares the idea of the work in [3]. The performance is evaluated by the ROC curves shown in Fig. 5; these curves are drawn with a fixed optimal $\eta'$. The comparison shows that partitioning the features into dependent subsets contributes greatly to the performance. Without the partition step, the learned probability can hardly provide any useful information for classification. As expected, if the dependency between the subsets is not considered during the partition procedure, the performance is inferior to the alternative.
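For reference, this independence baseline is a plain product over the clustered subsets; a one-function sketch using the same hypothetical `prob(features, x)` estimator as above:

```python
import numpy as np

def independent_loglik(subsets, x, prob):
    """Baseline of equations (16)/(17): treat the partitioned subsets as
    fully independent and sum their log class conditional probabilities."""
    return float(sum(np.log(prob(s, x)) for s in subsets))
```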


Figure 4: ROC curves used to determine $\eta$ and $\eta'$. (a) The set of ROC curves with $\eta$ fixed at different values ($\eta = 0.1, 0.5, 1, 1.5$). (b) The set of ROC curves with $\eta'$ fixed at different values ($\eta' = 0,\ 3 \times 10^{-6},\ 9 \times 10^{-6},\ 1.5 \times 10^{-5}$). (c) Zoomed view of the ROC curves in 4(b).

Figure 5: ROC curves for the proposed algorithm compared with the two intermediate steps (using the whole object; partition by binary tree; partition without tree).

Figure 6: Good detection results. The detections are marked by red circles.

Figure 7: Correct eyes are detected, while the detection results are off-centered. The detections are marked by red circles.


Figure 8: False alarms and missed detections. The detections are marked by red circles.

3.2 Performance Evaluation on Face Images

In this section, we study the ability to discriminate eyes from other facial parts. We have 317 images containing only faces, taken from the Gray FERET database. Experimental results show that our eye detector achieves a high detection rate of 92.43% with a false alarm rate of 7.26%. Interestingly, our algorithm can even learn the subtle difference between the left eye and the right eye: as mentioned before, our training set contains only left eyes, and in only 5.99% of the images is the right eye detected instead. The detection results also show good localization ability: 84.64% of the correct detections are accurately centered at the pupils. Fig. 6 gives some examples of accurate detection results; Fig. 7 gives some examples where the eyes are detected but the detections are off-centered (not centered at the pupils); Fig. 8 gives bad detection results, including false alarms and missed detections.

3.3 Detection in Complex Background

We also challenged the algorithm with a much harder setting: images with complex background, taken from the Caltech face database. Each image contains only a single person. To save computation, we only examine regions with sufficient texture, which we evaluate by applying a Laplacian filter; regions with reasonably low response are not considered for detection. Our algorithm gives fairly good performance. Fig. 9 gives some examples of the results, with red circles marking the detections. Fig. 10 gives an example where the algorithm fails; we use differently colored circles to highlight the difference.
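The texture gating might look like the following sketch, using SciPy's Laplace filter; the block size and response threshold are hypothetical parameters, not values from the paper:

```python
import numpy as np
from scipy.ndimage import laplace

def textured_blocks(gray: np.ndarray, block: int = 20, thresh: float = 4.0):
    """Mean absolute Laplacian response per block x block region; regions
    below the threshold are skipped during detection to save computation."""
    resp = np.abs(laplace(gray.astype(float)))
    h, w = resp.shape
    hb, wb = h // block, w // block
    grid = resp[:hb * block, :wb * block].reshape(hb, block, wb, block)
    return grid.mean(axis=(1, 3)) > thresh      # True where texture suffices
```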

Figure 9: Examples of detection in complex background. Red circles show the detections. (a) Example 1. (b) Example 2. (c) Example 3.

Figure 10: An example where the algorithm fails. We use cyan circles for differentiation.

3.4 Preliminary Study on Driver Data

Some preliminary research on data from real driving scenarios was also done. The data were collected from a video camera mounted inside the car, facing the driver's face from the right side. Some results on consecutive frames are shown in Fig. 11; results for frames 630, 685, 708, and 754


are shown. The performance is inferior to that on the Caltech database: although the right eyes are correctly detected, there are many false detections. One hypothesis for the false alarms is the low SNR; unlike the data in the FERET and Caltech databases, this data is quite noisy. Another reason might be the variation of the driver's head pose. However, we also noticed that the false detections are not consistent over frames, and few false detections occur on face regions. Therefore, a simple tracking algorithm could be used to eliminate them, as could a simple face detection method used as preprocessing. This indicates the direction of our future research work.

Figure 11: Examples from preliminary research on real driving data. Red circles mark the detections.

4. Summary and Conclusions

In this paper, we propose to use a binary tree to model the statistical structure of human eyes. At each level, the features are clustered into subsets according to their pairwise mutual information, and the subtrees are constructed by excluding the most independent subsets respectively. The procedure is repeated until, for all leaf nodes, the features have sufficiently high mutual information. A non-parametric probability model is used to learn the conditional probabilities for both classes according to the binary tree. The Bayesian decision criterion is used thereafter, with its thresholds learned experimentally from ROC curves. We applied the detector to images containing only faces as well as images with complex background. The experimental results give a high detection rate and a low false alarm rate: for face-only images, a 92.43% detection rate was obtained with a 7.26% false alarm rate. The model is also able to differentiate the left eye from the right eye with high accuracy, and it has good localization ability. Preliminary results on video data collected from a real driving scenario show its potential application in intelligent vehicle systems. However, due to the non-parametric estimation procedure involved, the computational cost is high; a better probability estimation method, such as [14], may solve this. A better search scheme may also be desired to accelerate the detection.

Acknowledgments

This research was supported in part by the UC Discovery Program and the Technical Support Working Group of the US Department of Defense. The authors also acknowledge the support and assistance of their colleagues from the CVRR Laboratory.

References

[1] W. Richards and A. Jepson, "What makes a good feature?" Proceedings of the 1991 York Conference on Spatial Vision in Humans and Robots, pp. 89-125, Jan. 1994.

[2] B. Moghaddam and A. Pentland, "Probabilistic Visual Learning for Object Representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, Issue 7, pp. 696-710, July 1997.

[3] T. D. Rikert, M. J. Jones and P. Viola, "A Cluster-Based Statistical Model for Object Detection," Proceedings of the International Conference on Computer Vision, Volume 2, p. 1046, 1999.

[4] H. Schneiderman and T. Kanade, "Object Detection Using the Statistics of Parts," International Journal of Computer Vision, Vol. 56, Issue 3, pp. 151-177, February-March 2004.

[5] H. Schneiderman, "Learning a Restricted Bayesian Network for Object Detection," IEEE Conference on Computer Vision and Pattern Recognition, June 2004.

[6] H. Veeraraghavan and N. P. Papanikolopoulos, "Detecting Driver Fatigue through the Use of Advanced Face Monitoring Techniques," University of Minnesota, Center for Transportation Studies, Technical Report, 2001.

[7] X. Liu, F. Xu and K. Fujimura, "Real-Time Eye Detection and Tracking for Driver Observation Under Various Light Conditions," IEEE Intelligent Vehicle Symposium, Versailles, France, June 18-20, 2002.

[8] Q. Ji and X. Yang, "Real-time eye, gaze, and face pose tracking for monitoring driver vigilance," Real-Time Imaging, Vol. 8, Issue 5, pp. 357-377, October 2002.

[9] R.-L. Hsu, M. Abdel-Mottaleb and A. K. Jain, "Face detection in color images," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 24, No. 5, pp. 696-706, May 2002.

[10] J. Rurainsky and P. Eisert, "Template-based Eye and Mouth Detection for 3D Video Conferencing," Proc. International Workshop on Very Low Bitrate Video (VLBV 2003), Madrid, Spain, pp. 23-31, September 2003.

[11] K. Nguyen et al., "Differences in the Infrared Bright Pupil Response of Human Eyes," Proc. ETRA 2002 - Eye Tracking Research and Applications Symposium, New York, ACM, pp. 133-56, March 2002.

[12] P. Vincent and Y. Bengio, "Manifold Parzen Windows," Advances in Neural Information Processing Systems 15, 2003.

[13] P. J. Phillips, H. Moon, S. A. Rizvi and P. J. Rauss, "The FERET Evaluation Methodology for Face Recognition Algorithms," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 22, pp. 1090-1104, October 2000.

[14] M. Girolami and C. He, "Probability Density Estimation from Optimally Condensed Data Samples," IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(10): 1253-1264, 2003.
