

Evolutionary adaptive eye tracking for low-cost human computer interaction applications

Yan Shen, Hak Chul Shin, Won Jun Sung, Sarang Khim, Honglak Kim, Phill Kyu Rhee
Inha University
235 Yong-Hyun Dong, Nam Ku, Incheon, Republic of Korea
E-mail: [email protected]

Abstract. We present an evolutionary adaptive eye-tracking framework aiming for low-cost human computer interaction. The main focus is to guarantee eye-tracking performance without using high-cost devices and strongly controlled situations. The performance optimization of eye tracking is formulated into the dynamic control problem of deciding on an eye-tracking algorithm structure and associated thresholds/parameters, where the dynamic control space is denoted by genotype and phenotype spaces. The evolutionary algorithm is responsible for exploring the genotype control space, and the reinforcement learning algorithm organizes the evolved genotype into a reactive phenotype. The evolutionary algorithm encodes an eye-tracking scheme as a genetic code based on image variation analysis. Then, the reinforcement learning algorithm defines internal states in a phenotype control space limited by the perceived genetic code and carries out interactive adaptations. The proposed method can achieve optimal performance by compromising between the difficulty of the real-time performance of the evolutionary algorithm and the drawback of the huge search space of the reinforcement learning algorithm. Extensive experiments were carried out using webcam image sequences and yielded very encouraging results. The framework can be readily applied to other low-cost vision-based human computer interactions to solve their intrinsic brittleness in unstable operational environments. © The Authors. Published by SPIE under a Creative Commons Attribution 3.0 Unported License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI. [DOI: 10.1117/1.JEI.22.1.013031]

1 Introduction

In the last three decades, many eye-tracking approaches have been proposed, aimed at diverse objectives and applications.1,2 Automatic eye tracking has been employed in many application areas, since it is strongly connected with human perception, attention, and cognitive states. Eye-tracking technology can be employed in unobtrusive and hands-free human computer interaction (HCI) as an effective tool of interaction and communication between computers and people. Even though a lot of effort has been invested in developing the technology, the problem of eye tracking is still very challenging, owing to changing operation environments with varying types of illumination, viewing angle, scale, individual eye shape, and jittering.3 Most successful eye-tracking techniques in commercial areas pose either the requirement of a high-cost image-capturing device (e.g., camera and lens) or very limited operation within a strongly controlled situation.4,5 Recently, much interest in eye-tracking technology is being generated in the areas of HCI for web usability, advertising, smart TV, and mobile applications. However, because of their high cost or their limitations, conventional eye-tracking techniques can hardly be employed in low-cost commercial applications. For a successful HCI application, a more available and general eye-tracking technology is necessary, one that works without lighting control or controlled situations, reduces costs, and simplifies the complicated initial user setup. Few researchers have tackled the more general vision technology for eye tracking in a loosely controlled environment with low-cost equipment.6,7 Most techniques, however, are far from practical in current and coming vision-based HCI applications due to their intrinsic brittleness in changing image-capturing environments.

One promising direction is the adaptive selection of different eye-tracking algorithm structures and the control of associated thresholds/parameters considering environment changes. However, the decision of an optimal algorithm structure with its adaptable thresholds/parameters is a very complicated problem in the area of image processing and computer vision in general.8 Some techniques for adaptive thresholds and flexible algorithm structures can be found in evolutionary algorithm (EA) approaches such as the genetic algorithm (GA), genetic programming (GP),9 and other evolutionary learning methods. The EA, which evolves in a manner resembling natural selection, can be employed to solve complex problems even if its creator does not fully understand the phenomenon.10 However, the evaluation of a real-time system using the EA approach very often encounters a critical difficulty due to the significant amount of computation time coming from repetitive computations for many individuals over several generations. The high computational cost of the EA is a serious limiting factor. Another limitation of most evolutionary algorithms is their ambiguity in discriminating genotype and phenotype.

Paper 12404 received Nov. 13, 2012; revised manuscript received Jan. 29, 2013; accepted for publication Feb. 1, 2013; published online Mar. 1, 2013.


In nature, the fertilized ovum cell undergoes a complex process known as embryogenesis to become a mature phenotype.11 The phenotype, the outward physical manifestation of an organism, is anything that is part of an observable structure, function, or behavior of a living organism. The internally coded, inheritable information underlying these observable characteristics is carried by the genetic code, the inherited instructions. An organism is usually exposed to sequences of events and reinforcements, and the same genotype encounters diverse environments where different behavioral actions are optimal. Learning is required throughout the whole lifetime of an individual and enforces a selection pressure, even if a phenotype that is correct from birth seems to be produced.12 Some successful evolutionary approaches combined with learning technologies can be found in robot control areas.13

Most parameter control approaches require a precise model that describes interactions with environments. In many cases, however, it is very complicated or impossible to model the interaction for precise real-time task processing. In a real-world situation, offline learning with an imprecise model frequently cannot produce an acceptable performance.14 The reinforcement learning (RL) approach is one of the promising approaches to solve such problems, since it is based on "learning by experience." The RL approach is very effective in controlling a system by interacting with its external environment.15 A number of model-free value-function-based RL algorithms have been studied, such as the Q-learning, SARSA, and actor-critic (AC) methods.12,16 The RL approach can optimize system parameters through interactions with a perceived environment.17,18 An RL-based eye-tracking scheme can learn state-action probabilities, provide performance-adaptive functionality without requiring a system model, and assure convergence to an optimal action. However, a very large amount of repetitive trial and error is necessary in order to find the optimal performance, because of the curse of a huge search space in the visual tracking problem. The enormous amount of learning time required for a low-cost eye-tracking task in varying environments prohibits the RL algorithm from being employed directly, since a huge search space for deciding actions requires a high computation time for learning by experience.

We propose an evolutionary adaptive framework for a high-performance eye-tracking scheme for low-cost vision-based HCI. The framework behaves adaptively for efficient and robust eye tracking by combining the evolutionary and interactive learning paradigms. It devises a performance-adaptive eye-tracking scheme where possible configurations of eye tracking are represented as genotype and phenotype control spaces in terms of algorithm structure and thresholds/parameters. The EA explores the genotype control space to find an optimal eye-tracking genetic code, where the illumination environment is measured by image quality analysis and represented as an image context.19,20 The EA can evolve relatively long-term behaviors compared with the reactive behaviors of the RL,14 while the RL provides quick interactive adaptation in a real-world environment. The EA mainly performs genotype learning using a pre-collected training set in an offline fashion. The huge control space of algorithm structures and thresholds/parameters is partitioned in accordance with a perceived image context, and the EA determines a genetic code that fits the perceived image context. The RL's main role is to organize the phenotype of the scheme interactively. The RL algorithm defines its internal environmental state from the partitioned genotype control space and mainly performs a quick and interactive adaptation. In this way, the proposed method perceives the changing environment of the real world and learns an effective algorithm structure and associated thresholds/parameters for optimal eye-tracking performance. The major contributions of this paper are summarized as follows:

1. Instead of requiring a high-cost image-capturing device or a very limited situation, a performance guarantee is achieved by formulating eye tracking as the dynamic control problem of optimizing an algorithm structure with associated thresholds/parameters.

2. Tackling the dilemma of the EA and the RL algorithms, i.e., the intrinsic non-real-time property of the EA and the curse of the huge search space of the RL algorithm, the framework can efficiently improve the performance of the eye-tracking scheme in terms of accuracy and speed. The EA approach can reduce the curse of high dimensionality and the huge number of trials of the RL algorithm, and accelerate the convergence speed of the RL algorithm compared with learning from scratch.

3. The proposed framework defines the RL internal environment in connection with the genotype control space, instead of using an input image as the RL environment as in Refs. 8 and 21, which cannot fully utilize the interactive characteristics of the RL algorithm. While their approach can be thought of as taking an image sequence as stimuli, our approach associates the RL environment state with the thresholds/parameters of the eye-tracking scheme. As a result, the proposed framework can explore the consecutive and interactive threshold/parameter space influenced by a changing external environment, and thus fully take advantage of the RL algorithm, providing a very flexible and high-performance eye-tracking scheme.

4. The framework can be readily applied to other vision-based HCI applications to solve their intrinsic brittleness under changing image-capturing environments.

This paper is organized as follows. In Sec. 2, the evolutionary adaptive framework is discussed for controlling eye-tracking parameter spaces. The genotype control space and evolution are discussed in Sec. 3. The phenotype manifestation using the RL algorithm is given in Sec. 4. In Sec. 5, the performance measures of eye tracking are discussed. Finally, the experimental results and concluding remarks are given in Secs. 6 and 7, respectively.

2 Evolutionary Adaptive Framework for Eye Tracking Parameter Control

The rationale behind the proposed eye-tracking scheme with evolutionary adaptive capability is the recurrence phenomenon of external stimulus, here the image quality variation influenced by changes in the image-capturing environment, mainly from illumination, viewing angle, and individual eye shape. The proposed scheme explores an optimal algorithm structure and associated threshold/parameter space guided by the observation of the image variations that mainly affect scheme performance.


The proposed scheme consists of three modules, as shown in Fig. 1: the EA module, the eye-tracking module, and the RL module. It continuously perceives and interacts with the external environment, and adapts its algorithm structure, thresholds, and parameters. The proposed framework includes genotype evolution and phenotype adaptation. The first performs long-term learning based on the EA, and the second performs interactive phenotype learning using the RL algorithm.

The performance of an eye-tracking scheme is highly dependent upon the choice of algorithms in each step and their associated parameters and thresholds.20 It is best formulated as a nontrivial dynamic control problem of deciding on an eye-tracking algorithm structure and associated thresholds and parameters. Considering that the primary cause of degrading eye-tracking performance is image quality variation due to illumination, pose, and eye shape, we introduce a two-level intelligent control mechanism to optimize the eye-tracking scheme. The first-level control mechanism employs the EA for genotype evolution, which determines a possibly optimal scheme configuration for a perceived external environment as an image context, where the external environment is measured by k-means clustering and image quality analysis.19,20 The second level takes advantage of the RL algorithm for a phenotype manifestation of the genetic code.

The eye area is located using the face location and eye area location methods in Ref. 20, and eye tracking is processed based on the eye-tracking method in Ref. 22. The initial eye centroid is estimated by preprocessing, feature extraction, and a partial Hough transformation. Preprocessing is carried out using an adaptively selected algorithm substructure and its associated thresholds, and feature extraction is performed to produce a contour or edge image. The contour or the edge image is processed by the partial Hough transform with adjusted arc angle parameters of the iris boundary to determine an optimal tracking point. Finally, the Kalman filter is applied, and the next eye centroid is predicted. The above steps are repeated until eye tracking is terminated. The eye-tracking control space, i.e., the possible algorithm structures and the ranges of thresholds/parameters, is determined based on prior knowledge or experiments considering the tradeoff between tracking accuracy and execution time constraints. The estimation steps of the eye centroid, the eye centroid prediction by the Kalman filter,23,24 and their control space are discussed in the remainder of this section.

2.1 Algorithm Structure and Thresholds/Parameters in Preprocessing and Feature Extraction

The preprocessing step considered here consists of histogram equalization, Retinex with one threshold,25 and end-in contrast stretching with two parameters.26 For example, possible algorithm structures of the preprocessing step are histogram equalization, Retinex, Retinex with histogram equalization in a serial combination, or end-in contrast stretching. The feature extraction step can be binarization, the Canny edge detection algorithm,27 binarization and the Canny in parallel combination with an AND operation, or binarization and the Canny in parallel combination with an OR operation, where the serial combination of binarization and contour extraction is denoted by binarization for simplicity. Feature extraction is selectively carried out using one of the above four algorithm structures with different thresholds. Binarization has one threshold, and the Canny algorithm has two thresholds for edge detection, where the first is related to edge linking, and the second is related to finding the initial segments of strong edges.
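To make the configurable substructures concrete, the following Python/OpenCV sketch shows one way the four preprocessing choices and the four feature-extraction choices described above could be wired together. It is a minimal illustration rather than the authors' implementation: the single-scale Retinex and the end-in stretching are simplified stand-ins, and the threshold-dictionary keys are our own hypothetical naming.

import cv2
import numpy as np

def single_scale_retinex(gray, sigma):
    # Simplified stand-in for the Retinex action unit (RX):
    # log(image) - log(Gaussian-smoothed image), rescaled to 8 bits.
    f = gray.astype(np.float32) + 1.0
    r = np.log(f) - np.log(cv2.GaussianBlur(f, (0, 0), sigma) + 1.0)
    return cv2.normalize(r, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

def end_in_stretch(gray, low, high):
    # End-in contrast stretching (ECS) between the two cut levels.
    out = (gray.astype(np.float32) - low) * 255.0 / max(high - low, 1)
    return np.clip(out, 0, 255).astype(np.uint8)

def preprocess(gray, code, th):
    # 2-bit substructure field: HE, RX, ECS, or HE followed by RX.
    if code == 0b00:
        return cv2.equalizeHist(gray)
    if code == 0b01:
        return single_scale_retinex(gray, th["RX"])
    if code == 0b10:
        return end_in_stretch(gray, th["ECS1"], th["ECS2"])
    return single_scale_retinex(cv2.equalizeHist(gray), th["RX"])  # HE >> RX

def extract_features(gray, code, th):
    # 2-bit substructure field: BN, CN, BN AND CN, or BN OR CN.
    _, bn = cv2.threshold(gray, th["BN"], 255, cv2.THRESH_BINARY)
    cn = cv2.Canny(gray, th["CN1"], th["CN2"])
    if code == 0b00:
        return bn
    if code == 0b01:
        return cn
    return cv2.bitwise_and(bn, cn) if code == 0b10 else cv2.bitwise_or(bn, cn)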

2.2 Parameters of the Partial Hough Transform

In our eye-tracking system, the iris boundary is approximated by a circle, since the proposed method aims at real-time performance using low-resolution images. The algorithm detects the center of the eye (iris) by calculating the outer boundary of the iris image. We adopt a modified Hough transform method, called the partial Hough transform, to extract features that are less affected by noise and eyelid occlusions. Two portions of the circle (partial circles) are matched instead of the entire circle to minimize the occlusion effects of the upper and lower eyelids. A four-dimensional control space for eye centroid tracking includes the angle values indicating the partial iris boundaries of interest, i.e., the partial circles (see Fig. 2).

Fig. 1 The block diagram of the evolutionary adaptive eye-tracking framework.

Fig. 2 Two partial circles of iris outer boundaries with different eye opening degrees: the portions of the outer iris boundary between the angles ϕ1 and ϕ2, and between ϕ3 and ϕ4, are considered instead of the whole iris boundary, avoiding eyelid occlusion effects.


The partial Hough transform algorithm for circle fitting can then be described as follows. The iris outer circle can be represented as

(x - a_0)^2 + (y - b_0)^2 = r^2,  (1)

where (a_0, b_0) is the coordinate of the eye centroid (the center of the outer iris boundary), and r is the circle radius.

The two-stage algorithm formulated in Ref. 23 is employed: the first stage finds the eye circle center, and the second stage estimates normal directions to the outer iris boundary points on the partial circles intersecting the eye centroid. Let (x, y) be a gradient direction point on the outer iris boundary. A mapping from (x, y, \phi) triples into the center parameters (a, b) produces a straight line, where \phi is the angle of the gradient direction. The intersection of many of these lines identifies the coordinates of the eye centroid. The relation between (x, y, \phi) and (a, b) is given as

\phi = \arctan\left(\frac{y - b}{x - a}\right).  (2)

In the partial Hough transform algorithm, the range of the outer iris boundary is restricted by four angle parameters, \phi_1, \phi_2, \phi_3, and \phi_4, to avoid the effect of eyelid occlusion, i.e.,

\phi \in [\phi_1, \phi_2] \cup [\phi_3, \phi_4].  (3)
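As a sketch of the angle-restricted voting that Eqs. (1)-(3) describe, the following Python fragment lets each edge point vote for candidate centers along its gradient direction, but only when the gradient angle falls inside the allowed arcs. The radius search limits, the wrap-around arc test, and the peak selection are our own assumptions, not details taken from the paper.

import cv2
import numpy as np

def in_arc(phi, lo, hi):
    # Arc membership test in degrees; handles arcs that wrap through 0/360.
    return lo <= phi <= hi if lo <= hi else (phi >= lo or phi <= hi)

def partial_hough_center(edges, gray, arcs, r_min=5, r_max=25):
    # arcs: [(phi1, phi2), (phi3, phi4)] restricting the iris boundary, Eq. (3).
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    acc = np.zeros(gray.shape, np.int32)
    for y, x in zip(*np.nonzero(edges)):
        phi = np.degrees(np.arctan2(gy[y, x], gx[y, x])) % 360.0
        if not any(in_arc(phi, lo, hi) for lo, hi in arcs):
            continue  # likely an eyelid edge; skip per the partial-circle idea
        for r in range(r_min, r_max + 1):
            # Vote for a center (a, b) lying along the gradient, Eqs. (1)-(2).
            a = int(round(x - r * np.cos(np.radians(phi))))
            b = int(round(y - r * np.sin(np.radians(phi))))
            if 0 <= b < acc.shape[0] and 0 <= a < acc.shape[1]:
                acc[b, a] += 1
    b0, a0 = np.unravel_index(np.argmax(acc), acc.shape)
    return a0, b0  # estimated eye centroid (a0, b0)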

2.3 Noise Parameters of the Kalman Filter

The Kalman filter28 is a recursive approach to the discrete linear filtering problem that estimates a state process by minimizing the squared error.29,30 In this paper, the Kalman filter is employed to approximate the tracking of eye centroids, formulated as a Bayesian tracking problem.

Assuming that the eye centroid moves with constant velocity, the state transition matrix F_t and the process noise covariance matrix Q_{t-1} can be determined as follows:29

F_t = \begin{bmatrix} 1 & 0 & \Delta T & 0 \\ 0 & 1 & 0 & \Delta T \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}  (4)

and

Q_{t-1} = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & q^2 & 0 \\ 0 & 0 & 0 & q^2 \end{bmatrix},  (5)

where \Delta T is the time interval between two adjacent frames, which is usually very short, and q is a parameter related to the process noise. Assuming that H_t is constant, the measurement matrix and the measurement noise covariance are determined, respectively, as follows:

H_t = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}  (6)

and

R_t = \begin{bmatrix} r^2 & 0 \\ 0 & r^2 \end{bmatrix},  (7)

where r is a parameter related to the measurement noise. The Kalman filter approach often encounters difficulty in estimating the noise covariance matrices Q_{t-1} and R_t. In many applications, the noise covariances Q_{t-1} and R_t stabilize quickly and remain constant.31 The parameters r and q can be pre-computed by running the filter offline or by determining the steady-state values.32 However, the noise does not remain constant in eye tracking, due to the uncertainties of dynamically changing environments. In this context, the parameters r and q are evolved by the EA as discussed in this section and adapted by the RL method in the next section in order to achieve optimal performance in our eye-tracking scheme.
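For reference, a compact NumPy sketch of the constant-velocity Kalman filter defined by Eqs. (4)-(7) follows, with state [x, y, vx, vy]. The concrete q and r values are exactly the two controlled parameters (θKF1, θKF2); everything else is standard predict/correct bookkeeping, not code from the paper.

import numpy as np

def make_kalman(dt, q, r):
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], float)       # state transition, Eq. (4)
    Q = np.diag([0.0, 0.0, q * q, q * q])     # process noise, Eq. (5)
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], float)       # measurement model, Eq. (6)
    R = np.diag([r * r, r * r])               # measurement noise, Eq. (7)
    return F, Q, H, R

def kalman_step(x, P, z, F, Q, H, R):
    # Predict the next eye centroid, then correct with the measurement z.
    x, P = F @ x, F @ P @ F.T + Q
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P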

3 Genotype Evolution

In this section, we discuss how to encode the variation of the algorithm structure and the associated thresholds and parameters of the eye-tracking scheme into a genetic code, so that the genotype control space can be explored for evolutionary optimization. The genotype of the tracking scheme is denoted by genetic codes in the genotype control space, each of which represents an algorithm structure and thresholds/parameters; thus, an optimal algorithm configuration that has evolved for a given external environment is denoted by a genetic code. The algorithm structure of the tracking scheme is divided into the preprocessing, feature extraction, partial Hough transform, and Kalman prediction, as discussed previously. Given an external environment, an optimal algorithm structure with associated thresholds/parameters is determined in accordance with a perceived image context.

In general, a context can be any information that affects the performance of a system of interest. An image context might be affected by lighting direction, brightness, contrast, and spectral composition. The concept of an image context with image quality analysis technology can improve system performance.33 Recently, image quality analysis methods have been successfully applied to diverse applications such as image storage, compression, communication, display, segmentation, and recognition,34 and have been used to decide selective preprocessing for adaptive illumination normalization using adaptive weighting parameters.19,20 In this paper, an image quality analysis is adopted for the categorization of the variation of input image quality that is influenced by the external environment. For simplicity of discussion, we employ only input image illumination changes as a clue to perceive the changes of the external environment. Haar wavelet-based Bayesian classification is performed for face and eye area detection20 before eye tracking begins.

Each substructure is constructed by combining action units. An algorithm structure can be divided into a substructure sequence, and each substructure is configured by a combination of action units, which are the basic building blocks of the system. Let Z be the set of thresholds and/or parameters of an action unit (AU). Each action unit has associated thresholds and/or parameters if they are required, denoted as follows:

AU_k(Z_k) = AU_k(\theta_{k_1}, \ldots, \theta_{k_n}),  (8)

where \theta_{k_i} is the i'th threshold/parameter of action unit AU_k. Let \Psi_A be a configurable algorithm substructure, say A. It is denoted as


\Psi_A = \Psi_A[AU_1^A(Z_1^A), \ldots, AU_a^A(Z_a^A)].  (9)

This means that \Psi_A is configured from action units AU_1^A, \ldots, AU_a^A. In the eye-tracking scheme, the whole algorithm structure is divided into four consecutive substructures, i.e., the preprocessing, feature extraction, partial Hough transform, and Kalman filter prediction. The preprocessing has three action units: histogram equalization (HE), Retinex (RX), and end-in contrast stretching (ECS). Feature extraction is represented by action units binarization (BN) and the Canny algorithm (CN). Finally, eye tracking is performed by action units partial Hough transform (PHT) and Kalman filter (KF). Then, the whole algorithm structure of our eye-tracking scheme is described as follows:

\Psi = \Psi_{Pre} \gg \Psi_{FE} \gg \Psi_{PHT} \gg \Psi_{KF}
    = \Psi_{Pre}[HE, RX(\theta_{RX}), ECS(\theta_{ECS1}, \theta_{ECS2})]
      \gg \Psi_{FE}[BN(\theta_{BN}), CN(\theta_{CN1}, \theta_{CN2})]
      \gg \Psi_{PHT}[PHT(\theta_{PHT1}, \theta_{PHT2}, \theta_{PHT3}, \theta_{PHT4})]
      \gg \Psi_{KF}[KF(\theta_{KF1}, \theta_{KF2})],  (10)

where \Psi_{Pre}, \Psi_{FE}, \Psi_{PHT}, and \Psi_{KF} denote the algorithm substructures of the preprocessing, feature extraction, partial Hough transform, and Kalman filter prediction, respectively, and \theta_{RX} denotes the scale parameter of action unit RX, and so on. Let the image context at time-step t be denoted as L_t; then an example eye-tracking algorithm configuration is as follows:

\Psi(L_t) = \Psi_{Pre}(L_t) \gg \Psi_{FE}(L_t) \gg \Psi_{PHT}(L_t) \gg \Psi_{KF}(L_t)
    = \Psi_{Pre}[HE \gg RX(\theta_{RX} = \theta_1)]
      \gg \Psi_{FE}[CN(\theta_{CN1} = \theta_3, \theta_{CN2} = \theta_4)\ \text{AND}\ BN(\theta_{BN} = \theta_2)]
      \gg \Psi_{PHT}[PHT(\theta_{PHT1} = \theta_5, \theta_{PHT2} = \theta_6, \theta_{PHT3} = \theta_7, \theta_{PHT4} = \theta_8)]
      \gg \Psi_{KF}[KF(\theta_{KF1} = \theta_9, \theta_{KF2} = \theta_{10})],  (11)

where the preprocessing of the configured eye-tracking algorithm consists of HE followed by RX with threshold θ1; feature extraction consists of a parallel combination, using the AND operation, of CN with thresholds θ3, θ4 and BN with threshold θ2; and so on. The genotype encoding format of the eye-tracking system in the genotype control space is illustrated in Table 1. We assign the preprocessing algorithm substructure 2 bits, since the feasible combinations are four types, i.e., HE, RX, ECS, and HE followed by RX (HE ≫ RX). The feature extraction algorithm substructure is also assigned 2 bits, where the feasible combinations are BN, CN, the parallel combination of BN and CN using the AND operation, and the parallel combination of BN and CN using the OR operation. Since the partial Hough transform and the Kalman prediction each have only one algorithm structure, no bits are assigned as algorithm structure parameters for them.

If the feature extraction step is determined to have four types of algorithm structure, it is denoted by a 2-bit vector of the genetic code, as shown in Table 2.
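The bit layout of Table 1 can be illustrated with a small encoder/decoder; the field ordering and widths below follow the table (2 + 4 + 4 + 4 bits for preprocessing, 2 + 4 + 4 + 4 for feature extraction, 4 × 4 for the partial Hough transform, and 4 + 4 for the Kalman filter, 52 bits in total). This is a sketch of the encoding idea, not the authors' representation; the short field names are ours.

# Field widths per Table 1; the names are shorthand, not the paper's symbols.
FIELDS = [("PSI_PRE", 2), ("RX", 4), ("ECS1", 4), ("ECS2", 4),
          ("PSI_FE", 2), ("BN", 4), ("CN1", 4), ("CN2", 4),
          ("PHT1", 4), ("PHT2", 4), ("PHT3", 4), ("PHT4", 4),
          ("KF1", 4), ("KF2", 4)]

def encode_genotype(values):
    # values: dict of field name -> integer index fitting the field width.
    return "".join(format(values[name], "0{}b".format(width))
                   for name, width in FIELDS)

def decode_genotype(bits):
    out, pos = {}, 0
    for name, width in FIELDS:
        out[name] = int(bits[pos:pos + width], 2)
        pos += width
    return out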

Let θA be a threshold (parameter) of an action unit A. The pivot θA^pivot is defined as the central value of a threshold range of θA in the phenotype control space. Let θA_min and θA_max denote the minimum and maximum threshold values of θA, respectively. The interval θAα between the threshold pivots of adjacent bit vectors is calculated by

\theta_{A\alpha} = \frac{\theta_{A\,max} - \theta_{A\,min}}{2^{\beta} - 1},  (12)

where β is the number of bits representing θA (see Table 1). The pivots of action unit A are derived by repeatedly adding θAα, starting from θA_min up to θA_max. For example, if θBN_min and θBN_max are assumed to be 68 and 188 in gray level, respectively, the lookup table of binarization threshold pivots for each sub-genetic code of θBN is as illustrated in Table 3.
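A sub-genetic code is turned into a concrete threshold pivot by Eq. (12); the following few lines reproduce the binarization lookup of Table 3 (β = 4 bits, θBN ∈ [68, 188]). A minimal sketch, assuming the code is read as an unsigned binary index.

def pivot_table(theta_min, theta_max, beta):
    step = (theta_max - theta_min) / (2 ** beta - 1)   # Eq. (12)
    return [theta_min + k * step for k in range(2 ** beta)]

def decode_pivot(theta_min, theta_max, beta, code):
    # code: sub-genetic code as a bit string, e.g. "0110".
    return pivot_table(theta_min, theta_max, beta)[int(code, 2)]

assert decode_pivot(68, 188, 4, "0110") == 116   # matches Table 3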

Genotype evolution is performed to find an optimal genetic code for each image context. The fitness is calculated by the average tracking rate [Eq. (30)] discussed in Sec. 5.

4 Phenotype Manifestation Using the RL Algorithm

Table 1 An illustration of the genotype encoding format of the proposed eye-tracking scheme and the parameter ranges.

Ψ/θ            ΨPre  θRX  θECS1  θECS2  ΨFE  θBN  θCN1  θCN2  θPHT1  θPHT2  θPHT3  θPHT4  θKF1   θKF2
No. of bits    2     4    4      4      2    4    4     4     4      4      4      4      4      4
Rmin           —     181  32     160    —    68   4     32    148    230    278    0      0.005  0.05
Rmax           —     245  96     224    —    188  20    64    180    262    310    32     0.015  0.15

(ΨPre, θRX, θECS1, θECS2: preprocessing; ΨFE, θBN, θCN1, θCN2: feature extraction; θPHT1–θPHT4: partial Hough transform; θKF1, θKF2: Kalman filter.)
Note: Rmin and Rmax indicate the minimum and maximum of the range of each parameter, respectively.

Table 2 Four types of algorithm substructure denoted by the 2-bit vector of the genetic code in the feature extraction step.

Bit vector of ΨFE    Feature extraction algorithm substructure
00                   Binarization
01                   Canny
10                   Binarization ∥AND Canny
11                   Binarization ∥OR Canny


As discussed in the previous section, the EA evolves the system configuration in the dynamic control space of the algorithm structure and its associated thresholds/parameters in terms of the genotype representation, so that they are put together into the eye-tracking scheme to optimize performance. The role of the EA is to find an appropriate scheme genotype, i.e., a genetic code representing an optimal eye-tracking configuration, based on a perceived external environment (i.e., a perceived image context here) and the fitness evaluation. Since real-world external environment changes are not fully predictable in the EA evolution step, the RL algorithm is employed to seek a precise phenotype manifestation of eye tracking in terms of thresholds/parameters and the reward. By combining the EA evolution and RL adaptation capabilities, not only can the curse of the huge control space of possible eye-tracking configurations in applying the RL algorithm be handled, but also the difficulty of the EA in real-time adaptation.

In our approach, the dynamic control space of the EA is not the same as that of the RL algorithm, but they are interrelated through the performance optimization in a real-world environment. A genetic code, generated by the EA for each image context, defines a related phenotype control space, which is described by the ranges of thresholds and/or parameters. For simplicity and real-time considerations, the algorithm structure in a genetic code is not changed in the phenotype manifestation, which is also intuitively consistent with the real-world evolutionary adaptation phenomenon. The RL algorithm treats the adaptation of the thresholds/parameters as a consecutive decision process, i.e., RL state transitions, to obtain an optimal configuration of eye tracking. Regarding the eye-tracking optimization as a sequence of consecutive trials for deciding the RL state observed in discrete time, the Markov decision process (MDP) model, which is necessary to use the RL algorithm, needs to be justified. In theory, the RL algorithm cannot be applied exactly if the Markov property is not satisfied. However, RL applications developed under the Markov assumption are still valid in many cases as approximations for understanding system behavior. Even when a state signal does not satisfy the Markov property, the RL algorithm can be employed in more complex and realistic non-Markov cases, and it has been successfully applied in many cases by approximating the state as a Markov state.16

In general, an MDP approach for an RL algorithm requires a five-tuple: states, actions, a transition function, rewards, and a discount factor.35 The action can be any decision/behavior that the RL agent needs to learn, and a state can be any factor that influences the agent's decision making. In this paper, each phenotype manifestation, denoted by the phenotype control space of associated thresholds/parameters, defines a state in the RL internal environment (see Fig. 1), and interacts with the RL agent to explore an optimal threshold/parameter set maximizing the eye-tracking performance. Here, the action is defined as the move to a next state, which decides the next phenotype manifestation in the phenotype control space.

The produced action activates a state transition of the internal environment from the current internal state in the phenotype control space into a new state, and the RL agent receives a reinforcement signal from the internal environment. One can notice that the proposed approach is distinguished from Refs. 8 and 21, where an image itself is modeled as an RL state and the parameter decision as an action. Their approach can be thought of as taking an image sequence as stimuli and producing a segmented image. It cannot fully utilize the interactive characteristics of the RL algorithm, in which the agent can alter the state at each time-step by taking actions.36 On the other hand, our approach relies on consecutive and interactive threshold/parameter transitions influenced by changing external environments (see Fig. 1), which are modeled as a Markov decision process. In this paper, we stay with the concepts of environment, agent, reward, and action as in usual RL. However, we use the terms internal environment and internal reward to avoid possible confusion with the external environment and external feedback. The internal environment can also be interpreted as an intrinsic motivation as used in cognitive science, where intrinsically motivated behavior is emphasized as a vital factor for intellectual growth.37 An RL action can likewise be interpreted as an internal decision instead of an action such as a human's decision to move his muscles in a certain direction.

It can be interpreted that the next values of the thresholds/parameters are mainly decided by the current internal state, i.e., the current values of the thresholds/parameters of the tracking scheme. The RL internal states are members of a finite set denoting all possible eye-tracking scheme configurations in terms of associated thresholds/parameters in the phenotype control space, which is related to and limited by a genetic code influenced by the external environment (see Fig. 1). Thus, the RL internal state transits randomly from one state to another within the limited control space in discrete time-steps.

In the RL algorithm, an episode is defined to separate the agent-environment interaction into subsequences. An episode can be thought of as a temporal unit of repeated interactions, with a standard starting state and a terminal state.16 The RL starting state is declared either when the eye-tracking confidence degrades below a predefined criterion, say the RL starting threshold η, or when an alert signal is received from the external environment through the external feedback (see Fig. 1).

Table 3 An illustrative lookup table of the binarization threshold pivots for individual sub-genetic codes of θBN.

K (sub-genetic code of θBN)   θBN pivot (K)      K (sub-genetic code of θBN)   θBN pivot (K)
0000                          68                 1000                          132
0001                          76                 1001                          140
0010                          84                 1010                          148
0011                          92                 1011                          156
0100                          100                1100                          164
0101                          108                1101                          172
0110                          116                1110                          180
0111                          124                1111                          188


The RL terminal state is declared when the eye-tracking confidence reaches a predefined criterion, say the RL terminal threshold ζ, or when it does not improve any more. The value of the current internal state is the estimate of the time-discounted amount of reward expected when starting from that internal state and continuing to act according to the policy.

Considering real-world constraints on computation resources, the set of internal states in the phenotype control space, limited by the genetic code determined by a perceived image context, is

S = \{s_1, \ldots, s_k, \ldots, s_K\},  (13)

where s_k is a state that indicates the indexes of the threshold/parameter lookup table in the eye-tracking scheme. In other words, all possible phenotype manifestations of the eye-tracking scheme are indexed by internal environment states. Let the dimension of the phenotype control space be p. Then, an internal environmental state at time-step t is denoted as

s_t = [\theta_{t,1}, \theta_{t,2}, \ldots, \theta_{t,p}],  (14)

where θt,i is an index of the lookup table representing an associated threshold/parameter at time-step t (e.g., Table 4).

A finite set of actions is available in each internal state, and a_t ∈ A(s_t) indicates the decision action of changing the thresholds/parameters of the eye-tracking scheme at time-step t. Let a_t = [ζt,1, ζt,2, ..., ζt,p] be an action at time-step t, where ζt,i is −dt,i, 0, or +dt,i, meaning a negative-direction movement of d units, stationary, or a positive-direction movement of d units in the i'th coordinate (index) of the phenotype control space at time-step t.

Given threshold/parameter values, the tracking scheme generates possible eye centroids within a search window. Let s_t = [θt,1, θt,2, ..., θt,p] be a phenotype control vector at time-step t, where the dimension of the threshold/parameter space is p and θt,i is an RL index of the lookup table representing a threshold or a parameter. Each index has a discrete range precision. For example, θBN has the discrete range precision 8 in Table 4, meaning that the index value of θBN is one of the values in {0, 1, ..., 7}. Table 4 illustrates the state vector of the proposed eye-tracking scheme with its discrete range precision.

Recall that the pivot θA^pivot is defined as the central value of a threshold range of θA in the phenotype control space. The possible thresholds (parameters) of an RL state are decided in the RL lookup table, as illustrated in Table 5 for the binarization threshold. Let χ be the interval between adjacent RL thresholds (parameters) for θA^pivot. The RL lookup table is generated using the following equation:

RL(i, \theta_A^{pivot}) = \left[\theta_A^{pivot}(K) - \frac{\chi}{2}\right] + \left[i - \left(\frac{v(\theta_A)}{2} - 1\right)\right] \times \chi,  (15)

where i is the RL index of threshold (parameter) θA^pivot, K is a sub-genetic code, and v(θA) is the discrete range precision of θA (see Table 4). As an example, if the binarization sub-genetic code is decided as "0110," then the binarization threshold pivot of the genetic code is 116 in gray-level intensity, as shown in Table 3. Table 5 shows the RL lookup table of the binarization discrete range in the phenotype control space and the dedicated RL thresholds, denoted by intensity values. For example, the RL threshold when i = 0 and χ = 2 is calculated as follows: RL(0, θBN) = (116 − 1) + [0 − (8/2 − 1)] × 2 = 109. If the RL state has a binarization RL index of 4, binarization is performed using the gray value 117 as the threshold (Table 5).
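Equation (15) then expands a pivot into the v(θA) evenly spaced RL thresholds of the phenotype control space; the sketch below reproduces the Table 5 row for pivot 116, χ = 2, and v = 8. An even integer χ is assumed so that χ/2 stays exact under integer division.

def rl_lookup(pivot, chi, v):
    # Eq. (15): v thresholds of spacing chi, roughly centered on the pivot.
    return [(pivot - chi // 2) + (i - (v // 2 - 1)) * chi for i in range(v)]

assert rl_lookup(116, 2, 8) == [109, 111, 113, 115, 117, 119, 121, 123]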

The internal reward function, which is indirectly influenced by the external feedback, defines the goal in our reinforcement learning (see Fig. 1). It maps each internal environment state-action pair to an expected internal reward. The reinforcement learning specifies how the agent alters its threshold and/or parameter decision policy as a result of its experience in eye tracking. The goal is to maximize the total amount of expected internal rewards over the long run. In this context, the immediate reward of the proposed RL algorithm is defined as follows: if the eye tracking is successful according to Eq. (29) for an image frame, a high score is returned as the internal reward; otherwise, a low score is returned, where the high and low scores are determined experimentally.

The aim of the RL algorithm is to optimize long-term performance using the feedback of the one-step immediate performance of eye tracking.

Table 4 An example of the RL internal state format, each element of which is a threshold/parameter index within its discrete range precision.

Threshold/parameter        θRX  θECS1  θECS2  θBN  θCN1  θCN2  θPHT1  θPHT2  θPHT3  θPHT4  θKF1  θKF2
Discrete range precision   8    8      8      8    8     8     4      4      4      4      16    16

(θRX–θECS2: preprocessing; θBN–θCN2: feature extraction; θPHT1–θPHT4: partial Hough transform; θKF1, θKF2: Kalman filter.)

Table 5 An illustrative lookup table for the phenotype discrete range of the binarization bit vector of genetic code "0110" and its dedicated thresholds in intensity values, when the genotype binarization threshold pivot is determined as 116.

i (RL index)   0    1    2    3    4    5    6    7
RL(i, θBN)     109  111  113  115  117  119  121  123


We choose a temporal difference (TD) learning approach, since it provides an online, fully incremental learning capability, and since the TD approach learns from each transition regardless of what subsequent actions are taken.16 The TD method also learns directly from experience, without any knowledge of environment dynamics such as a model of the reward and next-state probability distributions. Two popular TD algorithms are SARSA and Q-learning, which differ only in the estimation policy, i.e., on-policy versus off-policy. Considering the real-time requirement of eye tracking, Watkins's one-step Q-learning12 is selected, which converges relatively faster than SARSA.16 The one-step Q-learning algorithm estimates the optimal action-value function Q*, called the Q-function, and is an important reinforcement learning algorithm owing to its simplicity, effectiveness, and model-free characteristic.38

The one-step Q-learning method is a common off-policy temporal difference control algorithm. Q-learning accumulates optimal actions by evaluating the action-value Q(s, a) to construct a state-action table. The action-value updating equation of one-step Q-learning is given as follows:12

Q_{t+1}(s, a) = Q_t(s, a) + \alpha\left[r_{t+1} + \gamma \max_{a'} Q_t(s', a') - Q_t(s, a)\right],  (16)

where α ∈ (0, 1] is a learning rate and γ ∈ (0, 1] is a discounting rate. The action-value function Q learns to approximate the optimal action-value function Q* independently of the policy being followed, which enables early convergence and dramatically simplifies analysis.16,36

The policy is the set of rules for selecting actions given the values of the possible next states, and the greedy policy is deterministic, taking the action with the maximum Q-value in every state, i.e.,

\pi(s) = \arg\max_a Q(s, a),  (17)

where π: S → A is deterministic. Equation (16) is valid only when we can find a stable reward for individual action-state pairs in our eye-tracking scheme. However, the reward can only be consistent when the image quality of consecutive frames is nearly homogeneous. The near-homogeneity assumption is not valid if the prior knowledge of the image context from the EA module is disturbed by the occurrence of a new external environment. To absorb the temporal uncertainty of an external environment, we employ a cooperative multi-agent RL36 as follows: the joint action set is A = A_1 × ··· × A_i × ··· × A_n, where A_i is the set of actions of the i'th agent; the state transition probability function is f: S × A → S; and the reward function is r_i: S × A × S → R. The Q-learning equation and the greedy policy at time-step t are defined as follows:

Q_{t+1}(s, a) = Q_t(s, a) + \alpha\left[r_{t+1} + \gamma \max_{a'} Q_t(s', a') - Q_t(s, a)\right]  (18)

\pi_t(s) = \arg\max_{a_i}\; \max_{a_1, \ldots, a_{i-1}, a_{i+1}, \ldots, a_n} Q(s, a),  (19)

where s ∈ S is the current state, the action group is a = {a_1, ..., a_n} with a_i ∈ A_i the current action of the i'th agent, s' ∈ S is a next state, and a' is a next action group. Each agent is an independent decision maker. The greedy policy does not explore the consequences of untried action sequences, since it does not allow learning about a currently low-value action that might lead to high value in the future. In this context, the ε-greedy policy is used to balance the actions between exploration and exploitation.16 Our multi-agent Q-learning algorithm is given in Algorithm 1.

The whole evolutionary adaptive process for low-cost eye tracking is given in Algorithm 2. The framework components learn, through repeated interactions with each other, to autonomously optimize eye-tracking performance at run-time. The limited system memory and the real-time delay constraints of eye tracking complementarily accelerate the learning algorithm, which exploits the perceived environmental knowledge about the system's external dynamics.

Algorithm 1 Multi-agent Q-learning for eye tracking.

Repeat

  Step 1. Choose a from s using the ε-greedy policy.

  Step 2. Take the actions, observe an immediate reward r by calculating the internal reward (see Algorithm 2), and observe the new internal state s'.

  Step 3. Update Q_{t+1}(s, a) = Q_t(s, a) + α[r_{t+1} + γ max_{a'} Q_t(s', a') − Q_t(s, a)].

  Step 4. s ← s'.

Until the step limitation per frame M is reached or the successful tracking criterion is satisfied [see Eq. (29)].
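A tabular sketch of the per-frame loop of Algorithm 1 is given below, using the parameter values reported in Sec. 6.2 (α = 0.2, γ = 0.95, ε = 0.15, M = 20) and an assumed terminal threshold ζ. The step_fn and reward_fn callables stand in for the index-vector transition and the internal reward of Eq. (29); they are our placeholders, not the paper's API.

import random
from collections import defaultdict

def q_learning_frame(s, actions, step_fn, reward_fn, Q=None,
                     alpha=0.2, gamma=0.95, eps=0.15, max_steps=20, zeta=0.9):
    # s: hashable state, e.g. a tuple of lookup-table indices (Eq. 14).
    # actions: hashable per-coordinate moves, e.g. tuples over {-d, 0, +d}.
    Q = Q if Q is not None else defaultdict(float)
    for _ in range(max_steps):                        # step limitation M
        if random.random() < eps:                     # explore
            a = random.choice(actions)
        else:                                         # exploit (greedy)
            a = max(actions, key=lambda act: Q[(s, act)])
        s_next = step_fn(s, a)                        # move the indices
        r, tau = reward_fn(s_next)                    # internal reward, confidence
        best_next = max(Q[(s_next, act)] for act in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])  # Eq. (18)
        s = s_next
        if tau > zeta:                                # success criterion met
            break
    return s, Q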

Algorithm 2 Evolutionary adaptive eye tracking.

Repeat

  Step 1. Perceive the image context.

  Step 2. Find an optimal genetic code for the perceived image context, decide the phenotype control space, and initialize all Q(s, a) values arbitrarily.

  Step 3. Repeat for consecutive image frames:

    Step 3.1. Perform Q-learning by calling Algorithm 1 and decide the optimal thresholds/parameters.

    Step 3.2. Perform the eye tracking until τ > ζ.

  Step 4. If τ > η, go to Step 3.

  Step 5. If τ ≤ δ (i.e., the eye-tracking confidence τ frequently falls below the EA restarting threshold δ, where δ < η), go to Step 1.

Until termination.


5 Performance Measures of Eye Tracking from Image Sequences

This section discusses the offline measures of eye-tracking performance, such as recall, precision, and harmonic mean, as well as the real-time measure, called the real-time tracking confidence τ. Let an N-frame video sequence be denoted as V = (I_1, I_2, ..., I_N); the sequence of ground-truth iris areas is compared with that of tracked iris areas for the offline measures. Let X = (x_1, x_2, ..., x_N) and Y = (y_1, y_2, ..., y_N) be the sequence of ground-truth iris areas and that of tracked iris areas, respectively. Following the evaluation methodology of information retrieval research, tracking performance can be measured by recall and precision.39 The overlapped area divided by the union of the ground-truth and tracked (detected) iris areas is used for measuring a successful tracking in an image frame I_i.40 The inner circle area of the iris is excluded to prohibit the effect of possible light source reflection, as shown in Fig. 3:

\vartheta_i = \frac{\mathrm{Area}(x_i \cap y_i)}{\mathrm{Area}(x_i \cup y_i)}.  (20)

The constraint for successful tracking is defined as

\vartheta_i > t_m,  (21)

where t_m is a matching threshold determined experimentally. In general object tracking,40 a track is considered correct if its overlap with the ground-truth bounding box is larger than 50%, i.e., t_m = 0.5. In this paper, t_m is set to 0.6, since we require a stricter constraint for measuring successful eye tracking.

Regarding the situations where a ground-truth iris area cannot be defined due to heavy noise, or a tracked iris area cannot be found due to a limitation of the tracking algorithm, we define the ‖·‖ operations as follows:

\|x_k\| = \begin{cases} 1 & \text{if } x_k \text{ exists in frame } I_k \\ 0 & \text{if } x_k \text{ does not exist in frame } I_k \end{cases}  (22)

\|y_k\| = \begin{cases} 1 & \text{if } y_k \text{ exists (can be detected) in frame } I_k \\ 0 & \text{if } y_k \text{ does not exist in frame } I_k \end{cases}  (23)

Then, the recall and precision measures of the tracked eye image sequence are defined as

RE(X, Y, t_m) = \frac{\sum_{k=1}^{N} f(x_k, y_k; t_m)}{\sum_{k=1}^{N} \|x_k\|}  (24)

PR(X, Y, t_m) = \frac{\sum_{k=1}^{N} f(x_k, y_k; t_m)}{\sum_{k=1}^{N} \|y_k\|},  (25)

where f is the matching function taking into account a successful track, i.e.,

f(x_k, y_k; t_m) = \begin{cases} 1 & \text{if } x_k \cap y_k \text{ matches } x_k \cup y_k, \text{ i.e., } \vartheta_k > t_m \\ 0 & \text{if the overlapped area } (x_k \cap y_k) \text{ does not match} \end{cases}  (26)

The harmonic mean (HM) of recall and precision, which emphasizes the minimum of the two performance values, is defined as follows:

HM = \frac{2\,RE(X, Y, t_m) \times PR(X, Y, t_m)}{RE(X, Y, t_m) + PR(X, Y, t_m)}.  (27)
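The offline measures of Eqs. (20)-(27) amount to a few set operations per frame; a sketch follows, representing each iris area as a set of pixel indices (the paper uses the ring-shaped region of Fig. 3) and None for a missing area.

def overlap(x, y):
    # Eq. (20): intersection over union of ground-truth and tracked areas.
    return len(x & y) / len(x | y)

def sequence_metrics(X, Y, tm=0.6):
    # X, Y: per-frame ground-truth / tracked pixel sets, None when absent.
    hits = sum(1 for x, y in zip(X, Y)
               if x and y and overlap(x, y) > tm)     # f of Eq. (26)
    n_gt = sum(1 for x in X if x)                     # sum of ||x_k||, Eq. (22)
    n_tr = sum(1 for y in Y if y)                     # sum of ||y_k||, Eq. (23)
    re = hits / n_gt if n_gt else 0.0                 # recall, Eq. (24)
    pr = hits / n_tr if n_tr else 0.0                 # precision, Eq. (25)
    hm = 2 * re * pr / (re + pr) if re + pr else 0.0  # harmonic mean, Eq. (27)
    return re, pr, hm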

Since it is unrealistic to mark all ground-truth iris areas in a real-time application, the real-time tracking confidence is defined and measured as follows. Let x'_i be the boundary area of a partial iris decided by the Hough transform and partial circle ranges, as discussed in Sec. 2, and let y'_i be the actual number of boundary iris pixels detected in that area. Given V = (I_1, I_2, ..., I_N), let X' = (x'_1, x'_2, ..., x'_N) and Y' = (y'_1, y'_2, ..., y'_N). The real-time tracking mean (RM) at time-step k = q (i.e., at image frame I_q) is defined as the mean of successfully tracked adjacent frames between the time-steps k = q − δ and k = q + δ, where δ is a parameter used to control the smoothness of the RM.

Fig. 3 Ground-truth and overlapped iris areas for measuring a successful tracking, where the inner circle area of the iris is excluded to prohibit the effect of possible light source reflection in the center of the iris: (a) the ground-truth model of the iris, and (b) the overlapped iris area, defined over the union of the ground-truth and detected iris areas.

Fig. 4 Some examples of image frames used in analyzing the effects of image quality variation on the tracking system: (a) images with good-quality illumination, (b) images with moderate-quality illumination, and (c) images with bad-quality illumination, where yellow circles indicate successes in eye tracking and red circles indicate failures.


The effect of different δ is discussed in Sec. 6.2. The RM is then calculated using the frame sequence I_{q−δ}, I_{q−δ+1}, ..., I_{q+δ−1}, I_{q+δ} as follows:

RM(X', Y', I_q, t'_m) = \frac{\sum_{k=q-\delta}^{q+\delta} g(x'_k, y'_k, t'_m)}{\sum_{k=q-\delta}^{q+\delta} \|x'_k\|},  (28)

where the success of eye tracking at image frame I_q is defined using the constraint of the real-time matching threshold t'_m as follows:

g(x'_k, y'_k, t'_m) = \begin{cases} 1 & \text{if } y'_k / x'_k \geq t'_m \\ 0 & \text{otherwise} \end{cases}  (29)

where t'_m is determined experimentally as 0.625, based on the threshold determination method;41 the details can be found in Sec. 6.2.

Similarly, the average tracking rate (AR) for the whole video sequence V = (I_1, I_2, ..., I_N) is defined as follows:

AR(X', Y', t'_m) = \frac{\sum_{k=1}^{N} g(x'_k, y'_k, t'_m)}{\sum_{k=1}^{N} \|x'_k\|}.  (30)

However, Eq. (28) still cannot be used for real-time tracking performance estimation, since at the time-step of image frame k = q we cannot predict the tracking successes in image frames k = q + 1, ..., q + δ in advance. Thus, the real-time tracking confidence τ at the time-step of image frame I_q is estimated as an approximation of the RM as follows:

\tau = \overline{RM}(X', Y', I_q, t'_m) = \frac{\sum_{k=q-2\delta}^{q} g(x'_k, y'_k, t'_m)}{\sum_{k=q-2\delta}^{q} \|x'_k\|}.  (31)

6 Computational Experiments

We have tested the tracking performance of the single-agent Q-learning and multi-agent Q-learning algorithms using the evolutionary adaptive eye-tracking framework. The EA module decides the phenotype algorithm structure and genetic code using the training set of image sequences, where the phenotype algorithm structure of the preprocessing is determined as histogram equalization and that of the feature extraction as binarization. Thus, the tracking algorithm structure decided in the EA module consists of histogram equalization in the preprocessing, binarization in the feature extraction, the partial Hough transform, and the Kalman filter. That is, the phenotype algorithm structure is determined and represented as

\Psi = \Psi_{Pre}(HE) \gg \Psi_{FE}[BN(\theta_{BN})] \gg \Psi_{PHT}[PHT(\theta_{PHT1}, \theta_{PHT2}, \theta_{PHT3}, \theta_{PHT4})] \gg \Psi_{KF}[KF(\theta_{KF1}, \theta_{KF2})].

The noise parameters of the Kalman filter are fixed after the EA processing for simplicity, and the experiments mainly focus on the interactive adaptation of θBN, θPHT1, θPHT2, θPHT3, and θPHT4. The pivots of the sub-genetic codes are used in the RL algorithm for the optimal eye-tracking control thresholds/parameters, and the RL parameters were determined empirically. Extensive experiments were carried out using a 3-GHz Pentium 4 and a Logitech C910 webcam. Eight volunteers participated in the experiments to investigate the effect of diverse image sequences with varying image quality. Face size is restricted so that it does not vary significantly from frame to frame; the size of detected faces ranges between 128 × 128 and 256 × 256 pixels.

6.1 Data Set

The image qualities of the test video set are generated as relatively good, moderate, and bad to investigate the learning adaptation capability of the proposed method, where the three image categories are defined as three different image contexts. Here, image quality indicates not only the variation of the lighting condition, but also that of the eye movement speed and pose. Some examples are given in Fig. 4. Each image sequence has around 2000 image frames, half of which are used for GA training, while the remaining frames are used for the test of eye-tracking performance.

We created seven videos characterized in terms of illumination, eye movement speed, and pose movement speed, as shown in Table 6, where each video has 1000 frames. In our experiments, frames with blinking and facial expressions are excluded from the performance evaluation, since our eye tracking is aimed at HCI applications where a user's intentional eye movement with a positive attitude can be assumed.

Fig. 5 Illustrations of test videos with varying image quality in terms of illumination, eye movement, and face movement.


Some portions of the test videos are illustrated in Fig. 5.

6.2 RL and Experimental Parameter Determination

In this subsection, the effects of the RL parameters and other experimental parameters, such as the matching threshold and the real-time matching threshold, are analyzed, and the parameters are determined empirically. The RL parameters investigated were ε ∈ (0, 1] from the ε-greedy policy,16 the learning rate α ∈ (0, 1], and the discounting rate γ ∈ (0, 1] from the action-value equation of Q-learning [see Eq. (16)]. Figure 6 shows the average performance of the ε-greedy policy with the different values ε = 0.0, ε = 0.1, and ε = 0.15, where the vertical and horizontal axes are the real-time tracking confidence and the image frame sequence, respectively. These data are averages over ten volunteers' videos, where α and γ are set to 0.2 and 0.95, respectively. The parameter ε affects the ratio between exploitation and exploration when a next action is decided. One can notice that ε = 0.15 achieves faster and better convergence of the real-time eye-tracking confidence.

Figure 7 shows the effect on the eye-tracking performance of different values α = 0.2, 0.1, 0.05, 0.3, and 0.01, fixing ε = 0.15 and γ = 0.95. One can notice that α = 0.2 gives fast convergence characteristics in real-time tracking confidence.

Figure 8 shows the effects of the RL discounting rate with different values, i.e., γ = 0.9, 0.95, and 0.99, setting ε = 0.15 and α = 0.2.

Table 6 Characteristics of videos in terms of the illumination, eye movement speed, and pose movement speed.

Video     Illumination                  Eye movement   Pose movement
Video 1   Bad-moderate-good             Moderate       Moderate
Video 2   Bad-bad-bad                   Slow           Fast
Video 3   Good-good-good                Fast           Moderate
Video 4   Moderate-bad-good-bad-good    Moderate       Moderate
Video 5   Good-good-good                Fast           Moderate
Video 6   Good-good-good                Slow           Slow
Video 7   Moderate-moderate-moderate    Moderate       Fast

Fig. 6 The effects of the ε-greedy policy on the eye-tracking performance with different ε values, where the vertical and horizontal axes are the real-time tracking confidence and the image frame sequence, respectively.

Fig. 7 Effects of the RL learning rate on the eye-tracking performance with α = 0.2, 0.1, 0.05, 0.3, and 0.01.

Fig. 8 The effects of the RL discount rate on the eye-tracking performance with different values of γ = 0.9, 0.95, and 0.99, setting ε = 0.15 and α = 0.2.

Fig. 9 Real-time tracking confidence with different numbers of iterations at each image frame (ε = 0.15, α = 0.2, γ = 0.95, and δ = 20).


Considering the real-time constraint, we test the effect of limiting the number of Q-learning steps per image frame. Figure 9 shows the real-time tracking confidence when the maximum number of Q-steps per frame is restricted to 15, 20, and 30. As the number of iterations is reduced, the tracking performance suffers, but the eye-tracking speed increases.
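A minimal sketch of enforcing such a per-frame step budget, reusing epsilon_greedy and q_update from the earlier sketch. Here apply_action_and_track is a hypothetical stand-in for applying the selected thresholds and measuring the tracking confidence, and using that confidence directly as the reward is our simplification.

```python
def apply_action_and_track(frame, action):
    # Hypothetical stand-in: apply the thresholds encoded by `action`,
    # run the tracking pipeline on `frame`, and return the resulting
    # real-time tracking confidence and the next internal state.
    raise NotImplementedError

def adapt_frame(frame, Q, state, actions, max_q_steps=20, target=0.625):
    # Run at most max_q_steps Q-learning iterations on one frame, stopping
    # early once the tracking confidence clears the target; a smaller
    # budget raises the frame rate at some cost in tracking accuracy.
    for _ in range(max_q_steps):
        a = epsilon_greedy(Q, state, actions, eps=0.15)
        confidence, next_state = apply_action_and_track(frame, a)
        q_update(Q, state, a, confidence, next_state, actions)
        state = next_state
        if confidence >= target:
            break
    return state
```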

Figure 10 visualizes the real-time tracking confidence with different values of δ = 10, 20, and 30, setting ε = 0.15, α = 0.2, and γ = 0.95. Since sufficiently good convergence of the real-time tracking confidence is observed starting from δ = 20, we selected δ = 20 considering the time overhead.

Figure 11 shows the false positive and false negative rates for different values of the real-time matching threshold t′m = 0.575, 0.6, 0.625, 0.65, and 0.675, setting ε = 0.15, α = 0.2, and γ = 0.95. We set t′m to 0.625, since this matching threshold yields approximately equal false positive and false negative rates.
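A minimal sketch of this selection rule follows, assuming NumPy arrays of per-frame matching confidences and ground-truth labels (both hypothetical names): it keeps the candidate threshold where the false-positive and false-negative rates are closest, mirroring the crossover in Fig. 11.

```python
import numpy as np

def pick_matching_threshold(conf, is_eye,
                            candidates=(0.575, 0.6, 0.625, 0.65, 0.675)):
    # conf: matching confidence per frame; is_eye: boolean ground truth.
    best_t, best_gap = None, float("inf")
    for t in candidates:
        accepted = conf >= t
        fp = np.mean(accepted[~is_eye])   # fraction of non-eyes accepted
        fn = np.mean(~accepted[is_eye])   # fraction of true eyes rejected
        if abs(fp - fn) < best_gap:
            best_t, best_gap = t, abs(fp - fn)
    return best_t
```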

In the following discussions, the experiments are carried out using the RL parameters ε = 0.15, α = 0.2, and γ = 0.95, and the experimental parameter δ = 20.

6.3 Experimental Results
Extensive experiments were carried out using various image sequences. The performance was estimated using the seven types of videos gathered from different environments. The EA module decided the pivots of the phenotype control parameters, and the discrete phenotype ranges are determined and used as the RL exploration ranges. While large RL ranges are expected to give high performance, they may require much computation overhead and decrease the frame processing rate, i.e., many input frames are skipped. If the frame rate for eye tracking is reduced too much, the assumption of continuous illumination variation may no longer hold. The algorithm structures of preprocessing and feature extraction were decided by the EA module as the histogram equalization and the binarization, respectively. The Kalman filter parameters were fixed for simplicity of analysis. We focused on the adaptation of the feature extraction threshold, i.e., the binarization threshold, and the four partial Hough transform parameters, which mainly affect the eye-tracking performance. We examined the trade-off between performance and computation overhead by adjusting the maximum number of Q-learning steps per image frame.
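As an illustration of how the EA-decided pivots could bound the RL exploration, the sketch below discretizes a symmetric neighborhood around a pivot into candidate values. The function name, span, step count, and the pivot value 90 are illustrative assumptions, not values from the paper.

```python
def phenotype_range(pivot, span, steps):
    # Discretize the interval [pivot - span, pivot + span] into `steps`
    # candidate values that the RL agent may choose among as actions.
    step = 2.0 * span / (steps - 1)
    return [pivot - span + i * step for i in range(steps)]

# Illustrative only: a binarization-threshold pivot taken from the genetic
# code, widened into a small discrete neighborhood for Q-learning to explore.
bn_candidates = phenotype_range(pivot=90, span=20, steps=9)  # 70, 75, ..., 110
```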

Fig. 10 The effect of different values of the parameter δ in the visualization of the real-time tracking confidence (ε = 0.15, α = 0.2, and γ = 0.95).

Fig. 11 The effect of the matching threshold on the real-time tracking confidence. FP and FN indicate the false positive and false negative rates, respectively.

Table 7 Comparison of the number of successfully tracked frames by non-adaptation, EA-only, and a combination of EA and RL methods with one-step Q-learning.

Image      Frames     Non-adaptation   EA only    EA+RL(S,15)   EA+RL(S,20)   EA+RL(S,30)
sequence   A/B        C|fps            C|fps      C|fps         C|fps         C|fps
Video 1    825/1000   524|63.4         557|62.5   754|33.0      774|31.8      790|29.4
Video 2    948/1000   734|63.2         789|63.1   895|45.0      916|43.8      937|43.0
Video 3    973/1000   631|63.2         665|62.9   835|27.0      855|24.3      870|19.4
Video 4    976/1000   535|57.4         701|57.2   891|39.7      922|37.1      938|33.6
Video 5    964/1000   631|64.2         702|63.9   778|20.9      788|17.8      827|16.6
Video 6    983/1000   812|54.6         826|53.8   919|39.3      934|37.0      944|35.1
Video 7    992/1000   725|62.0         792|61.7   863|30.0      893|27.3      926|25.0

Note: A, B, and C indicate the numbers of successful eye area detections, whole test video frames, and successful eye-tracking frames, respectively. The number of failed eye detections is (B − A), and the number of tracking failures after a successful eye area detection is (A − C). EA+RL(S, k) means the single-agent Q-learning with a maximum of k time-steps per image frame.
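To read the table note concretely: for Video 1 under EA+RL(S, 20), the number of failed eye detections is B − A = 1000 − 825 = 175 frames, and the number of tracking failures after a successful detection is A − C = 825 − 774 = 51 frames.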


Table 7 shows the performance comparisons of the proposed method using single-agent Q-learning against the non-adaptation method and the EA-only adaptation. In the non-adaptation method, the eye-tracking scheme uses predetermined thresholds and control parameters. In the EA-only method, the genetic code decided from the training data supplies the thresholds and control parameters. In the combined EA and RL method, the RL decides the control thresholds/parameters in the phenotype control space using the pivot values determined by the EA. The real-time computation overhead of the EA-only method is almost equal to that of the non-adaptation method, since the EA evolution is carried out before eye tracking is performed.

Table 8 shows the trade-off between eye-tracking performance and computation overhead of the proposed multiagent Q-learning approach under changing computation time constraints. Note that the multiagent Q-learning can improve eye-tracking performance at the cost of increased computation overhead.

The performance evaluations using precision, recall, harmonic mean, and average tracking rate are given in Table 9 for the one-step Q-learning using a single agent and in Table 10 for the multiagent Q-learning, respectively. Table 9 shows that EA+RL with 30 Q-learning steps [EA+RL(S, 30)] achieved the best performance; however, it requires the highest computation overhead. EA+RL(S, 20) achieved moderate performance with moderate computation overhead.

The multiagent Q-learning could achieve better performance than the one-step Q-learning, while adding more computation overhead.

Table 8 Comparison of the number of successfully tracked frames with different computation time constraints using three-agent Q-learning.

Image      Frames     EA+RL(M,15)   EA+RL(M,20)   EA+RL(M,30)
sequence   A/B        C|fps         C|fps         C|fps
Video 1    825/1000   810|13.2      813|12.2      817|11.8
Video 2    948/1000   937|18.1      941|16.8      945|17.2
Video 3    973/1000   893|10.8      898|9.3       903|7.8
Video 4    976/1000   951|15.6      953|14.2      956|13.5
Video 5    964/1000   843|8.36      845|6.8       858|6.7
Video 6    983/1000   951|15.7      959|14.5      957|14.0
Video 7    992/1000   942|12.0      952|10.5      957|10.0

Note: A, B, and C indicate the number of successful eye area detections, that of whole test video frames, and that of successful eye-tracking frames, respectively. The number of failed eye detections is (B − A), and the number of tracking failures after a successful eye area detection is (A − C). EA+RL(M, k) indicates the multi-agent Q-learning with a maximum of k time-steps per frame for each agent; three agents are employed in this experiment.

Table 9 Evaluation of one-step Q-learning using the performance measures precision, recall, harmonic mean, integrated tracking performance, and real-time tracking confidence.

Image      Non-adaptation        EA-only               EA+RL(S,15)           EA+RL(S,20)           EA+RL(S,30)
sequence   PR|RE|HM|AR           PR|RE|HM|AR           PR|RE|HM|AR           PR|RE|HM|AR           PR|RE|HM|AR
Video 1    0.89|0.64|0.73|0.63   0.91|0.64|0.75|0.76   0.92|0.85|0.89|0.91   0.93|0.88|0.90|0.93   0.93|0.89|0.91|0.95
Video 2    0.93|0.78|0.85|0.77   0.95|0.79|0.86|0.83   0.93|0.90|0.92|0.94   0.95|0.93|0.93|0.96   0.95|0.94|0.94|0.98
Video 3    0.92|0.67|0.75|0.64   0.94|0.67|0.78|0.68   0.95|0.79|0.86|0.85   0.95|0.84|0.89|0.87   0.96|0.84|0.89|0.89
Video 4    0.73|0.62|0.70|0.61   0.78|0.64|0.70|0.71   0.75|0.70|0.72|0.91   0.78|0.73|0.75|0.94   0.78|0.74|0.75|0.96
Video 5    0.84|0.53|0.65|0.65   0.84|0.55|0.67|0.72   0.79|0.65|0.71|0.80   0.89|0.69|0.78|0.81   0.87|0.72|0.79|0.85
Video 6    0.90|0.79|0.84|0.82   0.93|0.80|0.86|0.84   0.94|0.89|0.91|0.93   0.94|0.91|0.90|0.95   0.94|0.93|0.92|0.96
Video 7    0.92|0.41|0.56|0.73   0.94|0.42|0.58|0.79   0.96|0.76|0.85|0.86   0.97|0.80|0.87|0.90   0.97|0.84|0.90|0.93
Mean       0.87|0.63|0.73|0.69   0.90|0.64|0.74|0.76   0.89|0.79|0.84|0.89   0.92|0.83|0.86|0.91   0.91|0.84|0.87|0.93

Note: Q-learning parameters are set to ε = 0.15, α = 0.2, γ = 0.95, and δ = 20. RL(S, k) means the single-agent Q-learning with the maximum iterations constrained to k; PR indicates the precision, RE the recall, HM the harmonic mean, and AR the average tracking rate. EA+RL(S, 20) indicates the combination method using single-agent Q-learning with at most 20 Q-learning steps per image frame, and so on.
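Assuming the HM column is the standard harmonic mean of precision and recall (the F1-style combination), the entries can be spot-checked; e.g., for Video 1 under EA+RL(S, 20):

```python
def harmonic_mean(pr, re):
    # HM = 2 * PR * RE / (PR + RE), the harmonic mean of precision and recall.
    return 2.0 * pr * re / (pr + re)

# Video 1, EA+RL(S, 20): PR = 0.93, RE = 0.88 -> HM ≈ 0.90, matching Table 9.
print(round(harmonic_mean(0.93, 0.88), 2))
```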


In Table 10, the three-agent Q-learning with different time constraints is investigated, where the total number of Q-learning steps per image frame for each agent is restricted to 15, 20, and 30.

The proposed method can achieve near-optimal real-time performance by managing the trade-off between tracking performance and computation overhead. The trade-offs of execution speed and tracking performance among different system architectures are compared in Fig. 12 using the average tracking rate, precision, recall, and harmonic mean, respectively. The horizontal axes indicate the time overhead in fps. Q-learning parameters are set to ε = 0.15, α = 0.2, γ = 0.95, and δ = 20. Notice that higher performance can be obtained if the time overhead is increased.

Table 10 Evaluation of the multiagent Q-learning using the performance measures of precision, recall, harmonic mean, integrated tracking performance, and real-time tracking confidence.

Image      EA+RL(M,15)           EA+RL(M,20)           EA+RL(M,30)
sequence   PR|RE|HM|AR           PR|RE|HM|AR           PR|RE|HM|AR
Video 1    0.93|0.90|0.91|0.98   0.93|0.91|0.92|0.98   0.93|0.92|0.92|0.99
Video 2    0.94|0.94|0.94|0.99   0.94|0.94|0.94|0.99   0.95|0.94|0.95|0.99
Video 3    0.95|0.86|0.90|0.92   0.95|0.87|0.91|0.92   0.96|0.88|0.92|0.92
Video 4    0.77|0.75|0.76|0.97   0.77|0.75|0.76|0.97   0.78|0.76|0.77|0.97
Video 5    0.87|0.73|0.80|0.87   0.88|0.75|0.81|0.87   0.88|0.76|0.81|0.89
Video 6    0.94|0.91|0.93|0.96   0.94|0.92|0.93|0.97   0.94|0.92|0.93|0.97
Video 7    0.96|0.86|0.90|0.94   0.96|0.80|0.91|0.95   0.97|0.89|0.92|0.96
Mean       0.90|0.85|0.88|0.95   0.91|0.85|0.88|0.95   0.92|0.87|0.89|0.96

Note: Q-learning parameters are set to ε = 0.15, α = 0.2, γ = 0.95, and δ = 20. EA+RL(M, k) means the multi-agent Q-learning with the maximum iterations constrained to k; three agents are employed in this experiment.

Fig. 12 The trade-offs of execution speed and tracking performance among different system architectures, where Q-learning parameters are set to ε = 0.15, α = 0.2, γ = 0.95, and δ = 20: (a) the average tracking rate of real-time eye tracking, (b) the precision, (c) the recall, and (d) the harmonic mean. The horizontal axes indicate the time overhead in fps; AR indicates the average tracking rate, PR the precision, RE the recall, and HM the harmonic mean.


7 Concluding Remarks
By combining the EA approach and the RL algorithm, i.e., by compensating for the difficulty of the EA in real-time adaptation and the curse of the huge search space of the RL algorithm, the proposed method manages the performance of the eye-tracking scheme in terms of both accuracy and speed. State-of-the-art eye-tracking techniques rely on either high-cost, high-quality cameras/equipment or strongly controlled situations. To overcome these limitations, possible configurations of the eye-tracking scheme are explored in the genotype and phenotype control spaces by the EA and the RL, respectively. The EA is responsible for exploring the genotype control space using a collected training image set, and the RL algorithm organizes an evolved genotype into a reactive phenotype that optimizes the real-time performance of the eye-tracking scheme. The EA encodes and stores the learned optimal scheme as a genetic code that denotes an eye-tracking algorithm structure and its associated thresholds/parameters. The RL algorithm defines its internal environmental states in a phenotype control space and performs interactive learning and adaptation in real time.

The main contribution of the proposed method compared to other popular adaptive systems is a design technique that maps the intrinsic uncertainty of vision-based HCI into a tractable computation space for controlling the algorithm structure and associated thresholds/parameters. It can optimize the performance of low-cost eye-tracking schemes in accordance with a changing environment. The proposed method defines the internal RL environment in connection with the genotype control space rather than with the input image directly, as in other RL environments,21 so that the interactive characteristics of the RL algorithm are fully utilized by exploring the threshold/parameter space influenced by the changing external environment. Since the main focus of the proposed method is to optimize an eye-tracking system with acceptable computation resources, we discussed a relatively simple eye-tracking method. However, the proposed framework can be extended to other eye-tracking methods that have difficulty achieving optimal performance in unstable operational environments. The future direction of our research is to apply the proposed method to challenging vision-based HCI applications for next-generation computing, such as eye cursors, eye keyboards, eye mice, and vision-based emotion detection.

Acknowledgments
This work was supported by an Inha University research grant. It also presents results of a study on the “Leaders in Industry-University Cooperation” Project, supported by the Ministry of Education, Science & Technology and Inha University.

References

1. D. W. Hansen et al., “In the eye of the beholder: a survey of models for eyes and gaze,” IEEE Trans. Pattern Anal. Mach. Intell. 32(3), 478–500 (2010).
2. A. T. Duchowski, “A breadth-first survey of eye-tracking applications,” Behav. Res. Methods Instrum. Comput. 34(4), 455–470 (2002).
3. A. Sein et al., “Eyeing a real-time human-computer interface to assist those with motor disabilities,” IEEE Potentials 27(3), 19–25 (2008).
4. C. A. Hennessey and P. D. Lawrence, “Improving the accuracy and reliability of remote system-calibration-free eye-gaze tracking,” IEEE Trans. Biomed. Eng. 56(7), 1891–1900 (2009).
5. M. Nabati and A. Behrad, “Camera mouse implementation using 3D head pose estimation by monocular video camera and 2D to 3D point and line correspondences,” in Proc. IEEE 5th Int. Symposium on Telecommunications, pp. 825–830, IEEE, Tehran, Iran (2010).
6. L. J. G. Vazquez, M. A. Minor, and A. J. H. Sossa, “Low-cost human computer interface voluntary eye movement as communication system for disabled people with limited movements,” in Proc. IEEE Pan American Health Care Exchanges, pp. 165–170, IEEE, Rio de Janeiro (2011).
7. J. S. Agustin et al., “Low-cost gaze interaction: ready to deliver the promises,” in Proc. CHI’09 Extended Abstracts on Human Factors in Computing Systems, pp. 4453–4458, ACM, New York (2009).
8. B. Bhanu and J. Peng, “Adaptive integrated image segmentation and object recognition,” IEEE Trans. Syst. Man Cybern. C Appl. Rev. 30(4), 427–441 (2000).
9. D. P. Muni, N. R. Pal, and J. Das, “A novel approach to design classifiers using genetic programming,” IEEE Trans. Evol. Comput. 8(2), 183–196 (2004).
10. J. H. Holland, “Genetic algorithms,” Sci. Am. 267(1), 66–72 (1992).
11. J. Clune et al., “Evolving coordinated quadruped gaits with the HyperNEAT generative encoding,” in Proc. IEEE Congress on Evolutionary Computation, pp. 2764–2771, IEEE, Trondheim, Norway (2009).
12. C. J. C. H. Watkins, “Learning from delayed rewards,” PhD Thesis, King’s College (1989).
13. K. Yanai and H. Iba, “Multi-agent robot learning by means of genetic programming: solving an escape problem,” in Proc. 4th Int. Conf. Evolvable Systems: From Biology to Hardware, Lecture Notes in Computer Science, Vol. 2210, pp. 192–203, Springer-Verlag, Berlin, Heidelberg (2001).
14. S. Kamio and H. Iba, “Adaptation technique for integrating genetic programming and reinforcement learning for real robots,” IEEE Trans. Evol. Comput. 9(3), 318–333 (2005).
15. M. A. Wiering and H. V. Hasselt, “Ensemble algorithms in reinforcement learning,” IEEE Trans. Syst. Man Cybern. B Cybern. 38(4), 930–936 (2008).
16. R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, Massachusetts (1998).
17. N. Mastronarde and M. V. D. Schaar, “Online reinforcement learning for dynamic multimedia systems,” IEEE Trans. Image Process. 19(2), 290–305 (2010).
18. J. Valasek et al., “Improved adaptive reinforcement learning control for morphing unmanned air vehicles,” IEEE Trans. Syst. Man Cybern. B Cybern. 38(4), 1014–1020 (2008).
19. H. Sellahewa and S. A. Jassim, “Image-quality-based adaptive face recognition,” IEEE Trans. Instrum. Meas. 59(4), 805–813 (2010).
20. E. J. Koh, M. Y. Nam, and P. K. Rhee, “A context-driven Bayesian classification method for eye location,” in Proc. 8th Int. Conf. Adaptive and Natural Computing Algorithms, Lecture Notes in Computer Science, Vol. 4432, pp. 517–524, Springer-Verlag, Berlin, Heidelberg (2007).
21. J. Peng and B. Bhanu, “Delayed reinforcement learning for adaptive image segmentation and feature extraction,” IEEE Trans. Syst. Man Cybern. C Appl. Rev. 28(3), 482–488 (1998).
22. P. K. Rhee, M. Y. Nam, and L. Wang, “Pupil location and movement measurement for efficient emotional sensibility analysis,” in Proc. IEEE Int. Symp. on Signal Process. and Inform. Technol., pp. 1–6, IEEE, Luxor (2011).
23. J. Cauchie et al., “Optimization of an Hough transform algorithm for the search of a center,” Pattern Recognit. 41(2), 567–574 (2008).
24. M. S. Arulampalam et al., “A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking,” IEEE Trans. Signal Process. 50(2), 174–188 (2002).
25. D. J. Jobson, Z. Rahman, and G. A. Woodell, “A multi-scale retinex for bridging the gap between color images and the human observation of scenes,” IEEE Trans. Image Process. 6(7), 965–976 (1997).
26. M. Y. Nam and P. K. Rhee, “An efficient face recognition for variant illumination condition,” in Proc. IEEE Int. Symp. on Intell. Signal Process. and Commun. Syst., pp. 111–115, IEEE, Incheon, South Korea (2005).
27. J. Canny, “A computational approach to edge detection,” IEEE Trans. Pattern Anal. Mach. Intell. PAMI-8(6), 679–698 (1986).
28. R. E. Kalman, “A new approach to linear filtering and prediction problems,” J. Basic Eng. 82(1), 35–45 (1960).
29. G. Welch and G. Bishop, “An Introduction to the Kalman Filter,” UNC-Chapel Hill, TR95-041 (2000), http://www.cs.unc.edu.
30. Q. Chen et al., “Two-stage object tracking method based on kernel and active contour,” IEEE Trans. Circuits Syst. Video Technol. 20(4), 605–609 (2010).
31. M. R. Rajamani and J. B. Rawlings, “Estimation of the disturbance structure from data using semidefinite programming and optimal weighting,” Automatica 45(1), 142–148 (2009).
32. M. S. Grewal and A. P. Andrews, Kalman Filtering: Theory and Practice, Prentice Hall, Upper Saddle River, New Jersey (1992).
33. M. Abdel-Mottaleb and M. H. Mahoor, “Application notes—algorithms for assessing the quality of facial images,” IEEE Comput. Intell. Mag. 2(2), 10–17 (2007).


34. Q. Li and Z. Wang, “Reduced-reference image quality assessment using divisive normalization-based image representation,” IEEE J. Sel. Topics Signal Process. 3(2), 202–211 (2009).
35. S. Ross et al., “A Bayesian approach for learning and planning in partially observable Markov decision processes,” J. Mach. Learn. Res. 12(5), 1729–1770 (2011).
36. L. Busoniu et al., “A comprehensive survey of multiagent reinforcement learning,” IEEE Trans. Syst. Man Cybern. C Appl. Rev. 38(2), 156–172 (2008).
37. S. Singh, “Intrinsically motivated reinforcement learning: an evolutionary perspective,” IEEE Trans. Auton. Mental Devel. 2(2), 70–82 (2010).
38. L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: a survey,” J. Artif. Intell. Res. 4(1), 237–285 (1996).
39. C. Wolf and J.-M. Jolion, “Object count/area graphs for the evaluation of object detection and segmentation algorithms,” Int. J. Doc. Anal. 8(4), 280–296 (2006).
40. Z. Kalal et al., “Tracking-learning-detection,” IEEE Trans. Pattern Anal. Mach. Intell. 34(7), 1409–1422 (2012).
41. T. Fawcett, ROC Graphs: Notes and Practical Considerations for Researchers, Kluwer Academic Publishers, The Netherlands (2004).

Yan Shen received her BS degree in computer science from Inha University, Incheon, Republic of Korea, in 2011. She is currently pursuing the master’s degree at Inha University, where she is majoring in computer science and engineering. Her research interests include recommender systems, computer vision, intelligent computers, and HCI.

Hak Chul Shin received his BS degree in computer science from Inha University, Incheon, Republic of Korea, in 2010. He is currently pursuing the master’s degree at Inha University, where he is majoring in computer science and engineering. His research interests include image processing, object detection, and trajectory analysis.

Won Jun Sung received his BS degree in computer science from Kongju University, Cheonan, Republic of Korea, in 2012. He is currently pursuing the master’s degree at Inha University, where he is majoring in computer science and engineering. His research interests include image processing, object detection, and face recognition.

Sarang Khim is currently pursuing his BS degree in computer and information engineering at Inha University, Incheon, Republic of Korea. His research interests include pattern recognition, computer vision, intelligent technology, and artificial intelligence.

Honglak Kim received his BS degree in computer and information engineering from Inha University, Incheon, Republic of Korea, in 2012. His research interests include pattern recognition, machine learning, and computer vision.

Phill Kyu Rhee received his BS degree in electrical engineering from Seoul University, Seoul, Republic of Korea, his MS degree in computer science from East Texas State University, Commerce, Texas, and his PhD degree in computer science from the University of Louisiana, Lafayette, Louisiana, in 1982, 1986, and 1990, respectively. From 1982 to 1985, he worked at the System Engineering Research Institute, Seoul, Republic of Korea, as a research scientist. In 1991, he joined the Electronic and Telecommunication Research Institute, Seoul, Republic of Korea, as a senior research staff member. From 1992 he was an associate professor in the Department of Computer Science and Engineering of Inha University, Incheon, Republic of Korea, where he has been a professor since 2001. His current research interests are pattern recognition, machine intelligence, and autonomic cloud computing. He is a member of the IEEE Computer Society and the Korea Information Science Society.
