Upload
philomena-powers
View
218
Download
3
Tags:
Embed Size (px)
Citation preview
From Head Pose to Focus of Attention - Rainer Stiefelhagen
From Head Pose to Focus of Attention
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Overview
• Motivation• Face Tracking
• Head Pose Tracking– A model-based approach– An appearance-based approach
• From Head Pose to Focus of Attention• Some Applications
• Outlook
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Why Tracking Focus of Attention ?
• FoA in Meetings: – To determine the addressee of a speech act
– To determine to whom a person is listening
– To understand the dynamics of interaction
– For meeting indexing / retrieval
• FoA for Context-Aware Interaction: – To know with what a user is interacting
(TV, fridge, household robot, …?)
– To know whether a user is aware of something
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Gaze and Attention
• Gaze is a good indicator of attention:– We look at people to whom we talk or listen
– We look at objects that are of interest to us
– We look at objects to whom we refer
– Looking at someone is perceived as a signal of attention
• Eye-gaze is difficult to track …– with changing number of users at changing locations
• Head Orientation is a sufficient cue in many cases
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Tracking of Human Faces
• A face provides different functions:• identification• perception of emotional expressions
• Human Computer Interaction requires tracking of faces:• lip-reading• gaze tracking• facial action analysis / synthesis
• Video Conferencing / video telephony application:• tracking the speaker• achieving low bit rate transmission
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Color Based Face Tracking
Human skin-colors:• cluster in a small area of a color space• skin-colors of different people mainly differ in intensity!• variance can be reduced by color normalization• distribution can be characterized by a Gaussian model
BGR
Rr
BGR
Gg
Chromatic colors:
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Color Model
Advantages:• very fast• orientation invariant• stable object representation• not person-dependent• model parameters can be quickly adapted
Disadvantages:• environment dependent • (light-sources heavily affect color distribution)
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Tracking Gaze and Focus of Attention
• In meetings:– to determine the addressee of a speech act – to track the participants attention– to analyse, who was in the center of focus– for meeting indexing / retrieval
• Interactive rooms – to guide the environments focus to the right application– to suppress unwanted responses
• Virtual collaborative workspaces (CSCW)• Human-Robot Cooperation • Cars (Driver monitoring)
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Tracking a User’s Focus of Attention
• Focus of Attention tracking:– To detect a person’s interest
– To know what a user is interacting with
– To understand his actions/intentions
– To know whether a user is aware of something
• In meetings:– to determine the addressee of a speech act
– to understand the dynamics of interaction
– for meeting indexing / retrieval
• Other areas– Smart environments
– Video-conferencing
– Human-Robot Interaction
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Attention during Social Interaction
• “Looking Means Listening!” (Ruusuvuori 2000)
• Attentional signals:– Eye gaze– Head orientation– Body posture– Gestures
• Head Orientation is a good cue to predict focus of attention !
• Eye-gaze is difficult to track …
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Typical Eye-Gaze Trackers
• Head mounted systems • user is allowed to move (limited by wires, etc.)• accurate head pose and eye-gaze (< 1 degree error)• fast (> 60 Hz possible)
• Very intrusive !
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Eye Gaze Trackers
Remote Trackers:
• User‘s movements are very limited• Problems with head rotations
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Eye Gaze Tracking
• Problems:– Requires the user to wear head gear– or position of the user relative to the
camera is rather fixed
– Certainly not acceptabel for everyday use in multimodal rooms
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Head Pose Estimation
• Model-based approaches:– Locate and track a number of facial features– Compute head pose from 2D to 3D correspondences (Gee
& Cipolla '94, Stiefelhagen et.al '96, Jebara & Pentland '97,Toyama '98)
• Example-based approaches:– estimate new pose with function approximator (such as
ANN) (Beymer et.al.'94, Schiele & Waibel '95, Rae & Ritter '98)
– use face database to encode images (Pentland et.al. '94)
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Model-based Head Pose estimation
Image 3D Model Real World
Y
Z
X
Feature Tracking Pose Estimation
•Find correspondences between points in a 3D model and points in the image
• Iteratively solve linear equation system to find pose parameters (rx, ry, rz, tx, ty, tz)
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Real-Time Facial Feature Tracking
Top Down Approach:• searching the face by the face tracker • searching and tracking of 6 facial feature points: pupils, nostrils and lip corners
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Real-Time Facial Feature Tracking
Searching pupils• searching for 2 dark regions• anthropometric constraints certain search area within found face• iterative to adapt to different lighting conditions
Apply iterative thresholding on grayscale image
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Real-Time Facial Feature Tracking
Searching Lip Corners:• predict search window, using eye position and face-model• find vertical position of the line between the lips (using horizontal integral projection)• find horizontal boundaries using edge detection and vertical integral projection
Searching Nostrils:• search area restricted to area between eyes and lips• same technique as used to find eyes
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Demo / Video
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Demo: Facial Feature Tracking
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Demo: Model-based Head Pose
From Head Pose to Focus of Attention - Rainer Stiefelhagen
System at Daimler Chrysler
DA A. King, System from Tilo Schwarz (Daimler-Chrysler, Ulm)
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Model-based Head Pose
• Pose estimation accuracy depends on correct feature localization!
• Problems:– Choice of good features– Occlusion due to strong head rotation– Fast head movement– Detection of tracking failure / re-initialization– Requires good image resolution
• Video
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Estimating Head Pose with ANNs
• Train neural network to estimate head orientation
• Preprocessed image of the face used as input
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Network Architecture
Hidden Layer:40 to 150 units
Pan (Tilt)
Input Retina: up to 3 x 20x30 pixel 1.800 units
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Image Preprocessing• Automatic extraction of faces
– using a skin-color model
• Preprocessing a) Histogram normalization of grayscale images
b) Extracting horizontal- and vertical edges– Down-sampling to 20x30 pixel
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Histogram Transformation
possible target functions:
grayvalue distribution:
accumulated grayvalues:
• Mapping of greyvalues to a fixed distribution • expands the range of intensities in the window• compensates for different camera gains• improves contrast (for gray levels near histogram maxima)
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Visual Preprocessinggrayvalue modification - example histogram :
grayvalue new:)´(
functionon modificati:
grayvalue original:)(
))(()´(
pf
T
pf
pfTpf
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Estimating Head Pose with ANNs
• Estimate head pan and tilt from the facial image• Data Collection:
– perspective views of users are collected with pano-cam– labels from a magnetic field pose tracker– data from fourteen users, at four positions around table
collected
From Head Pose to Focus of Attention - Rainer Stiefelhagen
ANN: Training / Results
• Training on 6100 images from 12 users (+mirrored images)
• Crossevaluation on 750 images from same 12 users
• Tested on 750 images from these users• Additional User Independent Testset: 1500 images from two new
users
training set test set new usershisto 4.8 / 3.6 5.5 / 4.1 10.4 / 9.3edges 4.3 / 2.9 5.6 / 3.5 12.2 / 10.3both 2.1 / 2.1 3.1 / 2.5 9.5 / 9.8
Average Error in degrees for pan / tilt
From Head Pose to Focus of Attention - Rainer Stiefelhagen
FoA Tracking in Meetings
How:1. Track all participants‘ faces2. Estimate their head orientations (ANNs)3. Map head orientations onto likely targets (e.g.
the other participants)
Why:
– to determine the addressee of a speech act
– to track the participants attention
– to analyse, who was in the center of focus
– for meeting indexing / retrieval), ...
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Focus of Attention Tracking in Meetings
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Tracking People in a Panoramic View
Camera View Panoramic View
PerspectiveView
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Demo
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Modeling Focus from Head Orientation
)(
)()|()|(
xp
TFocusPTFocusxpxTFocusP S
S
x: head pan in degrees, T {Person 1, Person 2, …, Person M }
• Head Orientation is a strong indicator of social attention• Eye-Gaze is difficult to measure
Estimate a persons focus of attention based on his head orientation
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Head Pan Distributions p(x|Fi)
Person 1 Person 3Person 2
• Head pan distributions are dependent on
- personal head-turning “styles”- location of targets
• Distributions should be adapted for- each person- each meeting
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Adaptation of the Model
All observations: p(x) Class-cond. Distr. p(x|F) Posteriors P(F|x)
• P(x) is modeled as mixture of Gaussians:
M
j
jPjxpxp1
)()|()(
• Model parameters are found using EM-approach• Individual Gaussian components are used as class-conditionals• Priors of the mixture model P(j) are use as focus prior P(F = T)
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Focus based on Head Orientation
P(Focus | Gaze)
Meeting A (4 participants) 68.8 % Meeting B (4 participants) 73.4 % Meeting C (4 participants) 79.5 % Meeting D (4 participants) 69.8 %
Avg. 72.9 %
Percentage of correctly assigned focus targets based on computing P(Focus | Head pan)
• Results on four meetings– 4 participants in each meeting– Participants automatically detected and tracked– unsupervised adaptation of model parameters
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Head Orientation and Eye Gaze in Meetings
• Head orientation and eye gaze point to the same direction 87% of the time
• Head orientation accounted for 69 % of the overall horizontal gaze direction
• Focus can be predicted based on head orientation in 88.7 % of the frames (using our model ...)
Experiment: Four people in a meeting, head orientation and eye-gaze measured using ISCAN head-eye tracker
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Head Orientation and Eye Gaze in Meetings (2)
Histograms of line of sight and head orientation:
Plot of horizontal orientation of head, eye and line-of sight:
Person 1 Person 2
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Head Orientation and Eye Gaze in Meetings
• Four people in a meeting, head orientation and eye-gaze measured using ISCAN head-eye tracker
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Head Orientation and Eye Gaze in Meetings (3)
• Focus can be predicted based on head orientation in 88.7 % of the frames (using our model ...)
Histograms of line of sight and head orientation:
Person 1 Person 2
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Contribution of Head Orientation
Subject #FramesEye
BlinksSame
directionHead
Contribution
1 36003 25 % 83 % 62 %
2 35994 23 % 80 % 53 %
3 38071 19 % 92 % 64 %
4 35991 20 % 93 % 97 %
Average 22 % 87 % 69 %
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Focus from Head Orientation
Adapt Mixture Modelto Head-Orientation Data.
Use Componentsas Class-Conditionals
P(Target | Head Pan)
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Focus Detection Results
Subject Accuracy
1 86 %
2 83 %
3 93 %
4 93 %
Average 89 %
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Predicting Focus from Sound
• Idea: People tend to look at the speaker– To signal attention– To speech-read– To monitor gestures, facial expressions, etc.
• Use information about who is speaking, to estimate where each participant is looking at:
P(Focus |”Audio”)
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Predicting Focus from Sound (2)
• Finding P(F|At) by counting focus targets on training data for each “speaker case”
At=(S,L,C,R)
Left Straight Right
0 0 0 0 0.26 0.49 0.23
0 0 0 1 0.11 0.27 0.60
0 0 1 0 0.12 0.74 0.11
0 0 1 1 0.07 0.49 0.40
0 1 0 0 0.59 0.28 0.11
... ... ... ...
1 1 0 1 0.29 0.44 0.26
1 1 1 0 0.35 0.56 0.08
1 1 1 1 0.50 0.50 0.00
Focus-predicition: 56.4 % accuracy (four meetings)
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Predicting Focus from Sound (3)
• Short-time “speaker-history” influences where people look at
predict P(F|At, At-1,…, At-N)
• ANN used to predict P(F|At, At-1,…, At-
N)
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Sound-based Focus with ANNs
• ANN used to predict P(F|At, At-1,…, At-N)• Training- and cross-evaluation sets taken round
robin
52
54
56
58
60
62
64
66
68
1 2 3 4 5 10 15 20 25 30 40
History length [frames]
Acc
urac
y [%
]
2 hidden units
3 hidden units
5 hidden units
10 hidden units
20 hidden units
Sound-based prediction on four meetings
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Predicting Focus from Sound (5)
• Counts on training data used to predict P(F| At )• ANN used to predict P(F|At, At-1,…, At-9)
P(F| At ) P(F|At,At-1,..,At-9)
Set A 57.7 63.0 Set B 57.6 67.2 Set C 46.9 60.2 Set D 63.2 72.6 Avg. 56.4 65.8
Audio-based Focus Prediction Results
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Combination of Gaze and Sound
Video only Audio only Combined (α=0.6)
Set A 68.8 63.0 71.4 Set B 73.4 67.2 77.1 Set C 79.5 60.2 80.5 Set D 69.8 72.1 75.7 Avg. 72.9 65.6 76.2
• Accuracy increases from 72.9 % to 76.2 %• 12 % relative error reduction using a fixed α
Combination: (1- α) P(Fi|Gaze) + α P(Fi|Sound)
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Applications: “Meeting Browser”
• Transcription• Summarization• Dialog Processing• Focus of Attention• Speaker + Face ID
From Head Pose to Focus of Attention - Rainer Stiefelhagen
FoA for Indexing and Analysis of Meetings
• Activity / Attention Analysis of Meetings– temporally filtering the focus / speaking activity
• Meeting statistics: – how often were persons paying attention to the speaker – how often did people talk ?– how often have people been payed attention to ?
• In a Meeting Browser– to get additional tags for retrieval– for focus-oriented replay of (parts of) meetings
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Applications: Human-Robot Interaction
• Focus of Attention tracking:– to understand the user’s
actions/intentions– to establish joint/shared
attention– to determine the addressee
of a speech act– to know when someone else
is addressed See: www.sfb588.uni-karlsruhe.de
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Speech-RecognitionDialogue-ProcessingTV-Database, …
Focus-Aware Human-Robot Interaction
• Demo-System: Interaction with a Household Robot, a voice-controlled VCR, Broadcast-News retrieval system and people in the room
Attention-
tracker
Speech-recognition, Parser, Dialog-Proc.,Speech-SynthesisAnimation of some actions
Other peopleNo action
?
?
From Head Pose to Focus of Attention - Rainer Stiefelhagen
Problems / Open Issues
• Robustness against changing illumination
• Handling of moving users, dynamic scenes
• Detection of new targets (objects, persons, ...)