Encoding, Recognizing, Retrieving and Predicting Complex Human Actions
(Computer Assisted Perception and Action Systems)
Terry Caelli Computing Science
University of Alberta, Canada
Some current application areas using active and passive sensing
Agriculture: Cattle Feeding
Image Understanding: Forestry Inventory Systems
Understanding Human Navigation
Map revision
Computer-based Skill Acquisition & Skill Transfer
General Aim of CAPA
To integrate sensing, multimedia, animation and machine learning to help humans
perform practical tasks
Problems:
• How do we sense and encode human actions?
• How do we train machines to learn human actions?
• How do we recognize and predict human actions?
• How do we transfer this training from machines to humans?
Sensors
• Active: magnetic field, acoustic, laser
• Passive: cameras
Computer-based Skill Acquisition & Skill Transfer
• Gesture recognition
MIT Media Labs, e.g. Bobick, Brand, Pentland (1995-98)
Reviews: Kohler and Schroter (1998), Aggarwal and Cai (1999)
• Skill recognition
Skill: Kuniyoshi et al. (1993, 94)
Construction: Fritsch et al. (2000-2002)
Mann et al. (1996)
• Sensors, encoders
- Video: 2D region/feature extraction (e.g. eigenvalues of ellipses, colour)
- Inference of 3D from 2D features, indexing 3D CAD models
Past Work: mainly video-based
Three Current Stochastic Control Models
• Kalman Filters
Single Gaussian predictor-corrector model
• Hidden Markov Models
Any pdfs apply, but requires careful use of EM and Viterbi
• Particle Filters and Markov Chain Monte Carlo methods
Use importance sampling approaches; generalize HMMs; issues of priors, proposal and sampling distributions
WHAT IS LEFT TO DO?
• Apply to tasks that really have the potential to assist humans
• Improve sensors
• Improve encoding models
• Improve learning/estimation, recognition and prediction methods
• Develop more objective methods for evaluating models
THE ENCODING PROBLEM: Encoding Human Kinematics
Each sensor or image feature records position, velocity and inferred forces, angles between joints, etc
Must be fully 3D or inferred 3D from sensed 2D data
For active sensors the basic signal is a contour trajectory
X_i(t) = (x_i(t), y_i(t), z_i(t))

Encoder options:
(1) Transform sets of sensor trajectories into inter-sensor angles, etc.
(2) Encode each sensor trajectory and their correlations <= used here
SHAPE: unique and invariant. The Serret-Frenet equations
Being INTRINSIC properties they are NOT localized in space or in absolute orientation
Curvature κ: how the tangent (T) changes with respect to ds
Torsion τ: how the binormal (B) changes relative to the normal (N) with respect to ds: "amount of screw"

dT = κ N ds
dN = -κ T ds + τ B ds
dB = -τ N ds
Encoding TRAJECTORY SHAPE
Curvature and torsion can be directly computed by
κ(t) = |C′(t) × C″(t)| / |C′(t)|³ = sqrt(E(t)² + F(t)² + G(t)²) / |C′(t)|³

τ(t) = det[C′(t), C″(t), C‴(t)] / (E(t)² + F(t)² + G(t)²)

where
E(t) = ẏ(t)z̈(t) - ż(t)ÿ(t)
F(t) = ż(t)ẍ(t) - ẋ(t)z̈(t)
G(t) = ẋ(t)ÿ(t) - ẏ(t)ẍ(t)
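As a sketch of how these formulas behave on sampled data (not the author's code; `curvature_torsion` is an illustrative name), curvature and torsion can be evaluated with finite-difference derivatives:

```python
import numpy as np

def curvature_torsion(X, t):
    """Curvature and torsion of a sampled 3D trajectory X (N x 3) at times t."""
    # numerical first, second, third derivatives along the time axis
    d1 = np.gradient(X, t, axis=0)
    d2 = np.gradient(d1, t, axis=0)
    d3 = np.gradient(d2, t, axis=0)
    cross = np.cross(d1, d2)                          # C' x C''
    num = np.linalg.norm(cross, axis=1)               # |C' x C''|
    speed = np.linalg.norm(d1, axis=1)                # |C'|
    kappa = num / speed**3
    tau = np.einsum('ij,ij->i', cross, d3) / num**2   # det[C',C'',C'''] / |C' x C''|^2
    return kappa, tau

# sanity check on a circle of radius 2: kappa = 1/2, tau = 0
t = np.linspace(0, 2 * np.pi, 2000)
X = np.stack([2 * np.cos(t), 2 * np.sin(t), np.zeros_like(t)], axis=1)
k, tau = curvature_torsion(X, t)
print(k[1000], tau[1000])   # approximately 0.5 and 0.0 away from the endpoints
```

Accuracy near the window endpoints is worse because `np.gradient` falls back to one-sided differences there.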
Computing Invariant features
Past filters: Mokhtarian (1997), Gaussians: NOT optimal

One optimal solution: Savitzky-Golay (SG) filters
A linear least squares filter:
• Choose window (causal, non-causal)
• Choose order of polynomial
• Derive filter kernels to best fit data to polynomial

Solution for coefficients: a = (AᵀA)⁻¹Aᵀf, giving a moving-window filter form (AᵀA)⁻¹Aᵀeₙ
A is the "design" matrix: data x polynomial bases

Known properties:
• Robust to noise
• Fits data using higher-order moments
• Determines derivatives analytically

MOST IMPORTANT: 2 scales, window size and order of polynomial (moments), for solving the linear least squares problem LOCALLY IN EACH MOVING WINDOW
min over a of  Σ_{i=1..N} [ y_i - Σ_{k=1..M} a_k X_k(t_i) ]²,   with A_ij = X_j(t_i)
Example: coefficients for window size +/- 4:
{0.04, -0.13, 0.07, 0.32, 0.42, 0.32, 0.07, -0.13, 0.04}
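A minimal sketch of the kernel construction from the design matrix, per a = (AᵀA)⁻¹Aᵀf (illustrative code, not the author's). With a window of +/-4 and an order-4 polynomial, the resulting smoothing weights closely match the coefficients listed above (order 4 is inferred from those values):

```python
import math
import numpy as np

def savgol_kernel(half_window, order, deriv=0):
    """Savitzky-Golay kernel via the least-squares solution a = (A^T A)^{-1} A^T f:
    row `deriv` of the pseudo-inverse gives the moving-window weights for the
    deriv-th derivative at the window centre (times deriv!)."""
    x = np.arange(-half_window, half_window + 1)
    A = np.vander(x, order + 1, increasing=True)    # design matrix: powers of x
    pinv = np.linalg.inv(A.T @ A) @ A.T             # (A^T A)^{-1} A^T
    return math.factorial(deriv) * pinv[deriv]

# window +/-4 with an order-4 polynomial: smoothing weights
w = savgol_kernel(4, 4)
print(np.round(w, 2))   # centre weight rounds to 0.42
```

A useful check of correctness: applying the kernel to samples of any order-4 polynomial must return that polynomial's value at the window centre exactly.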
3D velocity and acceleration with respect to intrinsic arc length parameter u
V(t) = dX(s)/ds · ds/dt
A(t) = dV(s)/ds · ds/dt
Encoding TRAJECTORY DYNAMICS
[Figure: a 3D recorded action with its 2D curvature-torsion space and 2D V-A space representations]
Complete Invariant signature
1) SHAPE: κ(u) and τ(u) for each point
2) DYNAMICS: V(u), A(u) for each point (total, NOT directional)
3) The initial position and direction difference vectors
Forward Model: Dynamics-to-Shape
Inverse Model: Shape-to-Dynamics
..can be multi-scaled with a symbolic representation
The Point Screw Decomposition Model
Essential motion parsing idea
The helix X(t) = (a cos(t), a sin(t), bt) is uniquely defined by κ = a/(a² + b²) and τ = b/(a² + b²)
so, if we cluster/quantize {κ, τ} signatures we are generating a screw approximation to a curve
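The helix identities can be checked directly with analytic derivatives; a = 2, b = 1 are illustrative values:

```python
import numpy as np

# Helix X(t) = (a cos t, a sin t, b t): kappa and tau are constant along the curve
a, b = 2.0, 1.0
t = 0.7                                            # any t gives the same result
d1 = np.array([-a * np.sin(t),  a * np.cos(t), b])    # X'
d2 = np.array([-a * np.cos(t), -a * np.sin(t), 0.0])  # X''
d3 = np.array([ a * np.sin(t), -a * np.cos(t), 0.0])  # X'''

cross = np.cross(d1, d2)
kappa = np.linalg.norm(cross) / np.linalg.norm(d1)**3
tau = (cross @ d3) / (cross @ cross)

print(kappa, a / (a**2 + b**2))   # both 0.4
print(tau,   b / (a**2 + b**2))   # both 0.2
```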
Solutions to learning, recognition and prediction of screw sequences
Markov shape theory: stochastic differential geometry
How do we train machines to learn human actions?
How do we recognize and predict human actions?
dT = κ N ds
dN = -κ T ds + τ B ds
dB = -τ N ds
Sensory-motor programs for complex actions involve storing simple dynamical stochastic rules about screw action sequences, including what should be sensed (observed) during their execution
Key idea
(S(t+1), σ(t+1)) = A(t, t+1)((S(t), σ(t)), o(t+1))

An adaptive, generalized version of the Serret-Frenet equations
Current Solutions
- Cartesian Product (complete model): O(TN^{2C})
- Structured Mean Field: Markov random field, compute over cliques (Ghahramani and Jordan 1996): O(TN^{2C*})
- N-heads (Brand, 1997): compute "most likely": O(T(CN)²)
- Weighted Marginals Model (Caelli et al, 2001; Zhong and Ghosh, 2001): O(T(CN))

The Problem: the causal model component

P(S_t^c / S_{t-1}^1, S_{t-1}^2, ..., S_{t-1}^C)

is exponential!
π = { p(S_u^i(1)); i = 1..N; u = 1..N_i }

λ = { π, A, B, C }

for N HMMs with N_i states (actions) for each HMM: π is the prior probability of each state of each HMM;
C corresponds to the coefficients of interaction, A to the intra- and inter-state transition matrices,
B to the state-dependent observation probability matrices.
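The parameter set λ = {π, A, B, C} above can be held in a simple container; the sketch below is illustrative only (names, sizes and the coupling weights are made-up examples, not the paper's values):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_stochastic(rows, cols, rng):
    """A random row-stochastic matrix (each row sums to 1)."""
    m = rng.random((rows, cols))
    return m / m.sum(axis=1, keepdims=True)

N, Ni, K = 2, 3, 4          # 2 HMMs, 3 states each, 4 observation symbols
model = {
    "pi": [np.full(Ni, 1.0 / Ni) for _ in range(N)],          # state priors per HMM
    "A":  {(i, j): random_stochastic(Ni, Ni, rng)             # intra (i == j) and
           for i in range(N) for j in range(N)},              # inter (i != j) transitions
    "B":  [random_stochastic(Ni, K, rng) for _ in range(N)],  # observation probabilities
    "C":  np.array([[0.9, 0.1], [0.1, 0.9]]),                 # coupling weights w_ij
}
print(model["A"][(0, 1)].shape)
```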
A = { a_uv^{ij} = p(S_v^j(t+1) / S_u^i(t)) }   (intra: i = j; inter: i ≠ j)

B = { b_u^i(o_k) = p(o_k / S_u^i) }   where o_k is the k-th observation symbol
WMM: Coupled hidden Markov model
WMM: Generalized Viterbi
Initialization (for 1 ≤ i ≤ N, 1 ≤ u ≤ N_i):

δ_u^i(1) = π_u^i b_u^i(o_1^i)

Recursion (for t = 2..T, 1 ≤ j ≤ N, 1 ≤ v ≤ N_j):

δ_v^i(t) = max_u [ w_ii a_uv^{ii} δ_u^i(t-1) + Σ_{j≠i} w_ij max_w a_wv^{ji} δ_w^j(t-1) ] b_v^i(o_t^i)
ψ_v^i(t) = argmax_u [ w_ii a_uv^{ii} δ_u^i(t-1) + Σ_{j≠i} w_ij max_w a_wv^{ji} δ_w^j(t-1) ]

Back tracking:

q^i(T) = argmax_u δ_u^i(T)
q^i(t) = ψ^i_{q^i(t+1)}(t+1),  t = T-1, ..., 1
Produces most likely sequences of states given the observations and CHMM model
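For reference, the uncoupled core of this recursion is the standard single-HMM Viterbi algorithm; the sketch below shows only that case in log space (the WMM version above additionally weights in the other chains through w_ij, which is omitted here):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Standard Viterbi for one HMM in log space: delta recursion plus
    psi backtracking, returning the most likely state sequence."""
    T, Nst = len(obs), len(pi)
    logA, logB = np.log(A), np.log(B)
    delta = np.log(pi) + logB[:, obs[0]]            # initialization
    psi = np.zeros((T, Nst), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + logA              # scores[u, v]
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[:, obs[t]]
    path = [int(delta.argmax())]                    # backtracking
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

# toy 2-state model (illustrative numbers)
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi(pi, A, B, [0, 0, 1, 1]))   # [0, 0, 1, 1]
```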
N-Heads
Forward operator
In matrix form we have
α_{t+1} = CA α_t b(t+1)
The corresponding backward operator is
β_t = C′A′ β_{t+1} b′(t)
[Figure: forward and backward recursions over t = 1..T for coupled HMM_i and HMM_j, with observation sequences o_1^i..o_T^i and o_1^j..o_T^j]
Produces an update of the CHMM model given the set of observations
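As a concrete illustration of the matrix-form forward operator, the sketch below runs the recursion for a single chain (the coupling matrix C is dropped for clarity, and the model numbers are made up); summing the final α gives the observation-sequence likelihood:

```python
import numpy as np

def forward(pi, A, B, obs):
    """Matrix-form forward recursion for one HMM:
    alpha_{t+1} = b(o_{t+1}) * (A^T alpha_t), elementwise in the state index."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = B[:, o] * (A.T @ alpha)
    return alpha            # alpha.sum() = P(observation sequence | model)

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(forward(pi, A, B, [0, 1]).sum())   # 0.209
```

The same value falls out of brute-force enumeration over all state paths, which is a handy unit test for any forward implementation.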
WMM Estimation: Generalized Baum Welch
Initialization:

α_u^i(1) = p(S_u^i) p(o_1^i / S_u^i)

Induction:

α_u^i(t) = [ Σ_v w_ii a_vu^{ii} α_v^i(t-1) + Σ_{j≠i} Σ_v w_ij a_vu^{ij} α_v^j(t-1) ] b_u^i(o_t^i)
WMM
N-Heads
Examples of tasks: "Drain", "Spider"

3 participants, 4 tasks
assembly, disassembly for "drain", "spider"; approximately 25 seconds duration each
repeated 10 times each
Observation sequences 230-260 in length
[Figure: Constrained Monte Carlo Hamming Distance as a function of # of Observations and # of States]
Task Recognition and Discrimination P(Hits)-P(False Alarms)
Deriving the best number of observation symbols and hidden states from task samples
Use of 16 observation symbols and 16 states
Recognizing actions
3 (participants) x 4 (tasks) => 12 Coupled HMM models
5 unseen new samples/participant/task => 60 tests
In all cases
log P(Viterbi / correct training data from 4 sensors) >
log P(Viterbi / incorrect test data from 4 sensors)
100% correct identification on each and every component HMM: this simply implies that subject performance and the tasks were quite different!
Predicting actions: Viterbi solution using Monte Carlo sampling and probability correct (PC) measure

Coupling coefficient within an arm (C):

                     0.0              0.1              0.6
               Training  Test   Training  Test   Training  Test
Task 1   AM      0.66    0.60     0.72    0.68     0.67    0.63
         JF      0.67    0.63     0.75    0.70     0.68    0.66
         TC      0.67    0.65     0.75    0.73     0.73    0.72
Task 2   AM      0.63    0.62     0.71    0.71     0.71    0.70
         JF      0.65    0.65     0.73    0.72     0.72    0.72
         TC      0.67    0.65     0.73    0.71     0.70    0.68
Task 3   AM      0.66    0.65     0.75    0.75     0.71    0.70
         JF      0.65    0.68     0.73    0.74     0.70    0.68
         TC      0.68    0.69     0.74    0.74     0.75    0.75
Task 4   AM      0.62    0.61     0.68    0.70     0.69    0.69
         JF      0.62    0.62     0.65    0.64     0.65    0.66
         TC      0.62    0.58     0.70    0.70     0.67    0.68
Means:           0.65    0.60     0.67    0.71     0.65    0.69
• Long sequences – regimes problem?
• Wrong Model – CHMM too limited?
• Too much uncertainty?
• How to tell?
[Figure: typical model uncertainty in a coupled HMM with transition matrices A1, A2, coupling matrices A12, A21 and observation matrices B1, B2]
Why did the model correctly recognize but not predict as well as we wanted?
We need measures of model parameters with respect to recognition and prediction, beyond MAP
Condition Number Residual analysis
The entries a_ij of A and b_jk of B play a key role in estimation and prediction.
So, consider the augmented matrix [A | B]
• Rows define state and observation attributes of each state
• Dependent rows indicate redundant states
• Compute inverse condition number: singular values
• Use residuals to delete, merge or split states/observations
Example: inverse condition number σ_min / σ_max

A = | 0.8  0.2 |    B = | 0.5  0.5 |
    | 0.8  0.2 |        | 0.5  0.5 |

The model (NB: state priors from the first eigenvector of A)
...may represent the data but has 0 inverse condition number!
Conditional Entropy (Information Content)
Given a model and observation sequence we can compute:
(1) The Viterbi optimal state sequence given the complete model
(2) The optimal state sequence given the B matrix and priors:
p(S / O_t) = max_S { p(O_t / S) p(S) }   (Bayesian MAP Classifier)
(3) Compute the conditional entropy from the two sets of state sequences
H(V / B) = H(V, B) - H(B)
R(V / B) = H(V) - H(V / B)
BHBVHBVH
R(V / B): the amount of information in the Viterbi solution explained by the Bayesian classifier
H(V / B): the amount of information in the Viterbi solution not explained by the Bayesian classifier: the pure Markov component
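These quantities can be computed directly from the two state sequences; the sketch below uses empirical (plug-in) entropies and toy sequences, purely for illustration:

```python
from collections import Counter
import numpy as np

def entropy(seq):
    """Empirical entropy (bits) of a sequence of symbols or symbol pairs."""
    counts = np.array(list(Counter(seq).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def markov_component(viterbi_states, bayes_states):
    """H(V/B) = H(V,B) - H(B): information in the Viterbi sequence NOT
    explained by the B-matrix-only (Bayesian) classifier; R(V/B) is the
    explained part."""
    HV = entropy(viterbi_states)
    HVB = entropy(list(zip(viterbi_states, bayes_states))) - entropy(bayes_states)
    return HV, HVB, HV - HVB          # H(V), H(V/B), R(V/B)

# toy sequences (illustrative only)
v = [0, 1, 0, 1, 0, 1, 0, 1]          # Viterbi-optimal states
b = [0, 1, 0, 1, 1, 0, 0, 1]          # MAP-classifier states
HV, HVB, R = markov_component(v, b)
print(round(HV, 2), round(HVB, 2), round(R, 2))
```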
Example – Recovery of 3D Hand movements from images
The HMM model parameters are almost never published, so we don't know whether the appropriate model is really an HMM or simply a Markov chain or a Bayesian classifier
Markov Chain:
A = | 0.8  0.2 |    B = | 0.5  0.5 |
    | 0.3  0.7 |        | 0.5  0.5 |

Bayesian Classifier:
A = | 0.5  0.5 |    B = | 0.7  0.3 |
    | 0.5  0.5 |        | 0.2  0.8 |

So, for hand motion recognition/tracking:
• NO HMM would use the Markov property with random movements
• NO HMM would use the B matrix if features were ambiguous
Experiments
Deterministic Walk
Random Walk
Random Pose Poses:
5 pitch {-30,0,20,50,80}
5 roll {-90,-45,0,45,90}
4 Yaw {-20,-10,0,10}
100 possible poses
A = | 0.46 0.54 0.00 0.00 0.00 |    B = | 0.74 0.26 0.00 0.00 0.00 |
    | 0.45 0.00 0.55 0.00 0.00 |        | 0.90 0.10 0.00 0.00 0.00 |
    | 0.00 0.53 0.00 0.47 0.00 |        | 0.67 0.33 0.00 0.00 0.00 |
    | 0.00 0.00 0.46 0.00 0.54 |        | 0.89 0.05 0.04 0.03 0.00 |
    | 0.00 0.00 0.00 0.55 0.45 |        | 0.86 0.14 0.00 0.00 0.00 |

Residuals = [ 0.4 0.4 0.5 0.4 0.4 | 0.1 0.1 0.0 0.0 0.0 ]
               (states)              (observation symbols)

σ_min / σ_max = 0.13
After refinement: H(V) = 2.32, H(V/B) = 1.59, R(V/B) = 0.73
i.e. 69% (= 1.59/2.32) of the information in the optimal state sequence comes from the A matrix.
MAP as a function of model uncertainty
Initial model: Random
Initial Model: Deterministic
Shows how MAP breaks down as model becomes more random
Conclusions
(1) There are many uses for computer assisted perception and action systems
(2) Prototyping human actions can be used for teaching, assessing and transferring human skill, wellness, etc.
(3) Issues of sensors, encoders are still open for development: robustness is an issue
(4) Models are also open: Kalman filters, HMMs, ARMA, particle filters are all useful, BUT a complete understanding and assessment of how model parameters function is critical for design