Gesture recognition using salience detection and concatenated HMMs Ying Yin [email protected] Randall Davis [email protected] Massachusetts Institute of Technology

Slide 1: Title
Gesture recognition using salience detection and concatenated HMMs.
Ying Yin ([email protected]), Randall Davis ([email protected]), Massachusetts Institute of Technology.

Slide 2: System overview
Depth & RGB images and Xsens data -> Hand tracking -> Feature vector sequence -> Hand movement segmentation -> Feature vector sequence with movement -> Gesture spotting & recognition

Slide 3: System overview (repeat of Slide 2)

Slide 4: Hand tracking
- Kinect skeleton tracking is less accurate when the hands are close to the body or move fast
- We use both RGB and depth information:
  - skin
  - gesture salience (motion and closeness to the observer)

Slide 5: Hand tracking (figure)

Slide 6: (figure)

Slide 7: Input to recognizer
Feature vector x_t:
- From the Kinect data and hand tracking: relative position of the gesturing hand with respect to the shoulder center, in world coordinates (R^3)
- From the Xsens unit on the hand:
  - linear acceleration (R^3)
  - angular velocity (R^3)
  - Euler orientation (yaw, pitch, roll) (R^3)

Slide 8: System overview (repeat of Slide 2)

Slide 9: Hand movement segmentation
- Part of gesture spotting
- Train Gaussian models for rest and non-rest positions
- During recognition, an observation x_t is first classified as a rest or a non-rest position
- It is a non-rest position if ... (the decision rule is given as an equation on the slide)

Slide 10: System overview (repeat of Slide 2)

Slide 11: Temporal model of gestures (figure)

Slides 12-14: Continuous gesture models
Phases: Pre-stroke, Nucleus, Post-stroke, Rest, End (the same diagram repeated across the three slides)

Slide 15: Pre-stroke & post-stroke phases (figure)

Slide 16: Bakis model for nucleus phase
- Left-to-right topology: start -> s1 -> s2 -> ... -> s6, with initial probability p(s1) and termination probability p(END | s6)
- 6 hidden states per nucleus phase in the final model
- Emission probability: mixture of Gaussians with 6 mixtures

Slide 17: Concatenated HMMs
- Train an HMM for each phase of each gesture
- Model the termination probability of each hidden state s as p(END | s)
- EM parameter estimation

Slide 18: Concatenated HMMs
- After training, concatenate the HMMs for the phases to form one HMM for each gesture
- Compute the transition probability from the previous phase to the next phase
- Ensure ... (the normalization constraint is given as an equation on the slide)

Slide 19: Gesture spotting & recognition
- Detect rest vs. non-rest segments
- Find the concatenated HMM that gives the highest probability
- Find the most probable hidden state sequence using Viterbi
- Assign hidden states to the corresponding phases
- Identify segments with no nucleus phase

Slide 20: Gesture recognition result visualization (figure)

Slide 21: Gesture recognition result
10 users, 10 gestures, and 3 rest positions; 3-fold average.

                Hand position only   Xsens only     Hand position & Xsens
  F1 score      0.677 (0.04)         0.890 (0.02)   0.907 (0.01)
  ATSR score    0.893 (0.02)         0.920 (0.01)   0.923 (0.02)
  Final score   0.710 (0.03)         0.895 (0.01)   0.912 (0.01)

Slide 22: Gesture recognition result
User-independent training and testing; 3-fold average.

                 Latent Dynamic CRF   Concatenated HMMs
  F1 score       0.820 (0.03)         0.897 (0.03)
  ATSR score     0.923 (0.02)         0.907 (0.02)
  Final score    0.828 (0.02)         0.898 (0.02)
  Training time  18 hr                7 min

Slide 23: Contributions
- Employed novel gesture phase differentiation using concatenated HMMs
- Used hidden states to:
  - identify movements with no nucleus phase
  - accurately detect the start and end of nucleus phases
- Improved hand tracking when the hand is close to the body or moving fast, by gesture salience detection
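The rest/non-rest classification of Slide 9 trains one Gaussian model per position class and classifies each observation x_t. The slide's exact decision rule is not captured in the transcript, so this sketch makes two assumptions: diagonal covariances, and classifying x_t as non-rest when its non-rest log-likelihood is the larger of the two.

```python
import numpy as np

class GaussianSegmenter:
    """Rest vs. non-rest classifier with one diagonal Gaussian per class.

    Sketch of Slide 9's segmentation step; the diagonal-covariance model
    and the likelihood-comparison rule are assumptions, not the authors'
    published formulation.
    """

    def fit(self, rest_X, nonrest_X):
        self.params = {}
        for label, X in (("rest", rest_X), ("nonrest", nonrest_X)):
            mu = X.mean(axis=0)
            var = X.var(axis=0) + 1e-6  # variance floor for numerical stability
            self.params[label] = (mu, var)
        return self

    def _loglik(self, x, label):
        # Log density of a diagonal multivariate Gaussian.
        mu, var = self.params[label]
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

    def is_nonrest(self, x):
        return self._loglik(x, "nonrest") > self._loglik(x, "rest")
```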
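Slides 17 and 18 describe concatenating the per-phase HMMs into one HMM per gesture, using each state's termination probability p(END | s) as the bridge to the next phase. A minimal sketch of one plausible construction follows; the specific scaling and wiring (sending a state's exit mass into the next phase weighted by that phase's initial distribution) is an assumption, not the authors' code.

```python
import numpy as np

def concatenate_phases(phases):
    """Concatenate per-phase HMMs into one gesture HMM.

    Each phase is a tuple (pi, A, p_end): initial distribution, within-phase
    transition matrix, and per-state termination probability p(END | s).
    A state keeps (1 - p_end[s]) of its mass inside the phase; the remaining
    p_end[s] flows into the next phase's initial distribution, so every row
    of the joint matrix still sums to 1 (the last phase's rows sum to
    1 - p_end, with the rest going to the implicit END state).
    """
    sizes = [len(pi) for pi, _, _ in phases]
    offsets = np.cumsum([0] + sizes[:-1])
    N = sum(sizes)
    A = np.zeros((N, N))
    for k, (pi, Ak, p_end) in enumerate(phases):
        o, n = offsets[k], sizes[k]
        # Within-phase transitions, scaled by the probability of staying.
        A[o:o + n, o:o + n] = Ak * (1 - p_end)[:, None]
        if k + 1 < len(phases):
            nxt_pi = phases[k + 1][0]
            o2 = offsets[k + 1]
            # Exit mass p(END | s) enters the next phase via its initial dist.
            A[o:o + n, o2:o2 + len(nxt_pi)] = np.outer(p_end, nxt_pi)
    pi0 = np.zeros(N)
    pi0[:sizes[0]] = phases[0][0]
    return pi0, A
```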
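Slide 19's spotting step decodes the best hidden-state path with Viterbi and then checks which phases those states belong to: a segment whose path never enters a nucleus state is not reported as a gesture. A compact sketch, where `phase_of_state` is an illustrative state-to-phase mapping (not part of the slides):

```python
import numpy as np

def viterbi(pi, A, logB):
    """Most probable hidden state path; logB[t, s] = log p(x_t | s)."""
    T, N = logB.shape
    logA = np.log(A + 1e-300)  # small floor avoids log(0)
    delta = np.log(pi + 1e-300) + logB[0]
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + logA        # scores[i, j]: best into j via i
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):             # backtrace
        path.append(int(back[t][path[-1]]))
    return path[::-1]

def has_nucleus(path, phase_of_state):
    """Slide 19: a segment whose decoded path has no nucleus state is rejected."""
    return any(phase_of_state[s] == "nucleus" for s in path)
```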