
A Hierarchical Approach to Continuous Gesture Analysis for Natural Multi-modal Interaction

Gesture recognition using salience detection and concatenated HMMs

Ying Yin, [email protected]

Randall [email protected]

Massachusetts Institute of Technology

System overview

- Depth & RGB images -> Hand tracking -> Feature vector sequence (augmented with Xsens data)
- Feature vector sequence -> Hand movement segmentation -> Feature vector sequence with movement
- Feature vector sequence with movement -> Gesture spotting & recognition

If it is a communicative gesture, the recognized gesture is applied to the application when the gesture ends.

Hand tracking

- Kinect skeleton tracking is less accurate when the hands are close to the body or move fast

- We use both RGB and depth information:
  - Skin color
  - Gesture salience (motion and closeness to the observer)

From the depth information, we compute a gesture salience probability. This is based on the assumption that a salient gesture usually involves large motion and is performed close to the observer. The red region in the figure is our hand tracking result.
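A minimal sketch of how a per-pixel salience score could combine the two cues named on the slide, motion magnitude and closeness to the observer. The product combination, the normalization, and the `max_depth_mm` parameter are assumptions for illustration, not the authors' exact probabilistic formulation.

```python
import numpy as np

def gesture_salience(depth_prev, depth_curr, max_depth_mm=4000.0):
    """Per-pixel gesture salience: large frame-to-frame motion and
    proximity to the observer both raise the score.

    depth_prev, depth_curr: HxW depth maps in millimeters.
    Returns an HxW array in [0, 1].
    """
    # Motion cue: absolute depth change between consecutive frames,
    # normalized to [0, 1].
    motion = np.abs(depth_curr.astype(np.float32) - depth_prev.astype(np.float32))
    motion = np.clip(motion / motion.max(initial=1.0), 0.0, 1.0)

    # Closeness cue: nearer pixels (smaller depth) score higher.
    closeness = 1.0 - np.clip(depth_curr / max_depth_mm, 0.0, 1.0)

    # Combine the two cues; the product form is an assumption standing in
    # for the probabilistic model described on the slide.
    return motion * closeness
```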


Input to recognizer

Feature vector x_t (assembled as sketched below):
- From the Kinect data and hand tracking: relative position of the gesturing hand with respect to the shoulder center in world coordinates (R^3)
- From the Xsens unit on the hand: linear acceleration (R^3), angular velocity (R^3), and Euler orientation (yaw, pitch, roll) (R^3)
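A minimal sketch of assembling this 12-dimensional feature vector; the function and argument names are hypothetical, and the Euler-angle ordering follows the slide.

```python
import numpy as np

def make_feature_vector(hand_pos, shoulder_center, accel, ang_vel, euler):
    """Build x_t from Kinect hand tracking and the hand-worn Xsens unit.

    hand_pos, shoulder_center: 3-vectors in Kinect world coordinates.
    accel: linear acceleration, ang_vel: angular velocity,
    euler: (yaw, pitch, roll) orientation. All 3-vectors.
    Returns the 12-dimensional feature vector x_t.
    """
    # Hand position is taken relative to the shoulder center so the
    # feature is invariant to where the user stands.
    rel_pos = np.asarray(hand_pos) - np.asarray(shoulder_center)
    return np.concatenate([rel_pos, accel, ang_vel, euler])
```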

Hand movement segmentation

- Part of gesture spotting
- Train Gaussian models for rest and non-rest positions
- During recognition, an observation x_t is first classified as a rest or a non-rest position; it is a non-rest position if its likelihood under the non-rest model exceeds its likelihood under the rest model (see the sketch below)
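A minimal sketch of this rest / non-rest test, assuming the decision rule is a comparison of the two Gaussian likelihoods (the slide's exact inequality is not preserved in this transcript); the class name is hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

class RestDetector:
    """Classify each observation x_t as rest or non-rest by comparing
    likelihoods under two Gaussians fit to labeled training frames."""

    def fit(self, rest_obs, nonrest_obs):
        # rest_obs, nonrest_obs: (N, D) arrays of feature vectors.
        self.rest = multivariate_normal(
            mean=rest_obs.mean(axis=0), cov=np.cov(rest_obs, rowvar=False))
        self.nonrest = multivariate_normal(
            mean=nonrest_obs.mean(axis=0), cov=np.cov(nonrest_obs, rowvar=False))
        return self

    def is_nonrest(self, x):
        # Assumed decision rule: the higher log-likelihood wins.
        return self.nonrest.logpdf(x) > self.rest.logpdf(x)
```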

We then train models for the gestures.

Temporal model of gestures

We model gestures based on previous work on gesture analysis.

Continuous gesture model: Rest -> Pre-stroke -> Nucleus -> Post-stroke -> Rest / End

Pre-stroke & post-stroke phases

Bakis model for nucleus phase

- Left-to-right (Bakis) topology over hidden states s1 ... s6, with start probability p(s1) and termination probability p(END|s6) (sketched below)
- 6 hidden states per nucleus phase in the final model
- Emission probability: mixture of Gaussians with 6 mixtures

Concatenated HMMs

- Train an HMM for each phase of each gesture
- Model the termination probability of each hidden state s as p(END|s)
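A minimal sketch of a 6-state Bakis transition structure with per-state termination probabilities p(END|s). Only the topology comes from the slide; the uniform initialization, the skip width, and which states may terminate are assumptions that EM would re-estimate.

```python
import numpy as np

def bakis_hmm(n_states=6, max_skip=2):
    """Initialize a Bakis (left-to-right) HMM transition structure.

    Each state may stay, advance one state, or skip ahead up to
    `max_skip` states. A separate vector holds p(END|s), the
    termination probability of each hidden state, so every row of
    the combined (transition + END) distribution sums to 1.
    """
    trans = np.zeros((n_states, n_states))
    p_end = np.zeros(n_states)
    for i in range(n_states):
        allowed = list(range(i, min(i + max_skip + 1, n_states)))
        # Assumption: only the last few states may terminate the phase.
        n_options = len(allowed) + (1 if i >= n_states - max_skip else 0)
        p = 1.0 / n_options
        for j in allowed:
            trans[i, j] = p
        if i >= n_states - max_skip:
            p_end[i] = p
    start = np.zeros(n_states)
    start[0] = 1.0  # p(s1) = 1: the phase always begins in state s1
    return start, trans, p_end
```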

EM parameter estimation
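The per-phase HMMs are trained with EM (Baum-Welch). A minimal sketch using hmmlearn's `GMMHMM`, a swapped-in off-the-shelf implementation rather than the authors' code; 6 states and 6 Gaussian mixtures follow the previous slide, while the covariance type and iteration count are assumptions.

```python
import numpy as np
from hmmlearn.hmm import GMMHMM

def train_phase_hmm(sequences):
    """EM training of one phase model for one gesture.

    sequences: list of (T_i, D) feature arrays, one per training
    example of this phase. hmmlearn expects the sequences stacked
    into one array plus a list of their lengths.
    """
    X = np.vstack(sequences)
    lengths = [len(s) for s in sequences]
    model = GMMHMM(n_components=6, n_mix=6, covariance_type="diag", n_iter=20)
    model.fit(X, lengths)
    return model
```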

Concatenated HMMs

- After training, concatenate the HMMs for each phase to form one HMM for each gesture (see the sketch below)
- Compute the transition probability from the previous phase to the next phase

- Ensure the transition probabilities of the combined model remain normalized
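A minimal sketch of concatenating two phase HMMs, using each state's termination probability p(END|s) to route probability mass into the next phase's start distribution. The block-matrix construction is an assumption consistent with the slide, chosen so that every row still sums to 1.

```python
import numpy as np

def concatenate_phases(trans_a, p_end_a, start_b, trans_b, p_end_b):
    """Join phase A (e.g. pre-stroke) to phase B (e.g. nucleus).

    trans_*: within-phase transition matrices, p_end_*: p(END|s) per
    state, start_b: start distribution of phase B. In the combined
    model, terminating in phase A redirects into phase B's start
    states, preserving normalization.
    """
    na, nb = len(p_end_a), len(p_end_b)
    trans = np.zeros((na + nb, na + nb))
    trans[:na, :na] = trans_a
    # Mass that left phase A through END now enters phase B.
    trans[:na, na:] = np.outer(p_end_a, start_b)
    trans[na:, na:] = trans_b
    # Only phase B's states may now terminate the concatenated model.
    p_end = np.concatenate([np.zeros(na), p_end_b])
    return trans, p_end
```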

Gesture spotting & recognition

- Detect rest vs. non-rest segments
- Find the concatenated HMM that gives the highest probability
- Find the most probable hidden state sequence using Viterbi
- Assign hidden states to their corresponding phases
- Identify segments without a nucleus phase (decoding sketch below)
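A minimal sketch of this spotting step, assuming hmmlearn-style models exposing `score()` (log-likelihood) and `decode()` (Viterbi). Mapping hidden states to phases by index range follows the concatenation order above but is otherwise an assumption, as is the `phase_bounds` structure.

```python
def spot_gesture(segment, gesture_models, phase_bounds):
    """Recognize one non-rest segment.

    segment: (T, D) feature array for a non-rest segment.
    gesture_models: dict gesture name -> concatenated HMM with
        hmmlearn-style score()/decode().
    phase_bounds: dict gesture name -> {"pre": range, "nucleus": range,
        "post": range} of hidden state indices, e.g. nucleus=range(6, 12).
    Returns (gesture name, state path), or None for a non-gesture.
    """
    # Pick the concatenated HMM with the highest log-likelihood.
    best = max(gesture_models, key=lambda g: gesture_models[g].score(segment))
    # Most probable hidden state sequence via Viterbi.
    _, states = gesture_models[best].decode(segment)
    # A path that never enters the nucleus phase is not a gesture.
    if not any(s in phase_bounds[best]["nucleus"] for s in states):
        return None
    return best, states
```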

Visualization of hidden states (the figure flags a segment with no nucleus phase)

Gesture recognition result visualization

Visualization of the recognition result on an unsegmented input sequence with 5 gestures. The x-axis shows the time frame indices.

Gesture recognition result

10 users, 10 gestures, and 3 rest positions; 3-fold average.

              Hand position only   Xsens only     Hand position & Xsens
F1 score      0.677 (0.04)         0.890 (0.02)   0.907 (0.01)
ATSR score    0.893 (0.02)         0.920 (0.01)   0.923 (0.02)
Final score   0.710 (0.03)         0.895 (0.01)   0.912 (0.01)

These results are based on features that include the inertial measurement unit data.

Gesture recognition result

User-independent training and testing; 3-fold average.

               Latent Dynamic CRF   Concatenated HMMs
F1 score       0.820 (0.03)         0.897 (0.03)
ATSR score     0.923 (0.02)         0.907 (0.02)
Final score    0.828 (0.02)         0.898 (0.02)
Training time  18 hr                7 min

In each fold, the training data contains data from two thirds of the users.

Contributions

- Employed novel gesture phase differentiation using concatenated HMMs

- Used hidden states to:
  - identify movements with no nucleus phase
  - accurately detect the start and end of nucleus phases

- Improved hand tracking when the hand is close to the body or moving fast, via gesture salience detection