Page 1: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Vision-Based Retrieval of Dynamic Hand Gestures

Thesis Proposal by

Jonathan Alon

Thesis Committee:

Stan Sclaroff, Margrit Betke, George Kollios,

and Trevor Darrell

Page 2: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Example Application

Page 3: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Isolated Gesture Recognition

Given: a query gesture Q, and a database of gesture examples Mg with class labels Cg, 1 ≤ g ≤ N (in the illustration: C1 = ‘CAR’, C2 = ‘BUY’, C3 = ‘CAR’, C4 = ‘BUY’; CQ = ?).

Problem: predict the class label CQ both accurately and efficiently.

Page 4: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Research Goals

Problem: Predict the class label CQ accurately and efficiently:

1. Accurately: design a distance measure D such that similarity in input space under D => similarity in class space.

2. Efficiently: better than brute force, which computes D(Q,Mg) for all g, 1 ≤ g ≤ N.

A small D(Q, M3) => CQ = C3 = ‘CAR’

A large D(Q, M4) => CQ ≠ C4 = ‘BUY’

Page 5: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Example Hand Gesture Data

“Video Gestures”: American Sign Language

Page 6: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Related Work (ASL Recognition)

Hand segmentation:
Previous: higher-level recognition models assume perfect segmentation, and methods are either too simple [Starner&Pentland 95, Vogler&Metaxas 99, Yang&Ahuja 02] or too complicated [Cui&Weng 95, Ong&Bowden 04].
Proposed: a more sophisticated distance measure will enable simple hand segmentation, and will allow more general backgrounds, textured clothes, and hand occlusions.

Vocabulary size:
Previous (vision-based): tens. Proposed: hundreds.

Data:
Previous: usually the researcher is the signer [Starner&Pentland 95, Cui&Weng 95]. Proposed: native signers, fast gesture speeds, and more realistic gesture variations.

Page 7: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Proposed methods (1)

1. Accurately: propose a Dynamic Space-Time Warping (DSTW) algorithm that can accommodate multiple hypotheses about the hand location in every frame of the query gesture sequence.

DSTW will enable a simple and efficient multiple candidate hand detection algorithm.

Page 8: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Proposed methods (2)

2. Efficiently: use a filtering method, which consists of two steps:

1. Filter step: compute D’(Q,Mg) for all g, 1 ≤ g ≤ N, based on a fast but approximate distance D’. Retain the P most promising gesture examples.

2. Refine step: compute D(Q,Mh) for h, 1 ≤ h ≤ P, based on the slow but exact distance D. Predict CQ based on the class labels of the Nearest Neighbors (NN).

Page 9: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Outline

Introduction: Motivation, Research Goals, Related Work, Proposed Methods

System Overview: Multiple Candidate Hand Detection, Feature Extraction and Processing, Dynamic Space-Time Warping (DSTW), Approximate Matching via Prototypes

Feasibility Study, Thesis Roadmap, Conclusion

Page 10: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Isolated Gesture Recognition: System Diagram

[System diagram] query gesture sequence → multiple candidate hand detection → multiple candidate hand subimages → feature extraction and processing → query features Q → Filter: approximate matching using D’ → candidate matches → Refine: exact matching using D → best matches → browsing → retrieval results; the video database of isolated gestures supplies the database features Mg to the filter and refine steps.

Page 11: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Contributions

[System diagram] query gesture sequence → multiple candidate hand detection → multiple candidate hand subimages → feature extraction and processing → query features Q → Filter: approximate matching using D’ → candidate matches → Refine: exact matching using D → best matches → browsing → retrieval results; the video database of isolated gestures supplies the database features Mg to the filter and refine steps.

Page 12: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

System Diagram

[System diagram] query gesture sequence → multiple candidate hand detection → multiple candidate hand subimages → feature extraction and processing → query features Q → Filter: approximate matching using D’ → candidate matches → Refine: exact matching using D → best matches → browsing → retrieval results; the video database of isolated gestures supplies the database features Mg to the filter and refine steps.

Page 13: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Multiple CandidateHand Detection (1)

Key observation: the gesturing hand cannot be reliably and unambiguously detected, regardless of the visual features used for detection.

However, the gesturing hand is consistently among the top K candidates identified by e.g., skin detection (K=15 in this example).

[Figure panels: Input Frame; Candidate Hand Regions]
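The slides do not specify how the detector is implemented; the following is a minimal sketch of one way to obtain the top K skin-colored candidate regions per frame, assuming OpenCV is available. The YCrCb thresholds and the helper name top_k_skin_candidates are illustrative assumptions, not the detector used in the thesis.

```python
import cv2
import numpy as np

def top_k_skin_candidates(frame_bgr, k=15):
    """Return centroids and bounding boxes of the K largest skin-colored blobs."""
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    # Rough skin range in YCrCb; a real system would use a trained skin-color model.
    mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
    num, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
    # Skip label 0 (background); keep the k largest components by area.
    order = np.argsort(stats[1:, cv2.CC_STAT_AREA])[::-1][:k] + 1
    return [(tuple(centroids[i]), tuple(stats[i, :4])) for i in order]
```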

Page 14: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Multiple CandidateHand Detection (2)

Input Sequence

Page 15: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Isolated Gesture Recognition: System Diagram

[System diagram] query gesture sequence → multiple candidate hand detection → multiple candidate hand subimages → feature extraction and processing → query features Q → Filter: approximate matching using D’ → candidate matches → Refine: exact matching using D → best matches → browsing → retrieval results; the video database of isolated gestures supplies the database features Mg to the filter and refine steps.

Page 16: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Feature Extraction (1)

Multi-dimensional time series

Input Gesture Sequence

Each frame i contributes a feature vector $M_i = (x_i, y_i, u_i, v_i)$ (hand position and velocity), and the gesture is the sequence $M = (M_1, M_2, \ldots, M_i, \ldots, M_m)$.
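As a small illustration of this representation, the sketch below builds the (m, 4) time series from per-frame hand centroids, using frame-to-frame differences as a stand-in for optical-flow velocity; this is a simplified assumption (one candidate per frame), not the thesis feature pipeline.

```python
import numpy as np

def gesture_time_series(centroids):
    """centroids: (m, 2) array of hand positions, one per frame -> (m, 4) series."""
    pos = np.asarray(centroids, dtype=float)
    vel = np.diff(pos, axis=0, prepend=pos[:1])  # simple frame-to-frame velocity (u, v)
    return np.hstack([pos, vel])                 # columns: x, y, u, v
```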

Page 17: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Feature Extraction (2)

Feature requirements:
Low-resolution hand image => coarse shape features.
Hand localization is not accurate => use histograms.

Features:
Position: hand centroid.
Velocity: optical flow.
Motion: optical flow direction histograms [Ardizzone and LaCascia 97].
Texture: edge orientation histograms [Roth&Freeman 95].
Shape: parameters of an ellipse fit to the hand [Starner 95].
Color: used for detection; not useful for recognition.

Page 18: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

System Diagram

[System diagram] query gesture sequence → multiple candidate hand detection → multiple candidate hand subimages → feature extraction and processing → query features Q → Filter: approximate matching using D’ → candidate matches → Refine: exact matching using D → best matches → browsing → retrieval results; the video database of isolated gestures supplies the database features Mg to the filter and refine steps.

Page 19: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Dynamic Time Warping (DTW) Recognition

Given a query sequence Q and a database sequence M, DTW computes the optimal alignment (or warping path) W and matching cost D.

However, DTW assumes that a single feature vector (e.g., 2D position of the hand) can be reliably extracted from each query frame.

[Figure: warping path W aligning query Q (frames 1, 32, 51) with model M (frames 1, 50, 80); each link carries the local cost DG(Mi, Qj) and the full path yields the matching cost D.]

Page 20: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

DTW Math (1): Distance between feature vectors

Mi, Qj are F-dimensional vectors. The distance measure between two feature vectors can be the Euclidean distance:

$D_G(M_i, Q_j) = \left( \sum_{f=1}^{F} (M_i^f - Q_j^f)^2 \right)^{1/2}$

DG can be more general, for example a (weighted) Lp norm.

Page 21: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

DTW Math (2): Distance between (sub)sequences

Initialization:
$D_{cum}(0,0) = 0$
$D_{cum}(0,j) = \infty, \quad j = 1, \ldots, n$
$D_{cum}(i,0) = \infty, \quad i = 1, \ldots, m$

Iteration (for $i = 1, \ldots, m$ and $j = 1, \ldots, n$):
$D_{cum}(i,j) = D_G(M_i, Q_j) + \min\{D_{cum}(i-1,j-1),\; D_{cum}(i-1,j),\; D_{cum}(i,j-1)\}$

Termination:
$D_{DTW}(M,Q) = D_{cum}(m,n)$
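A minimal sketch of the recurrences above, assuming the Euclidean local distance D_G and sequences stored as row-wise NumPy arrays:

```python
import numpy as np

def dtw_distance(M, Q):
    """M: (m, F) model sequence, Q: (n, F) query sequence (NumPy arrays)."""
    m, n = len(M), len(Q)
    D = np.full((m + 1, n + 1), np.inf)   # D_cum with a padded 0-th row/column
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = np.linalg.norm(M[i - 1] - Q[j - 1])  # D_G(M_i, Q_j)
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[m, n]                        # D_DTW(M, Q) = D_cum(m, n)
```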

Page 22: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Dynamic Space-Time Warping (DSTW) Recognition

DSTW can accommodate multiple candidate feature vectors at every time step.

DSTW simultaneously localizes the gesturing hand in every frame of the query sequence and recognizes the gesture.

[Figure: DSTW warping path W between model M and query Q, where each query frame contributes K candidate feature vectors (k = 1, 2, …, K); the path yields the matching cost D.]

Page 23: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

DSTW Math

Initialization:
$D_{cum}(0,0,k) = 0, \quad k = 1, \ldots, K$
$D_{cum}(0,j,k) = \infty, \quad j = 1, \ldots, n, \; k = 1, \ldots, K$
$D_{cum}(i,0,k) = \infty, \quad i = 1, \ldots, m, \; k = 1, \ldots, K$

Iteration (for $i = 1, \ldots, m$, $j = 1, \ldots, n$, $k = 1, \ldots, K$, with $w_t = (i, j, k)$):
$D_{cum}(w_t) = D_G(M_i, Q_{jk}) + \min_{w_{t-1} \in N(w_t)} \{ D_{cum}(w_{t-1}) + C(w_{t-1}, w_t) \}$

where $N(i,j,k) = \{(i-1, j, k'),\ (i, j-1, k'),\ (i-1, j-1, k') : k' = 1, \ldots, K\}$ is the set of neighbors of $(i, j, k)$ and $C(w_{t-1}, w_t)$ is a transition cost.

Termination:
$D_{DSTW}(M,Q) = \min_{k} D_{cum}(m, n, k)$
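A minimal sketch of the DSTW recurrences above, under the simplifying assumption of a zero transition cost C and a Euclidean local distance; Q[j] holds the K candidate feature vectors of query frame j:

```python
import numpy as np

def dstw_distance(M, Q):
    """M: (m, F) model sequence; Q: (n, K, F) candidate features per query frame."""
    m, n, K = len(M), Q.shape[0], Q.shape[1]
    D = np.full((m + 1, n + 1, K), np.inf)
    D[0, 0, :] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # best predecessor over the three neighboring cells and all candidates k'
            prev = min(D[i - 1, j - 1].min(), D[i - 1, j].min(), D[i, j - 1].min())
            for k in range(K):
                cost = np.linalg.norm(M[i - 1] - Q[j - 1, k])  # D_G(M_i, Q_jk)
                D[i, j, k] = cost + prev
    return D[m, n].min()                  # min over k of D_cum(m, n, k)
```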

Page 24: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Translation-Invariance (1)

2.1. The user may gesture in any part of the image.

Solution: Run K separate DSTW processes Pk in parallel

Pk subtracts the position of the kth candidate in the first frame from all candidates in subsequent frames.

Select Pk with the best matching score.

Page 25: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Translation-Invariance (2)

2.2. False matches occur frequently when only the position feature is used. For example, notice how spurious detections on the face in the query sequence falsely match model digit 1.

Solution: include velocity in the feature vector.

[Figure: query digit 1 vs. model digit 1, frames 1, 24, 36.]

Page 26: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Translation-Invariance (3)

2.1. The user may gesture in any part of the image.

Solution: Use centroid of face detector’s bounding box.

Page 27: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Scale-Invariance

1. Use an image pyramid.
2. Compare the size of the face bounding box (the face detector internally uses an image pyramid).

Page 28: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Complexity

F – number of features
L – average sequence length
K – number of hand candidates

DTW: O(F·L²)
DSTW: O(K·F·L²)
DSTW with translation invariance: O(K²·F·L²)

Page 29: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

System Diagram

[System diagram] query gesture sequence → multiple candidate hand detection → multiple candidate hand subimages → feature extraction and processing → query features Q → Filter: approximate matching using D’ → candidate matches → Refine: exact matching using D → best matches → browsing → retrieval results; the video database of isolated gestures supplies the database features Mg to the filter and refine steps.

Page 30: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Approximate Distance D’: Motivation

Lipschitz embeddings and BoostMap are embedding methods that represent each object by a vector of distances from the object to a set of d prototypes.

Can efficiently compute distances between objects in the embedded space (requiring only O(d) operations).

The same idea can be applied to time series; however, the distance representation then loses all information about the alignment.
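A minimal sketch of the prototype-distance embedding idea, assuming a hypothetical dtw_distance helper for the exact distance between time series; each object becomes a d-dimensional vector of distances to the prototypes, and embedded objects are compared in O(d):

```python
import numpy as np

def embed(series, prototypes, dtw_distance):
    """Represent a time series by its exact distances to the d prototypes."""
    return np.array([dtw_distance(series, R) for R in prototypes])

def embedded_distance(e1, e2):
    """L1 distance between two embedded objects: O(d) operations."""
    return np.abs(e1 - e2).sum()
```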

Page 31: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Approximate Distance D’: Alignment via Prototypes

[Figure: a model sequence M (frames 1–7) is aligned to a prototype R1 (frames 1–6) by exact DTW; model frames mapped to the same prototype frame are averaged, e.g., (M2 + M3)/2, yielding an embedded sequence $E_{R_1}(M) \in \mathbb{R}^{F L_1}$ with the prototype's length $L_1$.]

With d prototypes, the full embedding is
$E(M) = (E_{R_1}(M), E_{R_2}(M), \ldots, E_{R_d}(M)) \in \mathbb{R}^{F L_1} \times \mathbb{R}^{F L_2} \times \cdots \times \mathbb{R}^{F L_d}$.

Page 32: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Approximate Distance D’: Alignment via Prototypes

[Figure: the model M and the query Q are each aligned to the same prototype R by exact DTW; frames mapped to the same prototype frame are averaged, yielding embedded sequences M’ and Q’ of the prototype's length L.]

$D'(M, Q) = \sum_{l=1}^{L} D_G(M'_l, Q'_l)$

Page 33: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Approximate Distance D’: Alignment via Prototypes

[Figure: the alignment between M and Q induced through the prototype R approximates the direct DTW alignment between M and Q.]

$D'(M, Q) \approx D(M, Q)$

Page 34: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Justifying the Approximation

Why does it work? Two properties:

1. If the query and prototype are identical, then the approximate distance and the exact distance are identical.

2. If the query and database object are identical, then the approximate distance is 0, and the database object will be retrieved as the Nearest Neighbor.

3. More information…

Page 35: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Justifying the Approximation

Why does it work? Two properties:

1. If the query and prototype are identical, then the approximate distance and the exact distance are identical:
$D'(E_R(M), E_R(Q)) = D'(E_Q(M), E_Q(Q)) = D(E_Q(M), Q) = D(M, Q)$

2. If the query and database object are identical, then the approximate distance is 0, and the database object will be retrieved as the Nearest Neighbor:
$D'(E_R(M), E_R(Q)) = D'(E_R(M), E_R(M)) = 0$

3. More information…

Page 36: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Prototype Selection

Approach: Sequential Forward Search (SFS):

1. Select the first prototype R1 that minimizes the classification error.

2. For i = 2 to d: select the next prototype Ri that, together with the set of prototypes selected so far {R1,…,Ri−1}, gives the lowest classification error.

Page 37: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Prototype Selection

Approach: Sequential Forward Search (SFS):
1. Select the first prototype R1 that minimizes the classification error.
2. For i = 2 to d: select the next prototype Ri that, together with the set of prototypes selected so far {R1,…,Ri−1}, gives the lowest classification error (see the sketch below).

Can also do Sequential Backward Search (SBS), removing the worst prototype at every step.

Can give weights to individual prototypes or individual features.
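A minimal sketch of Sequential Forward Search, assuming a hypothetical classification_error(selected) helper that measures nearest-neighbor error on a validation set when only the prototypes in `selected` are used:

```python
def sequential_forward_search(candidates, d, classification_error):
    """Greedily pick d prototypes that minimize the (validation) classification error."""
    selected = []
    for _ in range(d):
        best = min(
            (r for r in candidates if r not in selected),
            key=lambda r: classification_error(selected + [r]),
        )
        selected.append(best)
    return selected
```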

Page 38: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Filter and Refine

Offline:
0. Select prototypes Ri.
1. Embed all database gestures: E(Mg).

Online:
1. Embed the query: E(Q).
2. Filter: compute the approximate distance D’(Q,Mg) between the query and all database gestures in the embedded space.
3. Retain the P nearest neighbors as candidate matches.
4. Refine: rerank the P candidates based on the exact distance D (see the sketch below).
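A minimal filter-and-refine sketch, assuming precomputed database embeddings and hypothetical helpers embed (online query embedding), approx_distance (the fast D’) and exact_distance (the slow D, e.g., DTW or DSTW):

```python
import numpy as np

def filter_and_refine(query, database, db_embeddings, labels, P,
                      embed, approx_distance, exact_distance):
    eq = embed(query)                                             # online: embed the query
    approx = [approx_distance(eq, em) for em in db_embeddings]    # filter: D'(Q, Mg) for all g
    candidates = np.argsort(approx)[:P]                           # keep the P most promising examples
    refined = [(exact_distance(query, database[g]), g) for g in candidates]  # refine: exact D
    _, best = min(refined)
    return labels[best]                                           # predict C_Q from the nearest neighbor
```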

Page 39: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Complexity

F = 3: number of features
L = 50: average sequence length
N = 10,000: number of database sequences
d = 10: number of prototypes
P = 10: number of retrieved database sequences

Brute force = O(N·F·L²): compute N exact D_DTW distances.

Filter step = O(d·F·L² + N·d·F·L): compute d exact DTW alignments (warping paths) + N approximate D'_DTW distances.

Refine step = O(P·F·L²): compute P exact D_DTW distances.

Filtering is faster than brute force when N > d + N·d/L + P.

Page 40: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Reducing Complexity

Filter step = O(d·F·L² + N·d·F·L). The second term is expensive: a well-known NN shortcoming.

Proposed solutions:
1. Feature selection: reduce the number of features, d·F·L.
2. Condensing: reduce the number of objects, N.

Page 41: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Feasibility Study

1. Exact distance D_DSTW
Application: recognition of “video digits”.
Compare DTW vs. DSTW accuracy.
Verify that translation-invariance works.
What is the right K? Use cross-validation.

2. Approximate distance D'_DTW
Application: recognition of UNIPEN digits.
Measure the accuracy vs. time tradeoff of approximate DTW vs. BoostMap and CSDTW.
Recognition of NIST digits, using an approximate shape context distance.

Page 42: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Video Digit Recognition Experiment

3 users, 10 digits, 3 examples per digit.
DSTW without translation invariance.
Features: position and velocity (x, y, u, v).
Performance measure: classification accuracy (%).

11.1%-21.1% increase in classification accuracy.

Page 43: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

UNIPEN Digit Recognition Experiment

15,953 digit samples.
Features: position and angle (x, y, θ).
Performance measure: classification error (%) vs. number of exact distance computations.

Using the query against the entire database gives 1.90% error with 10,630 exact D_DTW computations.

CSDTW gives 2.90% error with 150 D_DTW computations.

At a test error of 2.80%, the proposed method is about twice as fast as BoostMap and about ten times faster than CSDTW.

Page 44: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Conclusions: DSTW

Pros:
Hand detection is not merely a bottom-up procedure.
Recognition can be achieved even in the presence of multiple “distractors”, and of overlaps between the gesturing hand and the face or the other hand.
Recognition is translation-invariant.
For real-time performance, hand detection can afford to use more efficient features with higher false positive rates, relying on DSTW's capability to handle multiple candidates in order to reject many false detections.
DSTW provides a general method for matching time series that can accommodate multiple candidate feature vectors at each time step.

Cons:
Space and time complexity increase by a factor of K for translation-dependent recognition, and by a factor of K² for translation-invariant recognition.

Page 45: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Conclusions: Approximate Alignment via Prototypes

Pros:
Approximate alignment via prototypes is fast.
It provides a general method for efficiently approximating distance measures that are based on expensive alignment methods (e.g., the shape context distance).
The number of points in the two objects does not have to be equal.
The more expensive the exact alignment method, the greater the benefit from approximation.

Cons:
Cannot guarantee the absence of false dismissals in the filter step.
Every point in one object has to be matched with at least one point from the other object; this excludes approximating the Longest Common Subsequence (LCS) similarity measure.

Page 46: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Gesture Spotting

Page 47: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Isolated Gesture Recognition vs. Gesture Spotting

[Figure: whole matching compares a query Q against isolated database examples M1–M4; subsequence matching compares a query Q against subsequences of one long sequence M.]

Whole Matching vs. Subsequence Matching

Page 48: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Gesture Spotting: Research Agenda

Indirect temporal segmentation (segmentation by recognition): implement brute-force search using a sliding window (see the sketch after this list). Now we do not know the hand locations in the database sequence M; either extend DSTW to include a 4th (spatial) axis, or assume a cooperative user who marks hand locations in the query.

Direct temporal segmentation: are there hand motion features that can predict gesture boundaries?

How to combine the gesture boundary estimates from the direct and indirect approaches?
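One possible brute-force realization of the sliding-window idea above, assuming a hypothetical dtw_distance helper for the exact distance and illustrative window bounds; every candidate window of M is scored against the query, and windows within the tolerance eps are reported:

```python
def spot_gestures(M, Q, dtw_distance, eps, min_len, max_len, step=1):
    """Return (start, end, distance) for subsequences of M that match Q within eps."""
    hits = []
    for start in range(0, len(M) - min_len + 1, step):
        for length in range(min_len, min(max_len, len(M) - start) + 1):
            d = dtw_distance(M[start:start + length], Q)
            if d <= eps:
                hits.append((start, start + length, d))
    return hits
```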

Page 49: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Thesis Roadmap

Data collection and annotation: isolated gesture recognition; gesture spotting.

Algorithms: hand features; approximate DSTW, or alternative indexing method(s); temporal segmentation.

Implement demos.

Page 50: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Thank You!

Page 51: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Example Model Digits

Page 52: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Example Correct Match

Page 53: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Digit Recognition Experiment

3 users.

Database models: 3 examples per digit per user. The user wears a colored glove, and color detection finds a single correct hand region.

Queries: 3 examples per digit per user. The user wears a shirt with long sleeves in one experiment and short sleeves in another. Skin detection generates 15 candidate hand regions.

Features: 2D position (x, y) and 2D velocity (u, v).

Example Model Digits

Page 54: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Results

For translation invariant recognition, the inclusion of velocity in the feature vector is essential for recognition, and improves classification rates by 20% and 10% for user-dep. and user-indep. recognition respectively.

User-indep. results are perhaps not satisfactory for real HCI applications, but user-dependent results are, and user-dependent recognition is desirable in many real HCI applications.

Experiment abbreviations: LS: Long Sleeves, SS: Short Sleeves; TD: Translation Dependent, TI: Translation Invariant; P: Position, PV: Position and Velocity.

Experiment   User-dep. Classification Accuracy (%)   User-indep. Classification Accuracy (%)
LS-TD-P      96.7                                     85.6
SS-TI-P      73.3                                     64.4
SS-TI-PV     95.6                                     74.4

Page 55: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Problem 2: Translation-Invariant Recognition

Goal: maintain recognition rates even when the gesture is globally translated, i.e., signed in any part of the image.

Solution: given the K candidate regions detected in the first frame:
1. Run K separate DSTW processes Pk in parallel. Pk assumes that k was the correct candidate in the first frame, and subtracts the position of the kth candidate in the first frame from all candidates in subsequent frames.
2. Select the Pk with the best matching score.

Problem: many false matches occur when only the position feature is used.

Page 56: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Recognition Framework cont’d

DTW:
$D(i,j) = d(i,j) + \min\{D(i,j-1),\ D(i-1,j-1),\ D(i-1,j)\}$

where
$d(i,j)$ – Euclidean distance between features $M_i$ and $Q_j$,
$M_i = (x_i, y_i, u_i, v_i)$ – model 2D position and velocity feature,
$Q_j = (x_j, y_j, u_j, v_j)$ – query 2D position and velocity feature,
$D(i,j)$ – cumulative distance between subsequences $M_{1:i}$ and $Q_{1:j}$,
$W^* = (w_1, \ldots, w_T)$ – optimal warping path,
$D^* = D(m,n)$ – optimal matching score.

DSTW:
$D(w_{jk}) = d(w_{jk}) + \min_{w' \in N(w_{jk})}\{D(w') + \tau(w', w_{jk})\}$

where
$Q_{jk} = (x_{jk}, y_{jk}, u_{jk}, v_{jk})$ – query feature of frame $j$ and candidate $k$,
$N(w)$ – neighbors of $(i, j, k)$, and
$\tau$ – transition cost.

Page 57: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Problem 2: Translation-Invariant Recognition

2.2. False matches occur frequently when only the position feature is used. For example, notice how the elbow in query digit 3 is falsely matched with the bottom part of the digit 7.

Solution: include velocity in the feature vector.

[Figure: query digit 3 vs. model digit 7, frames 1, 45, 85.]

Page 58: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Multi-dimensional time series examples

“Video Gestures”: American Sign Language

Cursive Handwriting

Page 59: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Conclusions & Future Work

Conclusions:
+ DSTW is a general framework for matching time series that can accommodate multiple (K) candidate feature vectors at each time step.
+ Translation-invariance is incorporated into the framework.
− Space and time complexity increase by a factor of K for translation-dependent recognition, and K² for translation-invariant recognition.

Future Work: dynamic feature selection; gesture verification; temporal segmentation.

Page 60: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Problem Statement (2)

Gesture Spotting Problem: Given a long image sequence of gestures M (the database), a gesture query sequence Q, a distance measure D, and a distance tolerance ε, find those data subsequences x ⊆ M which satisfy D(x,Q) ≤ ε.

M can be an ASL story. Q can be:
an ASL sign (e.g., “CAR”),
finger spelling (e.g., “John”), or
any hand motion between signs (motion epenthesis).

D will be the Dynamic Time Warping (DTW) distance or a variant of it.

Page 61: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

A small D(Q, M3) => CQ = C3 = ‘CAR’

A large D(Q, M4) => CQ ≠ C4 = ‘BUY’

Page 62: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Problem Statement (1)

Visual ASL Dictionary Problem: Given a database (dictionary) of gesture image sequences Mi, a sign query sequence Q, a distance measure D, and a distance tolerance ε, find those data exemplars Mj which satisfy D(Mj,Q) ≤ ε.

Page 63: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Problem Statement (1)

Visual ASL Dictionary Problem: Given a database (dictionary) of gesture image sequences Mi, a sign query sequence Q, a distance measure D, and a distance tolerance ε, find those data exemplars Mj which satisfy D(Mj,Q) ≤ ε.

Q is a sign performed by a novice ASL student in front of a camera.

Mi are examples of isolated signs.

Page 64: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Problem Statement (1)

Visual ASL Dictionary Problem: Given a database (dictionary) of gesture image sequences Mi, a sign query sequence Q, a distance measure D, and a distance tolerance ε, find those data exemplars Mj which satisfy D(Mj,Q) ≤ ε.

Application Assumptions:
In producing Q, the ASL student may be cooperative.
Examples Mi can be collected with any constraints that would improve task performance, for example: colored gloves, slow gestures.

Page 65: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Problem Statement (1)

Visual ASL Dictionary Problem: Given a database (dictionary) of gesture image sequences Mi, a sign query sequence Q, a distance measure D, and a distance tolerance ε, find those data exemplars Mj which satisfy D(Mj,Q) ≤ ε.

Search Alternatives: Search for neighbors in ε-ball. Search for k Nearest Neighbors (kNN). Rank the entire database.

Page 66: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Problem Statement (1)

Visual ASL Dictionary Problem: Given a database (dictionary) of gesture image sequences Mi, a sign query sequence Q, a distance measure D, and a distance tolerance ε, find those data exemplars Mj which satisfy D(Mj,Q) ≤ ε.

Real Goal:
Assume class labels are known, e.g., C(Mi) = “CAR”, and that the Mj are sorted in ascending order based on D.
We want Sign(M1) = Sign(Q), or Sign(Mj) = Sign(Q) for as many examples Mj with small enough j.
We want: similarity in input (feature) space => similarity in class space.

Page 67: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Outline

Introduction: Assumptions, Challenges, Formal Problem Statement (ASL Dictionary Problem, Gesture Spotting)

System Overview: Multiple Candidate Hand Detection, Feature Extraction and Processing, Dynamic Space-Time Warping (DSTW), Approximate Matching via Prototypes, Temporal Segmentation

Feasibility Study, Related Work, Schedule, Conclusion

Page 68: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Research Goals

Problem: Predict the class label CQ accurately and efficiently:

1. Accurately: design a distance measure D such that similarity in input space under D => similarity in class space.

2. Efficiently: better than brute force, which computes the exact distance between the query gesture and all database gesture examples (D(Q,Mi), for all i).

A small D(Q, M3) => CQ = C3 = ‘CAR’

A large D(Q, M4) => CQ ≠ C4 = ‘BUY’

Page 69: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Proposed methods

1. Accurately: propose a Dynamic Space-Time Warping (DSTW) algorithm that can accommodate multiple hypotheses about the hand location in every frame of the query gesture sequence.

DSTW will enable a simple and efficient multiple candidate hand detection algorithm.

2. Efficiently: use a filtering method, which consists of two steps:

1. Filter step: compute D’(Q,Mi) for all i, 1 ≤ i ≤ N, based on a fast but approximate distance D’. Retain the P most promising gesture examples.

2. Refine step: compute D(Q,Mj) for j, 1 ≤ j ≤ P, based on the slow but exact distance D. Predict CQ based on the class labels of the Nearest Neighbors (NN).

Page 70: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Feature Extraction (1)

Show image with (x, y, u, v), or image with (x, y, θ).

Page 71: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Assumptions

Sensor: single color camera.
Background: not necessarily uniform.
Viewing condition: frontal upper-body view.
Foreground: single gesturer; objects of interest: hands; static camera and static gesturer.
Lighting: constant or slowly varying.

Page 72: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Challenges

Geometric variation: translation and scale.
User (signer) independence: body kinematics (shape and size of different body parts); style (speed, emphasis).
Different gesture durations.
Textured clothes.
Native signers and high gesture speeds.
Hand occlusion and self-occlusion.
Difficult sign types: repetitions, agentive forms, location- and context-dependent signs.

Page 73: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Contributions

[System diagram] query gesture sequence → multiple candidate hand detection → multiple candidate hand subimages → feature extraction and processing → query features → approximate matching (filter) → candidate matches → exact matching (refine) → best matches → browsing → retrieval results; the video database of isolated gestures supplies the database features to the matching steps.

Page 74: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

ASL Dictionary: System Diagram

[System diagram] query gesture sequence → multiple candidate hand detection → multiple candidate hand subimages → feature extraction and processing → query features → approximate matching (filter) → candidate matches → exact matching (refine) → best matches → browsing → retrieval results; the video database of isolated gestures supplies the database features to the matching steps.

Page 75: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

System Diagram

[System diagram] query gesture sequence → multiple candidate hand detection → multiple candidate hand subimages → feature extraction and processing → query features → approximate matching (filter) → candidate matches → exact matching (refine) → best matches → browsing → retrieval results; the video database of isolated gestures supplies the database features to the matching steps.

Page 76: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

ASL Dictionary: System Diagram

[System diagram] query gesture sequence → multiple candidate hand detection → multiple candidate hand subimages → feature extraction and processing → query features → approximate matching (filter) → candidate matches → exact matching (refine) → best matches → browsing → retrieval results; the video database of isolated gestures supplies the database features to the matching steps.

Page 77: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

ASL Dictionary: System Diagram

[System diagram] query gesture sequence → multiple candidate hand detection → multiple candidate hand subimages → feature extraction and processing → query features → approximate matching (filter) → candidate matches → exact matching (refine) → best matches → browsing → retrieval results; the video database of isolated gestures supplies the database features to the matching steps.

Page 78: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

DTW Math

Mi, Qj are F-dimensional vectors and wf are weights. The distance measure between two feature vectors is a weighted Lp norm:

$D_G(M_i, Q_j) = \left( \sum_{f=1}^{F} w_f \, |M_i^f - Q_j^f|^p \right)^{1/p}$

For example, with wf = 1 and p = 2 we get the Euclidean distance.
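A one-line sketch of the weighted Lp local distance above; with wf = 1 and p = 2 it reduces to the Euclidean distance used earlier:

```python
import numpy as np

def weighted_lp(Mi, Qj, w, p=2):
    """Weighted Lp distance between two F-dimensional feature vectors."""
    return float((w * np.abs(Mi - Qj) ** p).sum() ** (1.0 / p))
```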

Page 79: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Justifying the Approximation

We want a contractive embedding:

$D'(M, Q) \le D_{DTW}(M, Q), \quad \forall\, M, Q$

Why? Because then we can filter out unlikely matches and guarantee no false dismissals. However, we could not prove this property.

Page 80: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Prototype Selection

Approach: Sequential Forward Search.
1. Select the first prototype R1 that minimizes the classification error.
2. For i = 2 to d: select the next prototype Ri that, together with the set of prototypes selected so far {R1,…,Ri−1}, gives the lowest classification error.

Can do backward search too, by removing the worst prototype at every step.

Can give weights to individual prototypes or individual features.

Page 81: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Complexity

F – number of features; L – average sequence length; K – number of hand candidates; N – number of database objects; d – number of prototypes.

Filter step: O(d·F·L² + N·d·F·L) = O(d·F·L·(L + N)).

Example: in the UNIPEN digit dataset, L = 50 and N = 10,000, so the second term dominates; this is the well-known NN shortcoming.

Approach:
Feature selection to reduce the number of features, d·F·L.
Condensing to reduce the number of objects, N.

Page 82: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Feature Extraction (1)

1D time series

Multi-dimensional time series

Input Gesture Sequence

$M_i = (x_i, y_i, u_i, v_i)$; $M = (M_1, M_2, \ldots, M_i, \ldots, M_m)$

Page 83: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Approximate Distance D’: Alignment via Prototype

[Figure: a model sequence M is aligned to a prototype R; model frames mapped to the same prototype frame are averaged, e.g., (M2 + M3)/2 and (M6 + M7)/2.]

Page 84: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Approximate Distance D’: Alignment via Prototype

Page 85: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Alignment via Prototype

$F(M) = (M'_1, M'_2) = (f(M_1, M_2), f(M_3, M_4))$

$F(Q) = (Q'_1, Q'_2) = (f(Q_1, Q_2), f(Q_3))$

$D'(F(M), F(Q)) = \sum_{j=1}^{2} D_G(M'_j, Q'_j)$

For d prototypes, concatenate the d vectors.

Page 86: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Conclusions

Pros:
Hand detection is not merely a bottom-up procedure: the gesture model is used to select hand locations in a way that optimizes the query-to-model matching cost.
Recognition can be achieved even in the presence of multiple “distractors”, like moving objects or skin-colored objects (e.g., the face, the non-gesturing hand, background objects).
Recognition is robust to overlaps between the gesturing hand and the face or the other hand.
Recognition is translation-invariant; the gesture can occur in any part of the image.
For real-time performance, hand detection can afford to use more efficient features with higher false positive rates, relying on DSTW's capability to handle multiple candidates in order to reject many false detections.
DSTW provides a general method for matching time series that can accommodate multiple candidate feature vectors at each time step.

Cons:
Space and time complexity increase by a factor of K for translation-dependent recognition, and by a factor of K² for translation-invariant recognition.

Page 87: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Filter: Approximate Distance D’

Offline:
0. Select prototypes Ri.
1. Compute the correspondence between the database sequences Mg and the prototypes Ri using the exact alignment W(Mg,Ri).
2. Use the alignments to embed the database sequences: F(Mg).

Online:
1. Compute the correspondence between the query sequence Q and the prototypes Ri using the exact alignment W(Q,Ri).
2. Use the alignment to embed the query sequence: F(Q). This induces an approximate alignment WR(Q,Mg) between the query and any database sequence.
3. Use the approximate alignment WR(Q,Mg) to compute the approximate distance D’(Q,Mg) in the embedded space (see the sketch below).
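A minimal sketch of the embedding step above, assuming a hypothetical dtw_path helper that returns the exact warping path as (sequence frame, prototype frame) index pairs. Frames of a sequence aligned to the same prototype frame are averaged, so every embedded sequence has the prototype's length and D’ can then be computed frame-by-frame without any further warping:

```python
import numpy as np

def embed_via_prototype(seq, prototype, dtw_path):
    """Embed a sequence as one averaged feature vector per prototype frame."""
    emb = np.zeros((len(prototype), seq.shape[1]))
    counts = np.zeros(len(prototype))
    for i, l in dtw_path(seq, prototype):
        emb[l] += seq[i]
        counts[l] += 1
    return emb / counts[:, None]

def approx_distance(emb_m, emb_q):
    """D'(M, Q): sum of local distances between aligned averages."""
    return np.linalg.norm(emb_m - emb_q, axis=1).sum()
```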

Page 88: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Assumptions

Sensor: single color camera.
Background: not necessarily uniform.
Viewing condition: frontal upper-body view.
Foreground: single gesturer; objects of interest: hands; static camera and static gesturer.
Lighting: constant or slowly varying.

Page 89: Vision-Based Retrieval of Dynamic Hand Gestures

Computer Science

Challenges

Geometric variation: translation and scale.
User (signer) independence: body kinematics (shape and size of different body parts); style (speed, emphasis).
Different gesture durations.
Textured clothes.
Native signers and high gesture speeds.
Hand occlusion and self-occlusion.
Difficult sign types: repetitions, agentive forms, location- and context-dependent signs.