Nonlinear Manifold Learning for Visual Speech Recognition

Christoph Bregler and Stephen Omohundro

University of California, Berkeley & NEC Research Institute, Inc.

1/25

Nonlinear Manifold Learning for Visual Speech Recognition · 2009-03-29 · Overview. Manifold Learning: Applications to Lip Tracking, Image Interpolation, and Feature Extraction




Overview

Manifold Learning: Applications to Lip Tracking, Image Interpolation, and Feature Extraction.

Experiments with a full Speech Recognition System.

A technique for representing and learning smooth nonlinear manifolds is presented and applied to several lip reading tasks. Given a set of points drawn from a smooth manifold in an abstract feature space, the technique is capable of determining the structure of the surface and of finding the closest manifold point to a given query point.

We use this technique to learn the "space of lips" in a visual speech recognition task. The learned manifold is used for tracking and extracting the lips, for interpolating between frames in an image sequence and for providing features for recognition.

We describe a system based on Hidden Markov Models and this learned lip manifold that significantly improves the performance of acoustic speech recognizers in degraded environments. We also present preliminary results on a purely visual lip reader.

2/25

Problem

1. Boundary Tracking

2. Image Interpolation

3. Lip Features

4. Speech Recognition [HMM]

[Figure: visual frames V1 to V7 shown alongside acoustic features A1 to A7]

3/25

Full Recognition System

[Figure: system block diagram. Labels: "Visual Speech" ("Contour-Surfing", "Eigenlips", Lip-Interpolation); Acoustic Speech (RASTA-PLP; Full Sentence, Car-Noise, Cross-Talk); MLP/HMM Hybrid Speech-Recognition System]

4/25


Lip Tracking using Active Contour Models

"Snake" Tracking

Snake Model = "Controlled Continuity Spline"

5/25


Space of Contours as Constrained Manifold

N Dimensional Features are Points in N Dimensional Space.

N Dimensional Feature Space

K Dimensional Subspace

6/25

Abstract Manifold Learning Task

[Figure: Samples; Mixture of local adaptive Patches; Learned Representation]

7/25

Mixture of Patches Architecture

[Figure: Influence Function G_i and Linear Patch P_i]

$$P(x) = \frac{\sum_i G_i(x) \cdot P_i(x)}{\sum_i G_i(x)}$$
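The blending rule on this slide, P(x) = sum_i G_i(x) P_i(x) / sum_i G_i(x), can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the isotropic Gaussian form of G_i, the `sigma` parameter, and all function names are assumptions, and each linear patch is taken to be an orthogonal projection onto an affine subspace.

```python
import numpy as np

def patch_projection(x, center, basis):
    """Linear patch P_i: project x onto the affine subspace through `center`
    spanned by the orthonormal rows of `basis`."""
    return center + (x - center) @ basis.T @ basis

def influence(x, center, sigma):
    """Influence function G_i(x); an isotropic Gaussian is assumed here."""
    return np.exp(-np.sum((x - center) ** 2) / (2.0 * sigma ** 2))

def blended_projection(x, centers, bases, sigma=1.0):
    """P(x) = sum_i G_i(x) P_i(x) / sum_i G_i(x)."""
    weights = np.array([influence(x, c, sigma) for c in centers])
    projections = np.array(
        [patch_projection(x, c, b) for c, b in zip(centers, bases)]
    )
    return weights @ projections / weights.sum()

# Two hypothetical 1-D patches in the plane, both lying on the line y = 0.
centers = [np.array([-1.0, 0.0]), np.array([1.0, 0.0])]
bases = [np.array([[1.0, 0.0]]), np.array([[1.0, 0.0]])]
query = np.array([0.3, 0.8])
result = blended_projection(query, centers, bases)
```

Because both toy patches lie on the same line, every patch projects the query to the same point and the blend returns that point regardless of the weights. With differently oriented patches the weights decide how much each local projection contributes.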

8/25


Manifold Learning

• Initial Estimate

- Cluster Feature Space (K-Means)

- Local PCA

• Minimize MSE between Training Set and Projection

- EM Algorithm

- Gradient Descent

• First Best Model Merging
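The initial estimate (K-Means clustering followed by local PCA in each cluster) can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names, the seed, the iteration count, and the synthetic circle data are all hypothetical, and the later refinement steps (EM, gradient descent, model merging) are not shown.

```python
import numpy as np

def assign(points, centers):
    """Index of the nearest center for every point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, iterations=25, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iterations):
        labels = assign(points, centers)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers, assign(points, centers)

def local_pca(points, labels, dim):
    """One (center, basis) patch per non-empty cluster."""
    patches = {}
    for j in np.unique(labels):
        members = points[labels == j]
        center = members.mean(axis=0)
        # Rows of vt are principal directions, strongest first.
        _, _, vt = np.linalg.svd(members - center, full_matrices=False)
        patches[int(j)] = (center, vt[:dim])
    return patches

def projection_mse(points, labels, patches):
    """Mean squared distance between each point and its patch projection."""
    total = 0.0
    for j, (center, basis) in patches.items():
        offsets = points[labels == j] - center
        residual = offsets - offsets @ basis.T @ basis
        total += np.sum(residual ** 2)
    return total / len(points)

# Synthetic 1-D manifold: a unit circle in the plane.
angles = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
circle = np.column_stack([np.cos(angles), np.sin(angles)])
centers, labels = kmeans(circle, k=8)
patches = local_pca(circle, labels, dim=1)
mse = projection_mse(circle, labels, patches)
```

The value `mse` is the quantity the second bullet asks to minimize (MSE between training set and projection), evaluated here for the initial estimate only.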

9/25


Synthetic Examples

Learned surface

Training Samples

10/25

Comparison to Nearest Neighbor

[Figure: a query point with its nearest point and the surface prior; plot of MSE, vertical axis 0.0 to 0.3]

11/25

Manifold Dimension?

[Figure: Eigen Value (0 to 180) versus Samples per PCA (0 to 200), with curves for the 1st, 2nd, and 3rd Eigenvalue]
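One way to read this plot is as a local dimension test: eigenvalues along the manifold grow quickly as more neighbors enter the local PCA, while eigenvalues in the remaining directions stay small and grow only through curvature. The sketch below illustrates that idea on synthetic data; the paraboloid surface, the neighbor counts, and the function name are assumptions, not the data behind the slide.

```python
import numpy as np

def local_eigenvalues(points, query, n_neighbors):
    """Eigenvalues (largest first) of the covariance of the n nearest samples."""
    dists = np.linalg.norm(points - query, axis=1)
    nearest = points[np.argsort(dists)[:n_neighbors]]
    cov = np.cov(nearest, rowvar=False)
    return np.sort(np.linalg.eigvalsh(cov))[::-1]

# Hypothetical curved 2-D surface embedded in 3-D.
rng = np.random.default_rng(0)
xy = rng.uniform(-1.0, 1.0, size=(2000, 2))
z = 0.5 * (xy[:, 0] ** 2 + xy[:, 1] ** 2)
surface = np.column_stack([xy, z])

query = np.array([0.0, 0.0, 0.0])
spectrum = {n: local_eigenvalues(surface, query, n) for n in (20, 50, 100, 200)}

for n, values in spectrum.items():
    print(n, values)
```

In this toy case two eigenvalues dominate at every neighborhood size, which suggests a 2-D manifold; the third eigenvalue rises with the sample count because larger neighborhoods see more curvature.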

12/25

Manifold-based Snake Tracking

[Figure: Directed Graylevel Gradients along Snake; Learned Manifold in Lip-Space]

Manifold Snake Energy = Image Term + Learned Manifold Term

Use Manifold-Snakes to estimate position, scale, and rotation for gray level matrix extraction.

13/25


Graylevel Manifolds

[Figure: Linear Dimension Reduction (Linear Lip-Space) => Nonlinear Dimension Reduction (Real Lip-Space)]

14/25


Linear vs. Nonlinear Interpolation

[Figure: interpolation paths plotted over Graylevel Dimensions]

- Learn nonlinear subspace of "legal" images

- Constrain interpolated images to subspace

(16x16 pixel = 256 dim. space)

15/25

Nonlinear Interpolation Technique

"Manifold-Snake"

$$E = \sum_i \left\| V_{i-1} - 2 V_i + V_{i+1} \right\|^2 + \mathrm{distance}(V_i)$$

- Begin with sequence of linear interpolated points
- Iteratively move points toward manifold
- Iterations are equal to gradient descent steps using the Manifold-Snake Energy

[Figure: interpolated points after 0, 1, 2, 5, and 10 iterations]
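The three steps above can be sketched as plain gradient descent on a toy problem. Everything specific here is an assumption made for illustration: the unit circle stands in for the learned manifold (so the distance to the manifold is known in closed form), the distance term is squared and weighted by a hypothetical `weight`, the endpoints are held fixed, and the step size is chosen by hand.

```python
import numpy as np

def bend(path):
    """Second differences V[i-1] - 2 V[i] + V[i+1] for the interior points."""
    return path[:-2] - 2.0 * path[1:-1] + path[2:]

def energy(path, weight=1.0):
    # Smoothness term plus squared distance to the unit circle (assumed form).
    dist = np.linalg.norm(path, axis=1) - 1.0
    return np.sum(bend(path) ** 2) + weight * np.sum(dist ** 2)

def gradient(path, weight=1.0):
    grad = np.zeros_like(path)
    b = bend(path)
    grad[:-2] += 2.0 * b    # contribution to V[i-1]
    grad[1:-1] -= 4.0 * b   # contribution to V[i]
    grad[2:] += 2.0 * b     # contribution to V[i+1]
    norms = np.linalg.norm(path, axis=1, keepdims=True)  # assumed non-zero
    grad += 2.0 * weight * (norms - 1.0) * path / norms
    grad[0] = 0.0           # endpoints stay fixed
    grad[-1] = 0.0
    return grad

def interpolate(start, end, n_points=9, steps=500, step_size=0.02, weight=1.0):
    t = np.linspace(0.0, 1.0, n_points)[:, None]
    path = (1.0 - t) * start + t * end   # linear interpolation
    initial = path.copy()
    for _ in range(steps):
        path = path - step_size * gradient(path, weight)
    return initial, path

initial, final = interpolate(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
dev_initial = np.abs(np.linalg.norm(initial, axis=1) - 1.0).max()
dev_final = np.abs(np.linalg.norm(final, axis=1) - 1.0).max()
```

The linear path cuts straight across the circle, so its midpoint is far from the manifold; the descent bends the path toward the arc while the second-difference term keeps it smooth.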

16/25


[Figure panels: Natural Lip Images; Linear Interpolation; Nonlinear Interpolation]

24x24 Images projected in a 32-D linear space and 8-D nonlinear manifold
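The linear half of this comparison, projecting 24x24 images into a 32-D linear space, can be sketched with an SVD-based PCA. This is only an illustration: the random arrays stand in for lip images, and the function names are hypothetical. The 8-D nonlinear manifold projection is not shown.

```python
import numpy as np

def fit_linear_subspace(images, dim):
    """Mean image and the top `dim` principal directions of the flattened images."""
    flat = images.reshape(len(images), -1)
    mean = flat.mean(axis=0)
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:dim]

def encode(images, mean, basis):
    """Coordinates of each image in the linear subspace."""
    return (images.reshape(len(images), -1) - mean) @ basis.T

def decode(codes, mean, basis, shape):
    """Map subspace coordinates back to image space."""
    return (codes @ basis + mean).reshape((-1,) + shape)

# Random stand-in data: 100 images of 24x24 = 576 graylevel dimensions.
rng = np.random.default_rng(0)
images = rng.random((100, 24, 24))

mean, basis = fit_linear_subspace(images, dim=32)
codes = encode(images, mean, basis)
reconstruction = decode(codes, mean, basis, (24, 24))
```

Interpolating linearly between two rows of `codes` and decoding gives the linear interpolation baseline shown in the middle panel.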

18/25


Phone Probability Estimation -2-

P(phone | acoustics, visual-input)

Multi-Layer Perceptron

- 63 Softmax Prob. Outputs
- 256-512 Hidden Units
- 19 frames
- 9 features (1 Energy, 8 RASTA-PLP)
- 10 Eigen-Features (+ 10 Delta-Eigen-Features)
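The layer sizes on this slide can be turned into a small forward-pass sketch. Several details are assumptions that the slide does not state: that the 19-frame window covers all 29 per-frame features (giving 551 inputs), that the hidden units are sigmoids, and that 256 hidden units are used. The weights are random placeholders, not a trained model.

```python
import numpy as np

N_FRAMES = 19        # context window from the slide
N_ACOUSTIC = 9       # 1 energy + 8 RASTA-PLP
N_VISUAL = 10 + 10   # eigen-features plus delta eigen-features
N_HIDDEN = 256       # the slide gives a range of 256-512
N_PHONES = 63

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def phone_posteriors(window, w1, b1, w2, b2):
    """Estimate P(phone | acoustics, visual-input) for one input window."""
    hidden = 1.0 / (1.0 + np.exp(-(w1 @ window + b1)))  # sigmoid (assumed)
    return softmax(w2 @ hidden + b2)

rng = np.random.default_rng(0)
n_inputs = N_FRAMES * (N_ACOUSTIC + N_VISUAL)
w1 = rng.normal(scale=0.01, size=(N_HIDDEN, n_inputs))
b1 = np.zeros(N_HIDDEN)
w2 = rng.normal(scale=0.01, size=(N_PHONES, N_HIDDEN))
b2 = np.zeros(N_PHONES)

window = rng.normal(size=n_inputs)  # placeholder feature window
posteriors = phone_posteriors(window, w1, b1, w2, b2)
```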

20/25

Hidden Markov Models

Viterbi Approximation

[Figure: Viterbi trellis for word model M_i over time]

M_i (Word-Model)

Likelihood: $P(A, V \mid M_i)$

$$P(A, V \mid \mathrm{phone}) \propto \frac{P(\mathrm{phone} \mid A, V)}{P(\mathrm{phone})}$$
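The relation P(A, V | phone) ∝ P(phone | A, V) / P(phone) lets network posteriors serve as HMM emission scores, and the Viterbi approximation then scores a word model by its single best state path. The sketch below shows both steps on a toy example; the two-state left-to-right model, the posterior values, and the helper names are hypothetical.

```python
import math

LOG_ZERO = -1e9  # stand-in for log(0)

def safe_log(p):
    return math.log(p) if p > 0 else LOG_ZERO

def scaled_log_likelihoods(posteriors, priors):
    """log P(phone | A, V) - log P(phone) for every frame and phone."""
    return [[safe_log(p) - safe_log(q) for p, q in zip(frame, priors)]
            for frame in posteriors]

def viterbi(log_obs, log_trans, log_init):
    """Best state path and its log score."""
    n_states = len(log_init)
    score = [log_init[s] + log_obs[0][s] for s in range(n_states)]
    back = []
    for frame in log_obs[1:]:
        prev, score, pointers = score, [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s] + frame[s])
        back.append(pointers)
    state = max(range(n_states), key=lambda s: score[s])
    best_score = score[state]
    path = [state]
    for pointers in reversed(back):  # trace the best path backwards
        state = pointers[state]
        path.append(state)
    return path[::-1], best_score

# Hypothetical two-phone word model with left-to-right transitions.
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
priors = [0.5, 0.5]
log_trans = [[safe_log(0.5), safe_log(0.5)], [safe_log(0.0), safe_log(1.0)]]
log_init = [safe_log(1.0), safe_log(0.0)]

log_obs = scaled_log_likelihoods(posteriors, priors)
path, best_score = viterbi(log_obs, log_trans, log_init)
```

Here the best path stays in the first phone for two frames and then moves to the second. In a recognizer, the score of the best path through each word model M_i would be compared across models.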

21/25

Recognition Results

Task                  Acoustic   Eigenlips   Delta-Lips   Rel. Err. Red.
clean                 11.0 %     10.1 %      11.3 %       -
20 dB SNR             33.5 %     28.9 %      26.0 %       22.4 %
10 dB SNR             56.1 %     51.7 %      48.0 %       14.4 %
15 dB SNR crosstalk   67.3 %     51.7 %      46.0 %       31.6 %

Table 1: Spelling Task (6 Speakers), Word Error (wrong + insertion + deletion)
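The Rel. Err. Red. column is consistent with the relative reduction in word error of the Delta-Lips system over the Acoustic baseline, (acoustic - delta_lips) / acoustic. A short check of that reading, using only the numbers in Table 1 (the variable and function names are illustrative):

```python
# Word error rates in percent, copied from Table 1.
acoustic = {"20 dB SNR": 33.5, "10 dB SNR": 56.1, "15 dB SNR crosstalk": 67.3}
delta_lips = {"20 dB SNR": 26.0, "10 dB SNR": 48.0, "15 dB SNR crosstalk": 46.0}

def relative_error_reduction(baseline, improved):
    """Percentage of the baseline error removed by the improved system."""
    return 100.0 * (baseline - improved) / baseline

reductions = {
    task: round(relative_error_reduction(acoustic[task], delta_lips[task]), 1)
    for task in acoustic
}
print(reductions)
```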

Task         Pure Lipreading Error
Bar-Tender   4.5 %

Table 2: Pure Lipreading (1 Speaker, "Bar-Task": 4 Cocktail-Names)

22/25


Related Work

Related Linear Representations:

Kirby et al., Pentland et al.: Linear Subspaces
Simard: Tangent Distance

Related Nonlinear Representation:

Jacobs, Jordan, Nowlan, Hinton: Mixture of Experts
Kambhatla, Leen: Local PCA

Related Interpolation Techniques:

Poggio et al.: RBFs for Image Interpolation

23/25


Computer Lipreading History

- U.S. Patent 3192321 (1965) E. Nassimbene (IBM)

- E. Petajan (1984), Dissertation

- B. Yuhas, M. Goldstein, T. Sejnowski (1989)

- K. Mase, A. Pentland (1991)

- O. Garcia, A. Goldschen, E. Petajan (1992)

- D. Stork, G. Wolff, K. V. Prasad, M. Hennecke (1992)

- P.L. Silsbee (1993), Dissertation

- A.J. Goldschen (1993), Dissertation

- J. Movellan (1994)

24/25


Summary

Lipreading:

- Improves Speech Recognition Performance significantly

- Full System Solution

Manifold Learning:

- New fundamental technique for learning in geometric domains

- Applicable to a variety of different tasks

25/25