60
Reified Context Models Jacob Steinhardt Percy Liang Stanford University {jsteinhardt,pliang}@cs.stanford.edu July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11

Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Reified Context Models

Jacob Steinhardt Percy Liang

Stanford University

{jsteinhardt,pliang}@cs.stanford.edu

July 8, 2015

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11

Page 2: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Structured Prediction Task

input x :

output y : v o l c a n i c

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 2 / 11

Page 3: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Contexts Are Key

v o l c a

v *o **l ***cDP:

v o l c a

v vo vol volcbeam search:

Key idea: contexts!

*odef=

aoboco...

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 3 / 11

Page 4: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Contexts Are Key

v o l c a

v *o **l ***cDP:

v o l c a

v vo vol volcbeam search:

Key idea: contexts!

*odef=

aoboco...

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 3 / 11

Page 5: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Contexts Are Key

v o l c a

v *o **l ***cDP:

v o l c a

v vo vol volcbeam search:

Key idea: contexts!

*odef=

aoboco...

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 3 / 11

Page 6: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Contexts Are Key

v o l c a

v *o **l ***cDP:

v o l c a

v vo vol volcbeam search:

Key idea: contexts!

*odef=

aoboco...

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 3 / 11

Page 7: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Desiderata

r *o **l ***cv *a **i ***r

coverage (short contexts)better uncertainty estimates (precision)stabler partially supervised learning updates

r ro rol rolcv ra ral ralc

expressivity (long contexts)capture complex dependencies

r ro rol *olc

← best of both worldsv ra ral ***cy *o *ol ***r

* ** *** ****

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 4 / 11

Page 8: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Desiderata

r *o **l ***cv *a **i ***r

coverage (short contexts)better uncertainty estimates (precision)stabler partially supervised learning updates

r ro rol rolcv ra ral ralc

expressivity (long contexts)capture complex dependencies

r ro rol *olc

← best of both worldsv ra ral ***cy *o *ol ***r

* ** *** ****

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 4 / 11

Page 9: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Desiderata

r *o **l ***cv *a **i ***r

coverage (short contexts)better uncertainty estimates (precision)stabler partially supervised learning updates

r ro rol rolcv ra ral ralc

expressivity (long contexts)capture complex dependencies

r ro rol *olc

← best of both worldsv ra ral ***cy *o *ol ***r

* ** *** ****

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 4 / 11

Page 10: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Desiderata

r *o **l ***cv *a **i ***r

coverage (short contexts)better uncertainty estimates (precision)stabler partially supervised learning updates

r ro rol rolcv ra ral ralc

expressivity (long contexts)capture complex dependencies

r ro rol *olc

← best of both worldsv ra ral ***cy *o *ol ***r

* ** *** ****

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 4 / 11

Page 11: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Desiderata

r *o **l ***cv *a **i ***r

coverage (short contexts)better uncertainty estimates (precision)stabler partially supervised learning updates

r ro rol rolcv ra ral ralc

expressivity (long contexts)capture complex dependencies

r ro rol *olc

← best of both worldsv ra ral ***cy *o *ol ***r

* ** *** ****

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 4 / 11

Page 12: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Desiderata

r *o **l ***cv *a **i ***r

coverage (short contexts)better uncertainty estimates (precision)stabler partially supervised learning updates

r ro rol rolcv ra ral ralc

expressivity (long contexts)capture complex dependencies

r ro rol *olc

← best of both worldsv ra ral ***cy *o *ol ***r

* ** *** ****

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 4 / 11

Page 13: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Desiderata

r *o **l ***cv *a **i ***r

coverage (short contexts)better uncertainty estimates (precision)stabler partially supervised learning updates

r ro rol rolcv ra ral ralc

expressivity (long contexts)capture complex dependencies

r ro rol *olc

← best of both worldsv ra ral ***cy *o *ol ***r

* ** *** ****

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 4 / 11

Page 14: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Reifying Contexts

input x :

output y : v o l c a n i ccontext c: v *o *ol *olc · · · · · ·

Challenge: how to trade off contexts of different lengths?=⇒ Reify contexts as part of model!

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 5 / 11

Page 15: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Reifying Contexts

input x :

output y : v o l c a n i c

context c: v *o *ol *olc · · · · · ·

Challenge: how to trade off contexts of different lengths?=⇒ Reify contexts as part of model!

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 5 / 11

Page 16: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Reifying Contexts

input x :

output y : v o l c a n i ccontext c: v *o *ol *olc · · · · · ·

Challenge: how to trade off contexts of different lengths?=⇒ Reify contexts as part of model!

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 5 / 11

Page 17: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Reifying Contexts

input x :

output y : v o l c a n i ccontext c: v *o *ol *olc · · · · · ·

r ro rol *olcv ra ral ***cy *o *ol ***r

* ** *** ****

Challenge: how to trade off contexts of different lengths?=⇒ Reify contexts as part of model!

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 5 / 11

Page 18: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Reifying Contexts

input x :

output y : v o l c a n i ccontext c: v *o *ol *olc · · · · · ·

r ro rol *olc

←“context sets”v ra ral ***cy *o *ol ***r

* ** *** ****C1 C2 C3 C4

Challenge: how to trade off contexts of different lengths?=⇒ Reify contexts as part of model!

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 5 / 11

Page 19: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Reifying Contexts

input x :

output y : v o l c a n i ccontext c: v *o *ol *olc · · · · · ·

r ro rol *olc

←“context sets”v ra ral ***cy *o *ol ***r

* ** *** ****C1 C2 C3 C4

Challenge: how to trade off contexts of different lengths?

=⇒ Reify contexts as part of model!

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 5 / 11

Page 20: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Reifying Contexts

input x :

output y : v o l c a n i ccontext c: v *o *ol *olc · · · · · ·

r ro rol *olc

←“context sets”v ra ral ***cy *o *ol ***r

* ** *** ****C1 C2 C3 C4

Challenge: how to trade off contexts of different lengths?=⇒ Reify contexts as part of model!

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 5 / 11

Page 21: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Reified Context Models

Given:

context sets C1, . . . ,CL

features φi(ci−1,yi)

Define the model

pθ (y1:L,c1:L−1) ∝ exp

(L

∑i=1

θ>

φi(ci−1,yi)

)· κ(y ,c)︸ ︷︷ ︸

consistency

Graphical model structure:

Y1 Y2 Y3 Y4 Y5

C1 C2 C3 C4

κ κ κ κ κφ1 φ2 φ3 φ4 φ5inference via

forward-backward!

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 6 / 11

Page 22: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Reified Context Models

Given:

context sets C1, . . . ,CL

features φi(ci−1,yi)

Define the model

pθ (y1:L,c1:L−1) ∝ exp

(L

∑i=1

θ>

φi(ci−1,yi)

)· κ(y ,c)︸ ︷︷ ︸

consistency

Graphical model structure:

Y1 Y2 Y3 Y4 Y5

C1 C2 C3 C4

κ κ κ κ κφ1 φ2 φ3 φ4 φ5inference via

forward-backward!

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 6 / 11

Page 23: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Reified Context Models

Given:

context sets C1, . . . ,CL

features φi(ci−1,yi)

Define the model

pθ (y1:L,c1:L−1) ∝ exp

(L

∑i=1

θ>

φi(ci−1,yi)

)· κ(y ,c)︸ ︷︷ ︸

consistency

Graphical model structure:

Y1 Y2 Y3 Y4 Y5

C1 C2 C3 C4

κ κ κ κ κφ1 φ2 φ3 φ4 φ5inference via

forward-backward!

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 6 / 11

Page 24: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Reified Context Models

Given:

context sets C1, . . . ,CL

features φi(ci−1,yi)

Define the model

pθ (y1:L,c1:L−1) ∝ exp

(L

∑i=1

θ>

φi(ci−1,yi)

)· κ(y ,c)︸ ︷︷ ︸

consistency

Graphical model structure:

Y1 Y2 Y3 Y4 Y5

C1 C2 C3 C4

κ κ κ κ κ

φ1 φ2 φ3 φ4 φ5inference via

forward-backward!

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 6 / 11

Page 25: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Reified Context Models

Given:

context sets C1, . . . ,CL

features φi(ci−1,yi)

Define the model

pθ (y1:L,c1:L−1) ∝ exp

(L

∑i=1

θ>

φi(ci−1,yi)

)· κ(y ,c)︸ ︷︷ ︸

consistency

Graphical model structure:

Y1 Y2 Y3 Y4 Y5

C1 C2 C3 C4

κ κ κ κ κ

φ1 φ2 φ3 φ4 φ5

inference viaforward-backward!

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 6 / 11

Page 26: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Reified Context Models

Given:

context sets C1, . . . ,CL

features φi(ci−1,yi)

Define the model

pθ (y1:L,c1:L−1) ∝ exp

(L

∑i=1

θ>

φi(ci−1,yi)

)· κ(y ,c)︸ ︷︷ ︸

consistency

Graphical model structure:

Y1 Y2 Y3 Y4 Y5

C1 C2 C3 C4

κ κ κ κ κ

φ1 φ2 φ3 φ4 φ5inference via

forward-backward!

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 6 / 11

Page 27: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Adaptive Context Selection

Select context sets Ci during forward pass of inference

Greedily select contexts with largest mass

abcde...

c

e

ce?

C1

cacb...eaeb...?a...

ca

?a

ca?a??

C2

etc.

Biases towards short contexts unless there is high confidence.

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 7 / 11

Page 28: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Adaptive Context Selection

Select context sets Ci during forward pass of inference

Greedily select contexts with largest mass

abcde...

c

e

ce?

C1

cacb...eaeb...?a...

ca

?a

ca?a??

C2

etc.

Biases towards short contexts unless there is high confidence.

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 7 / 11

Page 29: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Adaptive Context Selection

Select context sets Ci during forward pass of inference

Greedily select contexts with largest mass

abcde...

c

e

ce?

C1

cacb...eaeb...?a...

ca

?a

ca?a??

C2

etc.

Biases towards short contexts unless there is high confidence.

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 7 / 11

Page 30: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Adaptive Context Selection

Select context sets Ci during forward pass of inference

Greedily select contexts with largest mass

abcde...

c

e

ce?

C1

cacb...eaeb...?a...

ca

?a

ca?a??

C2

etc.

Biases towards short contexts unless there is high confidence.

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 7 / 11

Page 31: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Adaptive Context Selection

Select context sets Ci during forward pass of inference

Greedily select contexts with largest mass

abcde...

c

e

ce?

C1

cacb...eaeb...?a...

ca

?a

ca?a??

C2

etc.

Biases towards short contexts unless there is high confidence.

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 7 / 11

Page 32: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Adaptive Context Selection

Select context sets Ci during forward pass of inference

Greedily select contexts with largest mass

abcde...

c

e

ce?

C1

cacb...eaeb...?a...

ca

?a

ca?a??

C2

etc.

Biases towards short contexts unless there is high confidence.

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 7 / 11

Page 33: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Adaptive Context Selection

Select context sets Ci during forward pass of inference

Greedily select contexts with largest mass

abcde...

c

e

ce?

C1

cacb...eaeb...?a...

ca

?a

ca?a??

C2

etc.

Biases towards short contexts unless there is high confidence.

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 7 / 11

Page 34: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Adaptive Context Selection

Select context sets Ci during forward pass of inference

Greedily select contexts with largest mass

abcde...

c

e

ce?

C1

cacb...eaeb...?a...

ca

?a

ca?a??

C2

etc.

Biases towards short contexts unless there is high confidence.

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 7 / 11

Page 35: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Adaptive Context Selection

Select context sets Ci during forward pass of inference

Greedily select contexts with largest mass

abcde...

c

e

ce?

C1

cacb...eaeb...?a...

ca

?a

ca?a??

C2

etc.

Biases towards short contexts unless there is high confidence.

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 7 / 11

Page 36: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Adaptive Context Selection

Select context sets Ci during forward pass of inference

Greedily select contexts with largest mass

abcde...

c

e

ce?

C1

cacb...eaeb...?a...

ca

?a

ca?a??

C2

etc.

Biases towards short contexts unless there is high confidence.

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 7 / 11

Page 37: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Precision

input x :

output y : v o l c a n i c

Model assigns probability to each prediction, so can predict on most confidentsubset.

Measure precision (# of correct words) vs. recall (# of words predicted).comparison: beam search

0.0 0.2 0.4 0.6 0.8 1.0recall

0.86

0.88

0.90

0.92

0.94

0.96

0.98

1.00

prec

isio

n

Word Recognition

Beam searchRCM

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 8 / 11

Page 38: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Precision

input x :

output y : v o l c a n i c

Model assigns probability to each prediction, so can predict on most confidentsubset.

Measure precision (# of correct words) vs. recall (# of words predicted).comparison: beam search

0.0 0.2 0.4 0.6 0.8 1.0recall

0.86

0.88

0.90

0.92

0.94

0.96

0.98

1.00

prec

isio

n

Word Recognition

Beam searchRCM

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 8 / 11

Page 39: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Precision

input x :

output y : v o l c a n i c

Model assigns probability to each prediction, so can predict on most confidentsubset.

Measure precision (# of correct words) vs. recall (# of words predicted).

comparison: beam search

0.0 0.2 0.4 0.6 0.8 1.0recall

0.86

0.88

0.90

0.92

0.94

0.96

0.98

1.00

prec

isio

n

Word Recognition

Beam searchRCM

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 8 / 11

Page 40: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Precision

input x :

output y : v o l c a n i c

Model assigns probability to each prediction, so can predict on most confidentsubset.

Measure precision (# of correct words) vs. recall (# of words predicted).comparison: beam search

0.0 0.2 0.4 0.6 0.8 1.0recall

0.86

0.88

0.90

0.92

0.94

0.96

0.98

1.00

prec

isio

n

Word Recognition

Beam searchRCM

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 8 / 11

Page 41: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Precision

Measure precision (# of correct words) vs. recall (# of words predicted).

0.0 0.2 0.4 0.6 0.8 1.0recall

0.86

0.88

0.90

0.92

0.94

0.96

0.98

1.00pr

ecis

ion

Word Recognition

Beam searchRCM

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 8 / 11

Page 42: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Partially Supervised Learning

Decipherment task:

cipher am 7→ 5, I 7→ 13, what 7→ 54, . . .

latent z I am what I amoutput y 13 5 54 13 5

Goal: determine cipher

Fit 2nd-order HMM with EM, using RCMs for approximate E-step.use learned emissions to determine cipher.again compare to beam search (Nuhn et al., 2013)

Fraction of correctly mapped words:

0 5 10 15 20training passes

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

map

ping

acc

urac

y

DeciphermentRCMbeam

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 9 / 11

Page 43: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Partially Supervised Learning

Decipherment task:

cipher am 7→ 5, I 7→ 13, what 7→ 54, . . .latent z I am what I am

output y 13 5 54 13 5

Goal: determine cipher

Fit 2nd-order HMM with EM, using RCMs for approximate E-step.use learned emissions to determine cipher.again compare to beam search (Nuhn et al., 2013)

Fraction of correctly mapped words:

0 5 10 15 20training passes

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

map

ping

acc

urac

y

DeciphermentRCMbeam

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 9 / 11

Page 44: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Partially Supervised Learning

Decipherment task:

cipher am 7→ 5, I 7→ 13, what 7→ 54, . . .latent z I am what I amoutput y 13 5 54 13 5

Goal: determine cipher

Fit 2nd-order HMM with EM, using RCMs for approximate E-step.use learned emissions to determine cipher.again compare to beam search (Nuhn et al., 2013)

Fraction of correctly mapped words:

0 5 10 15 20training passes

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

map

ping

acc

urac

y

DeciphermentRCMbeam

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 9 / 11

Page 45: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Partially Supervised Learning

Decipherment task:

cipher am 7→ 5, I 7→ 13, what 7→ 54, . . .latent z I am what I amoutput y 13 5 54 13 5

Goal: determine cipher

Fit 2nd-order HMM with EM, using RCMs for approximate E-step.use learned emissions to determine cipher.again compare to beam search (Nuhn et al., 2013)

Fraction of correctly mapped words:

0 5 10 15 20training passes

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

map

ping

acc

urac

y

DeciphermentRCMbeam

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 9 / 11

Page 46: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Partially Supervised Learning

Decipherment task:

cipher am 7→ 5, I 7→ 13, what 7→ 54, . . .latent z I am what I amoutput y 13 5 54 13 5

Goal: determine cipher

Fit 2nd-order HMM with EM, using RCMs for approximate E-step.

use learned emissions to determine cipher.again compare to beam search (Nuhn et al., 2013)

Fraction of correctly mapped words:

0 5 10 15 20training passes

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

map

ping

acc

urac

y

DeciphermentRCMbeam

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 9 / 11

Page 47: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Partially Supervised Learning

Decipherment task:

cipher am 7→ 5, I 7→ 13, what 7→ 54, . . .latent z I am what I amoutput y 13 5 54 13 5

Goal: determine cipher

Fit 2nd-order HMM with EM, using RCMs for approximate E-step.use learned emissions to determine cipher.

again compare to beam search (Nuhn et al., 2013)Fraction of correctly mapped words:

0 5 10 15 20training passes

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

map

ping

acc

urac

y

DeciphermentRCMbeam

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 9 / 11

Page 48: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Partially Supervised Learning

Decipherment task:

cipher am 7→ 5, I 7→ 13, what 7→ 54, . . .latent z I am what I amoutput y 13 5 54 13 5

Goal: determine cipher

Fit 2nd-order HMM with EM, using RCMs for approximate E-step.use learned emissions to determine cipher.again compare to beam search (Nuhn et al., 2013)

Fraction of correctly mapped words:

0 5 10 15 20training passes

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

map

ping

acc

urac

y

DeciphermentRCMbeam

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 9 / 11

Page 49: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Partially Supervised Learning

Fraction of correctly mapped words:

0 5 10 15 20training passes

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8m

appi

ng a

ccur

acy

DeciphermentRCMbeam

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 9 / 11

Page 50: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Contexts During Training

Context lengths increase smoothly during training:

0 5 10 15 20number of passes

1.5

2.0

2.5

3.0

3.5

4.0

4.5

aver

age

cont

ext l

engt

h

Decipherment

******↓

***ing↓

idding

Start of training: little information, short contexts.End of training: lots of information, long contexts.

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 10 / 11

Page 51: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Contexts During Training

Context lengths increase smoothly during training:

0 5 10 15 20number of passes

1.5

2.0

2.5

3.0

3.5

4.0

4.5

aver

age

cont

ext l

engt

hDecipherment

******↓

***ing↓

idding

Start of training: little information, short contexts.End of training: lots of information, long contexts.

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 10 / 11

Page 52: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Contexts During Training

Context lengths increase smoothly during training:

0 5 10 15 20number of passes

1.5

2.0

2.5

3.0

3.5

4.0

4.5

aver

age

cont

ext l

engt

hDecipherment

******↓

***ing↓

idding

Start of training: little information, short contexts.

End of training: lots of information, long contexts.

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 10 / 11

Page 53: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Contexts During Training

Context lengths increase smoothly during training:

0 5 10 15 20number of passes

1.5

2.0

2.5

3.0

3.5

4.0

4.5

aver

age

cont

ext l

engt

hDecipherment

******↓

***ing↓

idding

Start of training: little information, short contexts.End of training: lots of information, long contexts.

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 10 / 11

Page 54: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Discussion

RCMs provide both expressivity and coverage, which enable:

More accurate uncertainty estimates (precision)

Better partially supervised learning updates

Related work:

Coarse-to-fine inference (Petrov et al., 2006; Weiss et al., 2010)

Certificates of optimality (Sontag, 2010)

Tractable models (Poon & Domingos, 2011; Niepert & Domingos, 2014; Li& Zemel, 2014; S. & Liang, 2015)

Reproducible experiments on Codalab: codalab.org/worksheets

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 11 / 11

Page 55: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Discussion

RCMs provide both expressivity and coverage, which enable:

More accurate uncertainty estimates (precision)

Better partially supervised learning updates

Related work:

Coarse-to-fine inference (Petrov et al., 2006; Weiss et al., 2010)

Certificates of optimality (Sontag, 2010)

Tractable models (Poon & Domingos, 2011; Niepert & Domingos, 2014; Li& Zemel, 2014; S. & Liang, 2015)

Reproducible experiments on Codalab: codalab.org/worksheets

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 11 / 11

Page 56: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Discussion

RCMs provide both expressivity and coverage, which enable:

More accurate uncertainty estimates (precision)

Better partially supervised learning updates

Related work:

Coarse-to-fine inference (Petrov et al., 2006; Weiss et al., 2010)

Certificates of optimality (Sontag, 2010)

Tractable models (Poon & Domingos, 2011; Niepert & Domingos, 2014; Li& Zemel, 2014; S. & Liang, 2015)

Reproducible experiments on Codalab: codalab.org/worksheets

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 11 / 11

Page 57: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Discussion

RCMs provide both expressivity and coverage, which enable:

More accurate uncertainty estimates (precision)

Better partially supervised learning updates

Related work:

Coarse-to-fine inference (Petrov et al., 2006; Weiss et al., 2010)

Certificates of optimality (Sontag, 2010)

Tractable models (Poon & Domingos, 2011; Niepert & Domingos, 2014; Li& Zemel, 2014; S. & Liang, 2015)

Reproducible experiments on Codalab: codalab.org/worksheets

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 11 / 11

Page 58: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Discussion

RCMs provide both expressivity and coverage, which enable:

More accurate uncertainty estimates (precision)

Better partially supervised learning updates

Related work:

Coarse-to-fine inference (Petrov et al., 2006; Weiss et al., 2010)

Certificates of optimality (Sontag, 2010)

Tractable models (Poon & Domingos, 2011; Niepert & Domingos, 2014; Li& Zemel, 2014; S. & Liang, 2015)

Reproducible experiments on Codalab: codalab.org/worksheets

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 11 / 11

Page 59: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Discussion

RCMs provide both expressivity and coverage, which enable:

More accurate uncertainty estimates (precision)

Better partially supervised learning updates

Related work:

Coarse-to-fine inference (Petrov et al., 2006; Weiss et al., 2010)

Certificates of optimality (Sontag, 2010)

Tractable models (Poon & Domingos, 2011; Niepert & Domingos, 2014; Li& Zemel, 2014; S. & Liang, 2015)

Reproducible experiments on Codalab: codalab.org/worksheets

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 11 / 11

Page 60: Reified Context Modelsjsteinhardt/publications/rcm/...July 8, 2015 J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11 Structured Prediction Task input

Discussion

RCMs provide both expressivity and coverage, which enable:

More accurate uncertainty estimates (precision)

Better partially supervised learning updates

Related work:

Coarse-to-fine inference (Petrov et al., 2006; Weiss et al., 2010)

Certificates of optimality (Sontag, 2010)

Tractable models (Poon & Domingos, 2011; Niepert & Domingos, 2014; Li& Zemel, 2014; S. & Liang, 2015)

Reproducible experiments on Codalab: codalab.org/worksheets

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 11 / 11