Learning with lookahead: Can history-based models rival globally optimized models?

Yoshimasa Tsuruoka, Japan Advanced Institute of Science and Technology (JAIST)
Yusuke Miyao, National Institute of Informatics (NII)
Jun’ichi Kazama, National Institute of Information and Communications Technology (NICT)


History-based models

• Structured prediction problems in NLP
  – POS tagging, named entity recognition, parsing, …

• History-based models
  – Decompose the structured prediction problem into a series of classification problems

• Have been widely used in many NLP tasks
  – MEMMs (Ratnaparkhi, 1996; McCallum et al., 2000)
  – Transition-based parsers (Yamada & Matsumoto, 2003; Nivre et al., 2006)

• Becoming less popular


Part-of-speech (POS) tagging

• Perform multi-class classification at each word

• Features are defined on observations (i.e. words) and the POS tags on the left

[Figure: the sentence "I saw a dog with eyebrows", with the candidate tags N, V, D, P listed under each word]
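The left-to-right decomposition above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `greedy_tag`, `toy_score` and `TOY_LEXICON` are hypothetical names, and the hand-written scorer stands in for a trained multi-class classifier. Only the tag set N, V, D, P and the example sentence come from the slide.

```python
TAGS = ["N", "V", "D", "P"]  # tag set shown on the slide

def greedy_tag(words, score):
    """Tag left to right; each decision sees the words and the tags chosen so far."""
    tags = []
    for i in range(len(words)):
        # One multi-class classification per word, conditioned on the history.
        best = max(TAGS, key=lambda t: score(words, i, tags, t))
        tags.append(best)
    return tags

# Hypothetical toy scorer (an assumption): a small lexicon plus one
# history feature that rewards a noun directly after a determiner.
TOY_LEXICON = {"I": "N", "saw": "V", "a": "D", "dog": "N",
               "with": "P", "eyebrows": "N"}

def toy_score(words, i, history, tag):
    s = 1.0 if TOY_LEXICON.get(words[i]) == tag else 0.0
    if history and history[-1] == "D" and tag == "N":
        s += 0.5
    return s

sentence = "I saw a dog with eyebrows".split()
predicted = greedy_tag(sentence, toy_score)
print(predicted)
```

Because each decision is final, an early mistake cannot be revised later; this is the weakness that lookahead is meant to address.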


Dependency parsing

[Figure: animated shift-reduce parse of "I saw a dog with eyebrows". Each frame shows the chosen OPERATION (Shift, ReduceL or ReduceR) and the current STACK and QUEUE. Words remaining on the stack and queue across the frames: "I saw a dog with eyebrows", "saw a dog with eyebrows", "saw dog with eyebrows", "saw dog with", "saw dog".]
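A minimal sketch of such a transition system is given below. The slides name the operations but do not define them, so the semantics here are an assumption: Shift moves the next queue word onto the stack, ReduceL attaches the second-from-top stack word to the top word, and ReduceR attaches the top word to the second-from-top word. The names `apply_action`, `parse` and the action sequence are hypothetical.

```python
from collections import deque

def apply_action(action, stack, queue, heads):
    """Apply one operation in place (assumed semantics, see the lead-in)."""
    if action == "Shift":
        stack.append(queue.popleft())
    elif action == "ReduceL":
        right = stack.pop()
        left = stack.pop()
        heads[left] = right      # the right word becomes the head of the left word
        stack.append(right)
    elif action == "ReduceR":
        right = stack.pop()
        left = stack.pop()
        heads[right] = left      # the left word becomes the head of the right word
        stack.append(left)
    else:
        raise ValueError(f"unknown action: {action}")

def parse(words, actions):
    """Replay a sequence of operations; words are assumed unique in the sentence."""
    stack, queue, heads = [], deque(words), {}
    for action in actions:
        apply_action(action, stack, queue, heads)
    return stack, list(queue), heads

words = "I saw a dog with eyebrows".split()
actions = ["Shift", "Shift", "ReduceL",      # I <- saw
           "Shift", "Shift", "ReduceL",      # a <- dog
           "Shift", "Shift", "ReduceR",      # with -> eyebrows
           "ReduceR",                        # dog -> with
           "ReduceR"]                        # saw -> dog
stack, queue, heads = parse(words, actions)
print(stack, queue, heads)
```

In a history-based parser, a classifier would choose each operation from the current stack and queue instead of replaying a fixed sequence.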


Lookahead

• Playing chess: "If I move this pawn, then the knight will be captured by that bishop, but then I can …"


POS tagging with lookahead

• Consider all possible sequences of future tagging actions to a certain depth

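[Figure: "I saw a dog with eyebrows" with the first four words tagged N, V, D, N; the candidate tags (N, V, D, P) for the following words are expanded as lookahead branches]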
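The idea of enumerating future tagging actions can be sketched as below. This is an illustration under stated assumptions, not the authors' code: a tag sequence is scored by summing local scores, and `lookahead_tag`, `toy_score` and `LEXICON` are hypothetical. The toy weights are chosen so that the greedy tagger (depth 0) mis-tags "saw" while depth 1 recovers.

```python
from itertools import product

TAGS = ["N", "V", "D", "P"]

def lookahead_tag(words, score, depth):
    """Choose each tag by scoring every tag sequence for the next depth + 1 words."""
    tags = []
    for i in range(len(words)):
        horizon = min(depth + 1, len(words) - i)  # current action plus future ones
        best_seq, best_total = None, float("-inf")
        for seq in product(TAGS, repeat=horizon):
            history, total = list(tags), 0.0
            for offset, tag in enumerate(seq):
                total += score(words, i + offset, history, tag)
                history.append(tag)
            if total > best_total:
                best_seq, best_total = seq, total
        tags.append(best_seq[0])  # commit only the first action of the best sequence
    return tags

# Hypothetical toy weights: "saw" slightly prefers N in isolation, but a
# determiner directly after a verb earns a bonus.
LEXICON = {"I": {"N": 1.0}, "saw": {"N": 1.0, "V": 0.9},
           "a": {"D": 1.0}, "dog": {"N": 1.0}}

def toy_score(words, i, history, tag):
    s = LEXICON.get(words[i], {}).get(tag, 0.0)
    if history and history[-1] == "V" and tag == "D":
        s += 0.5
    return s

words = "I saw a dog".split()
greedy = lookahead_tag(words, toy_score, depth=0)
deeper = lookahead_tag(words, toy_score, depth=1)
print(greedy, deeper)
```

With depth 1 the tagger sees that tagging "saw" as V makes the following determiner more plausible, so the locally weaker choice wins.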


Dependency parsing

[Figure: lookahead in shift-reduce parsing. From the state "saw dog with eyebrows", the candidate operations (Shift, ReduceL, ReduceR) are each expanded into successor states such as "saw with eyebrows", and the operations available from those states are considered in turn.]


Choosing the best action by search

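[Figure: search tree. From the current state S, the actions a1, a2, …, am lead to the states S1, S2, …, Sm. Each of these is expanded down to the search depth, and the best states found below them are labelled S1*, S2*, S3*.]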
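The search in the figure can be sketched generically as follows. It is a minimal sketch under assumptions: each action is valued by the score of the best state reachable within the search depth, and the state interface (`actions`, `step`, `score`) together with the toy `SCORES` table are hypothetical.

```python
def search(state, depth, actions, step, score):
    """Score of the best state reachable from `state` after `depth` further
    actions (or fewer, if no action is available)."""
    candidates = actions(state)
    if depth == 0 or not candidates:
        return score(state)
    return max(search(step(state, a), depth - 1, actions, step, score)
               for a in candidates)

def choose_action(state, depth, actions, step, score):
    """Pick the action whose subtree contains the best-scoring state."""
    return max(actions(state),
               key=lambda a: search(step(state, a), depth, actions, step, score))

# Toy problem (an assumption): states are strings of moves, at most two long.
SCORES = {"L": 1.0, "R": 0.8, "LL": 1.1, "LR": 1.2, "RL": 0.5, "RR": 2.0}

def toy_actions(state):
    return ["L", "R"] if len(state) < 2 else []

def toy_step(state, action):
    return state + action

def toy_score(state):
    return SCORES.get(state, 0.0)

without_lookahead = choose_action("", 0, toy_actions, toy_step, toy_score)
with_lookahead = choose_action("", 1, toy_actions, toy_step, toy_score)
print(without_lookahead, with_lookahead)
```

At depth 0 the immediately better move "L" is chosen; at depth 1 the search finds that "R" leads to the highest-scoring state "RR".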


Search


Decoding cost

• Time complexity: O(nm^(D+1))
  – n: number of actions to complete the structure
  – m: average number of possible actions at each state
  – D: search depth

• Time complexity of k-th order CRFs: O(nm^(k+1))

• History-based models with k-depth lookahead are comparable to k-th order CRFs in terms of training/testing time
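As a rough worked example of the bound above, the snippet below counts the states scored at the bottom of an exhaustive lookahead tree. The function name and the numbers (six words, four tags) are illustrative assumptions, and constant factors are ignored.

```python
def lookahead_evaluations(n, m, depth):
    """States scored by exhaustive lookahead: at each of the n steps,
    m actions are each expanded `depth` further levels with branching m."""
    return n * m ** (depth + 1)

# Six words and four tags, as in the toy tagging example on the slides.
counts = {depth: lookahead_evaluations(6, 4, depth) for depth in range(3)}
print(counts)
```

Each extra level of lookahead multiplies the work by m, the same factor by which raising the order of a CRF by one increases its decoding cost.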


Perceptron learning with Lookahead

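[Figure: from the current state, the actions a1, a2, …, am lead to the states S1, S2, …, Sm, whose best lookahead descendants are S1*, S2*, …, Sm*. One action is marked as the correct action. The formulas on this slide are only partly recoverable from the extracted text.]

• Linear scoring model: states are scored with the weight vector w (the slide shows the score of S1*)

• Without lookahead: the update of w is computed from S1 and Sk

• With lookahead: the update of w is computed from S1* and Sk*

• Guaranteed to converge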
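One way to read this slide is as a standard perceptron update applied to the lookahead states. The sketch below assumes exactly that: when the predicted action is wrong, the features of the best state reachable through the correct action are added and those of the best state reachable through the predicted action are subtracted. The helper names, the dict-based weights and the toy feature function are hypothetical.

```python
def dot(w, feats):
    return sum(w.get(f, 0.0) * v for f, v in feats.items())

def best_state(state, depth, actions, step, features, w):
    """Highest-scoring state reachable after up to `depth` further actions."""
    candidates = actions(state)
    if depth == 0 or not candidates:
        return state
    leaves = [best_state(step(state, a), depth - 1, actions, step, features, w)
              for a in candidates]
    return max(leaves, key=lambda s: dot(w, features(s)))

def perceptron_step(state, gold, depth, actions, step, features, w):
    """One training step; returns True if the weights were changed."""
    best = {a: best_state(step(state, a), depth, actions, step, features, w)
            for a in actions(state)}
    predicted = max(best, key=lambda a: dot(w, features(best[a])))
    if predicted == gold:
        return False
    for f, v in features(best[gold]).items():       # promote the correct branch
        w[f] = w.get(f, 0.0) + v
    for f, v in features(best[predicted]).items():  # demote the predicted branch
        w[f] = w.get(f, 0.0) - v
    return True

# Toy problem (an assumption): states are move strings, one indicator feature each.
def toy_actions(state):
    return ["L", "R"] if len(state) < 2 else []

def toy_step(state, action):
    return state + action

def toy_features(state):
    return {"path=" + state: 1.0}

w = {}
first = perceptron_step("", "R", 1, toy_actions, toy_step, toy_features, w)
second = perceptron_step("", "R", 1, toy_actions, toy_step, toy_features, w)
print(first, second, w)
```

Setting `depth` to 0 reduces this to the update without lookahead, which compares the immediate successor states only.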


Experiments

• Sequence prediction tasks
  – POS tagging
  – Text chunking (a.k.a. shallow parsing)
  – Named entity recognition

• Syntactic parsing
  – Dependency parsing

• Compared to first-order CRFs in terms of speed and accuracy


POS tagging

• WSJ corpus

[Figure: bar chart of accuracy (x-axis from 96.9 to 97.3) for lookahead depth = 0, 1, 2 and the CRF; the bar values are not recoverable from the extracted text]


Training time

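• WSJ corpus

[Figure: bar chart of training time in seconds (logarithmic x-axis from 10 to 10000) for lookahead depth = 0, 1, 2 and the CRF; the bar values are not recoverable from the extracted text]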


POS tagging (+ tag trigram features)

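• WSJ corpus

[Figure: bar chart of accuracy (x-axis from 96.9 to 97.3) for lookahead depth = 0, 1, 2 and the CRF, with tag trigram features added; the bar values are not recoverable from the extracted text]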


Chunking (shallow parsing)

• CoNLL 2000 data set

[Figure: bar chart of F-score (x-axis from 93.35 to 93.85) for lookahead depth = 0, 1, 2 and the CRF; the bar values are not recoverable from the extracted text]


Named entity recognition

• BioNLP/NLPBA 2004 data set

[Figure: bar chart of F-score (x-axis from 69 to 72.5) for lookahead depth = 0, 1, 2, 3 and the CRF; the bar values are not recoverable from the extracted text]


Dependency parsing

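• WSJ corpus

[Figure: bar chart of F-score (x-axis from 88.5 to 91.5) for lookahead depth = 0, 1, 2, 3 and the structured perceptron ("Struc. Perc.") of (Zhang and Clark, 2008); the bar values are not recoverable from the extracted text]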


Related work

• MEMMs + Viterbi
  – Label bias problem (Lafferty et al., 2001)

• Learning as search optimization (LaSO) (Daumé III and Marcu, 2005)
  – No lookahead

• Structured perceptron with beam search (Zhang and Clark, 2008)


Conclusion

• Can history-based models rival globally optimized models?
  – Yes, they can be more accurate than CRFs

• Computational cost is comparable to that of CRFs


Future work

• Feature Engineering

• Flexible search extension/reduction

• Easy-first tagging/parsing
  – (Goldberg & Elhadad, 2010)

• Max-margin learning


THANK YOU