Punctuation: Making a Point in Unsupervised Dependency Parsing

Valentin I. Spitkovsky, with Daniel Jurafsky (Stanford University) and Hiyan Alshawi (Google Inc.)

CoNLL (2011-06-23)


Example: Raw Word Stream

ALTHOUGH IT PROBABLY HAS REDUCED THE LEVEL OF EXPENDITURES FOR SOME PURCHASERS UTILIZATION MANAGEMENT LIKE MOST OTHER COST CONTAINMENT STRATEGIES DOESN’T APPEAR TO HAVE ALTERED THE LONG-TERM RATE OF INCREASE IN HEALTH-CARE COSTS THE INSTITUTE OF MEDICINE AN AFFILIATE OF THE NATIONAL ACADEMY OF SCIENCES CONCLUDED AFTER A TWO-YEAR STUDY


Example: Unformatted Text

formatting (missing structural cues): e.g., punctuation and capitalization

raw word streams are often difficult even for humans: e.g., transcribed utterances (Kim and Woodland, 2002)


Example: Unlexicalized Tokens

IN PRP RB VBZ VBN DT NN IN NNS IN DT NNS NN NN IN RBS JJ NN NN NNS VBZ RB VB TO VB VBN DT JJ NN IN NN IN JJ NNS DT NNP IN NNP DT NN IN DT NNP NNP IN NNPS VBD IN DT JJ NN


Example: Formatted Text

[SBAR Although it probably has reduced the level of expenditures for some purchasers], [NP utilization management] — [PP like most other cost containment strategies] — [VP doesn’t appear to have altered the long-term rate of increase in health-care costs], [NP the Institute of Medicine], [NP an affiliate of the National Academy of Sciences], [VP concluded after a two-year study].
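The bracketed spans above coincide with the stretches of words between punctuation marks. A minimal sketch of how such inter-punctuation fragments could be extracted from a tokenized sentence follows; the `PUNCT` set and the function name `split_fragments` are illustrative assumptions, not taken from the talk, and the dashes are written as `--` tokens.

```python
# Hypothetical punctuation inventory; a real system would use the tagger's punctuation tags.
PUNCT = {",", ".", ";", ":", "--", "?", "!", "(", ")", "``", "''"}

def split_fragments(tokens):
    """Split a token list into maximal runs of non-punctuation tokens."""
    fragments, current = [], []
    for tok in tokens:
        if tok in PUNCT:
            if current:                 # close the fragment that just ended
                fragments.append(current)
                current = []
        else:
            current.append(tok)
    if current:                         # sentence did not end in punctuation
        fragments.append(current)
    return fragments

sentence = ("Although it probably has reduced the level of expenditures for some "
            "purchasers , utilization management -- like most other cost containment "
            "strategies -- doesn't appear to have altered the long-term rate of "
            "increase in health-care costs , the Institute of Medicine , an affiliate "
            "of the National Academy of Sciences , concluded after a two-year study .")

fragments = split_fragments(sentence.split())
for frag in fragments:
    print(" ".join(frag))
```

On this sentence the sketch yields seven fragments, one per bracketed span on the slide.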


Intuition: Strong Cues

punctuation is a strong structural cue: it demarcates separable fragments

we will make simplifying independence assumptions: (unreasonably) strong in training

less crude in inference: (reasonably) weak in final decoding


Intuition: Strong Assumption

strong constraint: (head ← head) in training

word head , head word word ,
head word word word word word word word .


Other countries , including West Germany ,

may have a hard time justifying continued membership .
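One way to read the strong constraint is that each fragment has exactly one word attached outside it (its head), and that this head attaches either to the root or to the head of another fragment. The sketch below checks a tree against that reading. The reading itself, the helper names, and the dependency tree for the example sentence are assumptions made for illustration; the arcs drawn on the slide are not recoverable from the text.

```python
PUNCT = {",", ".", ";", ":", "--"}

def fragment_ids(tokens):
    """Number the inter-punctuation fragments; punctuation tokens get None."""
    ids, frag = [], 0
    for tok in tokens:
        if tok in PUNCT:
            ids.append(None)
            frag += 1
        else:
            ids.append(frag)
    return ids

def fragment_exits(tokens, heads):
    """For each fragment, list the words whose parent is the root or lies outside it."""
    ids = fragment_ids(tokens)
    exits = {}
    for i, parent in enumerate(heads):
        if ids[i] is None:
            continue
        exits.setdefault(ids[i], [])
        if parent == -1 or ids[parent] != ids[i]:
            exits[ids[i]].append(i)
    return exits

def satisfies_strong(tokens, heads):
    """True if every fragment has one head and heads attach only to heads (or the root)."""
    exits = fragment_exits(tokens, heads)
    if any(len(words) != 1 for words in exits.values()):
        return False
    fragment_heads = {words[0] for words in exits.values()}
    return all(heads[h] == -1 or heads[h] in fragment_heads for h in fragment_heads)

tokens = ("Other countries , including West Germany , "
          "may have a hard time justifying continued membership .").split()
# Assumed tree: heads[i] is the parent index of token i, -1 marks the root,
# None marks punctuation (not part of the tree).
heads = [1, 7, None, 1, 5, 3, None, -1, 7, 11, 11, 8, 11, 14, 12, None]

print(satisfies_strong(tokens, heads))
```

Under the assumed tree the fragment heads are "countries", "including" and "may", which matches the positions marked "head" in the schematic.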


Intuition: Weak Assumption

weak constraint: (head ← external word) in inference

word word head word word word ,
head word word word word word word word .


IFI also has nonvoting preferred shares ,

which are quoted on the Milan stock exchange .
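Under the weak constraint, a fragment's head may attach to any word outside the fragment, not only to another fragment's head. The sketch below checks only that each fragment has exactly one externally attached word. The function names and the dependency tree (in which "which" heads the second fragment and attaches to "shares") are assumptions for illustration, chosen to agree with the schematic.

```python
PUNCT = {",", ".", ";", ":", "--"}

def fragment_ids(tokens):
    """Number the inter-punctuation fragments; punctuation tokens get None."""
    ids, frag = [], 0
    for tok in tokens:
        if tok in PUNCT:
            ids.append(None)
            frag += 1
        else:
            ids.append(frag)
    return ids

def satisfies_weak(tokens, heads):
    """True if each fragment has exactly one word attached to the root or to an outside word."""
    ids = fragment_ids(tokens)
    exits = {}
    for i, parent in enumerate(heads):
        if ids[i] is None:
            continue
        exits.setdefault(ids[i], 0)
        if parent == -1 or ids[parent] != ids[i]:
            exits[ids[i]] += 1
    return all(count == 1 for count in exits.values())

tokens = ("IFI also has nonvoting preferred shares , "
          "which are quoted on the Milan stock exchange .").split()
# Assumed tree: -1 marks the root, None marks punctuation.
heads = [2, 2, -1, 5, 5, 2, None, 5, 7, 8, 9, 14, 14, 14, 10, None]

print(satisfies_weak(tokens, heads))   # the weak check passes
print(tokens[heads[7]])                # the word that "which" attaches to
```

In this assumed tree the second fragment's head attaches to "shares", which is not the head of the first fragment, so the tree passes the weak check while failing a head-to-head requirement.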


Linguistic Analysis: Constituents

punctuation and syntax are related (Nunberg, 1990; Briscoe, 1994; Jones, 1994; Doran, 1998, inter alia)

49.4% of inter-punctuation fragments are constituents

lowest dominating non-terminals (%):

S      32.5
NP     27.2
VP     13.3
PP     10.1
SBAR    6.7
ADVP    3.3
QP      2.5
SINV    2.0
ADJP    1.0
Total  98.5


Linguistic Analysis: Strong Dependencies

strong (in training), e.g.,

... arrests followed a “ Snake Day ” at Utrecht ...

already 74.0% agreement with head-percolated trees


Linguistic Analysis: Weak Dependencies

weak (in inference), e.g.,

Maryland Club also distributes tea , which ...

now 92.9% agreement with head-percolated trees


Linguistic Analysis: Violations

generalization: no path from the root may enter a fragment twice (95.0% agreement with head-percolated trees)

simple violations: “seamless” quotations and even lists

Her recent report classifies the stock as a “hold.”

The company said its directors , management and subsidiaries will remain long-term investors and ...
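The generalized condition can be tested by walking from each word up to the root and recording which fragments the path passes through. A minimal sketch follows, with the caveat that the function names and the tree for the list example are assumptions for illustration (the coordinated subject is taken to be headed by "directors").

```python
PUNCT = {",", ".", ";", ":", "--"}

def fragment_ids(tokens):
    """Number the inter-punctuation fragments; punctuation tokens get None."""
    ids, frag = [], 0
    for tok in tokens:
        if tok in PUNCT:
            ids.append(None)
            frag += 1
        else:
            ids.append(frag)
    return ids

def enters_fragment_twice(tokens, heads):
    """True if some root-to-word path leaves a fragment and later re-enters it."""
    ids = fragment_ids(tokens)
    for i in range(len(tokens)):
        if ids[i] is None:
            continue
        visited, node = [], i
        while node != -1:                      # climb towards the root
            if not visited or visited[-1] != ids[node]:
                visited.append(ids[node])      # record each change of fragment
            node = heads[node]
        if len(visited) != len(set(visited)):  # some fragment appears twice
            return True
    return False

tokens = ("The company said its directors , management and "
          "subsidiaries will remain long-term investors").split()
# Assumed tree: -1 marks the root, None marks punctuation.
heads = [1, 2, -1, 4, 9, None, 4, 4, 4, 2, 9, 12, 10]

print(enters_fragment_twice(tokens, heads))
```

With this tree the path said → will → directors → management alternates between the two fragments, which is the kind of list violation the slide points to.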


Motivation: “Profiting from Markup”

..., whereas McCain is secure on the topic, Obama <a>[VP worries about winning the pro-Israel vote]</a>.

“Capitalizing on Punctuation”:
  more common (particularly in long sentences)
  more uniform (better coverage of constructs)


Problem: Unsupervised Learning of Parsing

NN      NNS      VBD  IN NN        ♦
|       |        |    |  |         |
Factory payrolls fell in September .

Input: Raw Text (Sentences, Tokens and POS-tags)

... By most measures, the nation’s industrial sector is now growing very slowly — if at all. Factory payrolls fell in September. So did the Federal Reserve ...

Output: Syntactic Structures (and a Probabilistic Grammar)


Scoring: Directed Dependency Accuracy

NN      NNS      VBD  IN NN        ♦
|       |        |    |  |         |
Factory payrolls fell in September .

Directed score: 3/5 = 60% (right/left-branching baselines: 2/5 = 40%).
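Directed accuracy is the fraction of words whose predicted parent matches the reference parent. The sketch below computes it for the two chain-shaped baselines. The gold heads are an assumed reference analysis of the sentence (the arcs on the slide are not recoverable), and the parse that scores 3/5 is not reproduced here for the same reason.

```python
def directed_accuracy(predicted, gold):
    """Fraction of words whose predicted parent index equals the gold parent index."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)

words = ["Factory", "payrolls", "fell", "in", "September"]

# Assumed gold tree: -1 marks the root ("fell").
gold = [1, 2, -1, 2, 3]

# Chain baselines: every word attaches to its neighbour on one side.
attach_to_next = [1, 2, 3, 4, -1]      # last word is the root
attach_to_previous = [-1, 0, 1, 2, 3]  # first word is the root

print(directed_accuracy(attach_to_next, gold))
print(directed_accuracy(attach_to_previous, gold))
```

Under the assumed gold tree both chains get two of five attachments right, in line with the 40% baseline figure on the slide.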


Methodology Model

DMV: Dependency Model with Valence

a head-outward model, with word classesand valence/adjacency (Klein and Manning, 2004)

h

a1 a2

Spitkovsky et al. (Stanford & Google) Punctuation: Making a Point CoNLL (2011-06-23) 16 / 25

Page 75: Punctuation: Making a Point - Stanford NLP GroupPunctuation: Making a Point in Unsupervised Dependency Parsing Valentin I. Spitkovsky withDaniel Jurafsky (StanfordUniversity) andHiyan

Methodology Model

DMV: Dependency Model with Valence

a head-outward model, with word classesand valence/adjacency (Klein and Manning, 2004)

h

a1 a2

STOP

Spitkovsky et al. (Stanford & Google) Punctuation: Making a Point CoNLL (2011-06-23) 16 / 25

Page 76: Punctuation: Making a Point - Stanford NLP GroupPunctuation: Making a Point in Unsupervised Dependency Parsing Valentin I. Spitkovsky withDaniel Jurafsky (StanfordUniversity) andHiyan

Methodology Model

DMV: Dependency Model with Valence

a head-outward model, with word classes and valence/adjacency (Klein and Manning, 2004)

[Diagram: head h generates arguments a1 and a2, then STOP]

\[
P(t_h) = \prod_{dir \in \{L,R\}} P_{\mathrm{STOP}}\bigl(c_h, dir, \overbrace{\mathbf{1}_{n=0}}^{adj}\bigr) \prod_{i=1}^{n} P(t_{a_i}) \, P_{\mathrm{ATTACH}}(c_h, dir, c_{a_i}) \, \Bigl(1 - P_{\mathrm{STOP}}\bigl(c_h, dir, \overbrace{\mathbf{1}_{i=1}}^{adj}\bigr)\Bigr),
\qquad n = |args(h, dir)|
\]
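The factorization above can be sketched as a short recursion over a dependency subtree. This is a minimal illustration, not the authors' implementation: the `Node` class, the dictionary layout of `p_stop` and `p_attach`, and all probability values are assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """A dependency subtree: the head's word class plus its arguments per direction."""
    cls: str
    left: List["Node"] = field(default_factory=list)   # nearest argument first
    right: List["Node"] = field(default_factory=list)  # nearest argument first

def subtree_prob(node, p_stop, p_attach):
    """P(t_h): product over both directions of stop, continue and attach factors.

    p_stop[(head_class, direction, adjacent)]    -> probability of stopping
    p_attach[(head_class, direction, arg_class)] -> probability of that argument class
    """
    prob = 1.0
    for direction, args in (("L", node.left), ("R", node.right)):
        n = len(args)
        # Final STOP; it counts as adjacent only when no argument was generated (n = 0).
        prob *= p_stop[(node.cls, direction, n == 0)]
        for i, arg in enumerate(args, start=1):
            prob *= 1.0 - p_stop[(node.cls, direction, i == 1)]  # chose not to stop
            prob *= p_attach[(node.cls, direction, arg.cls)]     # chose the argument class
            prob *= subtree_prob(arg, p_stop, p_attach)          # P(t_{a_i})
    return prob

# Hypothetical parameters for two word classes, V and N.
p_stop = {
    ("V", "L", True): 0.3, ("V", "L", False): 0.8,
    ("V", "R", True): 0.4, ("V", "R", False): 0.9,
    ("N", "L", True): 0.7, ("N", "L", False): 0.9,
    ("N", "R", True): 0.8, ("N", "R", False): 0.9,
}
p_attach = {("V", "L", "N"): 1.0, ("V", "R", "N"): 1.0}

tree = Node("V", left=[Node("N")], right=[Node("N")])
print(subtree_prob(tree, p_stop, p_attach))
```

Keying the stop distribution on the adjacency flag is what gives the model its valence: the first stop-or-continue decision in a direction uses different parameters from all later ones.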

Methodology Learning

Learning: Viterbi EM

well-suited to long sentences, which are more punctuation-rich (Spitkovsky et al., CoNLL 2010)

fast, simple and easily admits constraints (Spitkovsky et al., ACL 2010)
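To make the idea of Viterbi (hard) EM concrete, here is a toy sketch on a much simpler model than the DMV: a mixture of two biased coins. Each iteration commits to the single best latent assignment for every observation, then re-estimates parameters from those assignments alone. The model, data, function names and add-one smoothing are all assumptions for illustration; in the parsing setting the latent assignment would be a best parse tree rather than a coin label.

```python
import math

def log_lik(heads, tosses, bias):
    """Log-probability of one toss sequence with `heads` heads under the given bias."""
    return heads * math.log(bias) + (tosses - heads) * math.log(1.0 - bias)

def viterbi_em(data, biases, iterations=10):
    """Hard EM for a coin mixture; data is a list of (heads, tosses) pairs."""
    biases = list(biases)
    assign = []
    for _ in range(iterations):
        # Hard E-step: pick the single most likely coin for each observation.
        assign = [
            max(range(len(biases)), key=lambda k: log_lik(h, n, biases[k]))
            for h, n in data
        ]
        # M-step: re-estimate each bias from its assigned observations (add-one smoothing).
        for k in range(len(biases)):
            heads = sum(h for (h, n), a in zip(data, assign) if a == k)
            tosses = sum(n for (h, n), a in zip(data, assign) if a == k)
            biases[k] = (heads + 1) / (tosses + 2)
    return biases, assign

data = [(9, 10), (8, 10), (2, 10), (1, 10), (7, 10)]
biases, assign = viterbi_em(data, biases=[0.6, 0.4])
print(biases, assign)
```

Because the E-step only needs one best analysis per observation, restricting which analyses are allowed is a matter of filtering candidates before taking the maximum.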

Constraints

Constraints: Parser Induction

the model, i.e., projective trees (Klein and Manning, 2004)

— Dependency Model with Valence (DMV)

(((List (the fares (for ((flight) (number 891)))))) .)

partial bracketings (Pereira and Schabes, 1992)

– synchronous grammars (Alshawi and Douglas, 2000)
– linear-time parsing (Seginer, 2007)
– skewness of trees (Seginer, 2007)
– Zipfian distribution of words (Seginer, 2007)
– sparse posterior regularization (Ganchev et al., 2009)
– web markup-induced constraints (Spitkovsky et al., 2010)
– semantic cues (Naseem and Barzilay, 2011)
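A partial bracketing is commonly used as a constraint by ruling out any candidate span that crosses a bracket. The sketch below reads the bracketed example from this slide and tests spans for crossing. The helper names and the half-open token-index convention are assumptions for illustration, not code from any of the cited systems.

```python
def bracket_spans(text):
    """Return (tokens, spans); each span is a half-open (start, end) token range."""
    tokens, spans, stack = [], set(), []
    for piece in text.replace("(", " ( ").replace(")", " ) ").split():
        if piece == "(":
            stack.append(len(tokens))                # remember where this bracket opens
        elif piece == ")":
            spans.add((stack.pop(), len(tokens)))    # close the most recent bracket
        else:
            tokens.append(piece)
    return tokens, spans

def crosses(span, brackets):
    """True if span overlaps some bracket without either one containing the other."""
    i, j = span
    return any(a < i < b < j or i < a < j < b for a, b in brackets)

tokens, spans = bracket_spans("(((List (the fares (for ((flight) (number 891)))))) .)")
print(tokens)
print(crosses((1, 3), spans))  # "the fares": compatible with every bracket
print(crosses((2, 4), spans))  # "fares for": cuts into the bracket starting at "for"
```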

Experimental Results Unlexicalized

Experimental Results: Unlexicalized

directed dependency accuracies for baselines, inference, training and an oracle:

                              WSJ∞ (Section 23, all sentences)
Punctuation as Words          41.7 (-10.3)
Standard Training             52.0
  w/Constrained Inference     54.0 (+2.0)
Constrained Training          55.6 (+3.6)
  w/Constrained Inference     57.4 (+1.8)
Supervised DMV                69.8
  w/Constrained Inference     73.0 (+3.2)
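The parenthesized differences do not all share one baseline. The short check below recomputes them from the accuracies on this slide. The pairing of each row with its reference row is an assumption, inferred only from which subtraction reproduces the printed number.

```python
# Accuracies copied from the slide (WSJ∞, Section 23, all sentences).
scores = {
    "Punctuation as Words": 41.7,
    "Standard Training": 52.0,
    "Standard Training w/Constrained Inference": 54.0,
    "Constrained Training": 55.6,
    "Constrained Training w/Constrained Inference": 57.4,
    "Supervised DMV": 69.8,
    "Supervised DMV w/Constrained Inference": 73.0,
}

# (row, reference row): assumed pairing, inferred from the arithmetic.
pairs = [
    ("Punctuation as Words", "Standard Training"),
    ("Standard Training w/Constrained Inference", "Standard Training"),
    ("Constrained Training", "Standard Training"),
    ("Constrained Training w/Constrained Inference", "Constrained Training"),
    ("Supervised DMV w/Constrained Inference", "Supervised DMV"),
]

deltas = {row: round(scores[row] - scores[ref], 1) for row, ref in pairs}
for row, delta in deltas.items():
    print(f"{row}: {delta:+.1f}")
```

Read this way, each "w/Constrained Inference" row is measured against the model trained the same way but decoded without constraints, while the two training variants are measured against Standard Training.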

Experimental Results Lexicalized

Experimental Results: Lexicalized

directed dependency accuracies compared to previous state-of-the-art

                                          WSJ∞
(Spitkovsky et al., ACL 2010)             50.4
Lexicalized (Gillenwater et al., 2010)    53.3
Lexicalized (Blunsom and Cohn, 2010)      55.7
Unlexicalized                             57.4
Lexicalized Constrained Training          58.0
  w/Constrained Inference                 58.4

Experimental Results Without Gold Tags

Experimental Results: “Fully” Unsupervised

constraints sufficiently strong to abandon gold tags

                   WSJ∞
(this work)        58.4
  w/o Gold Tags    58.2

using Clark’s (2000) unsupervised clusters
– constructed by Finkel and Manning (2009) for NER
  http://nlp.stanford.edu/software/stanford-postagger-2008-09-28.tar.gz: models/egw.bnc.200

(Come see our poster at EMNLP!)

Experimental Results Multi-Lingual

Experimental Results: Multi-Lingual

further evaluation against CoNLL 2006/7 data sets
– results generalize across languages:

Language     Year   Inference Only   Training & Inference
Arabic       2006   +0.1             +1.1
Arabic       2007   +0.9             +2.6
Basque       2007   +0.8             +0.6
Bulgarian    2006   +1.1             +1.6
Catalan      2007   +0.8             +0.9
Czech        2006   +0.9             +3.0
Czech        2007   +1.0             +2.7
Danish       2006   +0.9             +0.2
Dutch        2006   +1.0             +3.0
English      2007   +1.3             +2.8
German       2006   +0.8             +1.6
Greek        2007   +0.5             +0.7
Hungarian    2007   +0.4             +1.4
Italian      2007   +0.1             -0.8
Japanese     2006   +0.0             +0.1
Portuguese   2006   +0.7             +0.8
Slovenian    2006   +2.0             +2.8
Spanish      2006   +0.8             +0.8
Swedish      2006   +0.5             +0.8
Turkish      2006   +0.1             +1.0
Turkish      2007   +0.2             +0.1

Average:            +0.7             +1.3

Conclusion Thoughts

Thoughts:

extend existing parsers
– no need to retrain models
– supervised systems?

would prosody aid with induction from speech?
– “as words” breaks n-grams (Kahn et al., 2005)

Conclusion Summary

Summary:

punctuation helps dependency grammar induction
– even better than markup...

a popular approach: powerful models
– priors prevent overfitting

an alternative: overly simple models
– constraints prevent underfitting

Conclusion Thanks! Questions?

Thanks!

Punctuation. It works...

Any questions?
