Probabilistic Parsing: Enhancements
Ling 571: Deep Processing Techniques for NLP
January 26, 2011


Page 1

Probabilistic Parsing: Enhancements

Ling 571: Deep Processing Techniques for NLP

January 26, 2011

Page 2

Parser Issues

PCFGs make many (unwarranted) independence assumptions:

Structural dependency: NP -> Pronoun is much more likely in subject position

Lexical dependency: verb subcategorization

Coordination ambiguity

Page 3

Improving PCFGs: Structural Dependencies

How can we capture the subject/object asymmetry?
E.g., NPsubj -> Pron vs. NPobj -> Pron

Parent annotation: annotate each node with its parent in the parse tree
E.g., NP^S vs. NP^VP

Also annotate pre-terminals:
RB^ADVP vs. RB^VP
IN^SBAR vs. IN^PP

Can also split rules on other conditions.
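To make parent annotation concrete, here is a minimal sketch over nltk.Tree objects; the function name, the ROOT placeholder for the top node, and the choice of nltk are illustrative assumptions, not from the slides.

```python
from nltk import Tree

def parent_annotate(tree, parent="ROOT"):
    """Return a copy of `tree` with every node label rewritten as
    LABEL^PARENT, so an NP under S becomes NP^S and an NP under VP
    becomes NP^VP (pre-terminals are annotated too)."""
    if isinstance(tree, str):                  # leaf word: leave as-is
        return tree
    children = [parent_annotate(child, tree.label()) for child in tree]
    return Tree(tree.label() + "^" + parent, children)

t = Tree.fromstring("(S (NP (PRP I)) (VP (VBD saw) (NP (NNP Ontario))))")
print(parent_annotate(t))
# (S^ROOT (NP^S (PRP^NP I)) (VP^S (VBD^VP saw) (NP^VP (NNP^NP Ontario))))
```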

Page 4

Parent Annotation

Page 5

Parent Annotation: Pre-terminals

Page 8

Parent Annotation

Advantages:
Captures structural dependency in grammars

Disadvantages:
Increases the number of rules in the grammar
Decreases the amount of training data per rule

Strategies exist to search for an optimal number of rules.

Page 9

Improving PCFGs: Lexical Dependencies

Lexicalized rules:
Best-known parsers: the Collins and Charniak parsers
Each non-terminal is annotated with its lexical head
E.g., the verb for a verb phrase, the noun for a noun phrase
Each rule must identify one RHS element as the head
Heads propagate up the tree

Conceptually like adding one rule per head value:
VP(dumped) -> VBD(dumped) NP(sacks) PP(into)
VP(dumped) -> VBD(dumped) NP(cats) PP(into)

Page 10

Lexicalized PCFGs

Also add a head tag to each non-terminal
Head tag: the part-of-speech tag of the head word
VP(dumped) -> VBD(dumped) NP(sacks) PP(into) becomes
VP(dumped,VBD) -> VBD(dumped,VBD) NP(sacks,NNS) PP(into,IN)

Two types of rules:
Lexical rules: pre-terminal -> word
Deterministic, probability 1
Internal rules: all other expansions
Must estimate probabilities
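A minimal sketch of how the two rule types might be represented and estimated; the triple representation of lexicalized non-terminals and the toy counts are illustrative assumptions, not the slides' own implementation.

```python
from collections import Counter

# Hypothetical lexicalized internal rules read off a toy treebank.
# Each non-terminal is a (category, head word, head tag) triple.
observed = [
    (("VP", "dumped", "VBD"),
     (("VBD", "dumped", "VBD"), ("NP", "sacks", "NNS"), ("PP", "into", "IN"))),
]

rule_counts = Counter(observed)
lhs_counts = Counter(lhs for lhs, _ in observed)

def p_internal(lhs, rhs):
    """Relative-frequency estimate for an internal rule,
    conditioned on its (lexicalized) left-hand side."""
    return rule_counts[(lhs, rhs)] / lhs_counts[lhs]

def p_lexical(preterminal, word):
    """Lexical rules (pre-terminal -> its own head word) are deterministic."""
    return 1.0
```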

Page 11

Page 14

PLCFGs

Issue: too many rules
No way to find a corpus with enough examples

(Partial) solution: assume independence
Condition each rule on the category of its LHS and its head
Condition each head on the category of its LHS and the parent's head

Page 15

Disambiguation Example

Page 16

Disambiguation Example

p(VP -> VBD NP PP | VP, dumped)
  = C(VP(dumped) -> VBD NP PP) / C(VP(dumped) -> ...)
  = 6/9 ≈ 0.67

p(VP -> VBD NP | VP, dumped)
  = C(VP(dumped) -> VBD NP) / C(VP(dumped) -> ...)
  = 0/9 = 0

p(into | PP, dumped)
  = C(X(dumped) -> ... PP(into) ...) / C(X(dumped) -> ... PP ...)
  = 2/9 ≈ 0.22

p(into | PP, sacks)
  = C(X(sacks) -> ... PP(into) ...) / C(X(sacks) -> ... PP ...)
  = 0/0 (undefined)
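These quantities are plain relative frequencies over lexicalized events; a small sketch of the computation, with a placeholder entry standing in for the three unspecified VP(dumped) expansions so that the denominator matches the slide's 9:

```python
from collections import Counter

# Hypothetical treebank counts of VP(dumped) expansions, mirroring the
# slide; ("VBD", "PP") is a placeholder for the 3 unspecified expansions.
expansions = Counter({("VBD", "NP", "PP"): 6, ("VBD", "PP"): 3})
total = sum(expansions.values())            # C(VP(dumped) -> ...) = 9

def p_rule(rhs):
    """p(VP -> rhs | VP, dumped) as a relative frequency."""
    return expansions[rhs] / total

print(p_rule(("VBD", "NP", "PP")))          # 6/9 ~ 0.67
print(p_rule(("VBD", "NP")))                # 0/9 = 0.0
```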

Page 17

Improving PCFGs: Tradeoffs

Tension: increasing accuracy means increasing specificity
E.g., lexicalization, parent annotation, Markovization, etc.
This increases grammar size
Increases processing time
Increases training data requirements

How can we balance these?


Page 21

CNF Factorization & Markovization

CNF factorization: converts n-ary branching to binary branching
Can maintain information about the original structure
Neighborhood history and parent

Issue: potentially explosive
If we keep all context: 72 -> ~10K non-terminals!

How much context should we keep? That is, what Markov order?
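A sketch of horizontal Markovization applied during binarization: each intermediate symbol remembers only the last h siblings generated, so the number of non-terminals stays bounded. The label scheme here is illustrative; nltk's Tree.chomsky_normal_form(horzMarkov=h) implements the same idea.

```python
def binarize(lhs, rhs, h=1):
    """Right-binarize LHS -> X1 ... Xn for CNF, keeping at most the
    last `h` already-generated siblings in each intermediate symbol
    (horizontal Markov order h)."""
    rules, generated, cur_lhs, rest = [], [], lhs, list(rhs)
    while len(rest) > 2:
        first = rest.pop(0)
        generated.append(first)
        hist = generated[-h:] if h > 0 else []
        new_sym = "%s|<%s>" % (lhs, "-".join(hist))
        rules.append((cur_lhs, (first, new_sym)))
        cur_lhs = new_sym
    rules.append((cur_lhs, tuple(rest)))
    return rules

for r in binarize("VP", ["VBD", "NP", "PP", "PP"], h=1):
    print(r)
# ('VP', ('VBD', 'VP|<VBD>'))
# ('VP|<VBD>', ('NP', 'VP|<NP>'))
# ('VP|<NP>', ('PP', 'PP'))
```

With h=0 all intermediate symbols merge into one; with unbounded h the original structure is fully recoverable but the symbol inventory explodes, which is the tension the next slides quantify.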

Page 22

Different Markov Orders

Page 23

Markovization & Costs (Mohri & Roark, 2006)

Page 24

Efficiency

PCKY is O(|V| n^3)
The grammar can be huge
The grammar can be extremely ambiguous
100s of analyses are not unusual, especially for long sentences

However, we only care about the best parses
The others can be pretty bad

Can we use this to improve efficiency?
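For reference, a skeletal probabilistic CKY over a CNF PCFG; the dict-based grammar and chart representations are assumptions, but they make the three nested O(n) loops, the grammar constant, and the per-cell pruning opportunity visible.

```python
from collections import defaultdict

def pcky(words, lexical, binary):
    """Skeletal probabilistic CKY over a CNF PCFG.
    lexical: word -> {A: p}   for rules A -> word
    binary:  (B, C) -> {A: p} for rules A -> B C
    chart[(i, j)] maps each non-terminal to its best probability."""
    n = len(words)
    chart = defaultdict(dict)
    for i, w in enumerate(words):
        chart[(i, i + 1)] = dict(lexical.get(w, {}))
    for span in range(2, n + 1):               # O(n) span lengths
        for i in range(n - span + 1):          # O(n) start positions
            j = i + span
            cell = chart[(i, j)]
            for k in range(i + 1, j):          # O(n) split points
                for B, pb in chart[(i, k)].items():
                    for C, pc in chart[(k, j)].items():
                        for A, pr in binary.get((B, C), {}).items():
                            p = pr * pb * pc
                            if p > cell.get(A, 0.0):
                                cell[A] = p
            # pruning (beam thresholding, heuristic filtering) goes here
    return chart
```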

Page 25

Beam Thresholding

Inspired by the beam search algorithm
Assume low-probability partial parses are unlikely to yield a high-probability parse overall
Keep only the top k most probable partial parses
Retain only k choices per cell
For large grammars, k could be 50 or 100; for small grammars, 5 or 10
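A sketch of the per-cell pruning step, assuming chart cells are dicts from non-terminal to best probability as in the skeleton above; the function name and default k are illustrative.

```python
import heapq

def prune_cell(cell, k=50):
    """Keep only the k most probable entries of a chart cell
    (cell maps non-terminal -> best probability so far)."""
    if len(cell) <= k:
        return cell
    return dict(heapq.nlargest(k, cell.items(), key=lambda kv: kv[1]))
```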


Page 29

Heuristic Filtering

Intuition: some rules/partial parses are unlikely to end up in the best parse, so don't store them in the table.

Exclusions:
Low frequency: exclude singleton productions
Low probability: exclude constituents x s.t. p(x) < 10^-200
Low relative probability: exclude x if there exists a y s.t. p(y) > 100 * p(x)
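The three exclusions can be read as a single predicate applied before storing a candidate entry; a sketch using the thresholds from the slide (the argument names are illustrative).

```python
def keep_entry(prob, rule_count, best_in_cell):
    """Apply the three exclusions above to a candidate chart entry."""
    if rule_count <= 1:            # low frequency: singleton production
        return False
    if prob < 1e-200:              # low absolute probability
        return False
    if best_in_cell > 100 * prob:  # low relative probability
        return False
    return True
```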

Page 30

Notes on HW#3

Outline:
Induce a grammar from a (small) treebank
Implement probabilistic CKY
Evaluate the parser
Improve the parser
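nltk already provides pieces for the first two steps; a minimal sketch of inducing a PCFG from trees and parsing with the built-in Viterbi parser (the one-tree treebank is a toy stand-in for the HW#3 data).

```python
from nltk import Tree, Nonterminal, induce_pcfg
from nltk.parse import ViterbiParser

# A one-tree toy "treebank"; HW#3 supplies the real trees.
trees = [Tree.fromstring("(S (NP (NNP Ontario)) (VP (VBD grew)))")]

productions = []
for t in trees:
    t.chomsky_normal_form()        # binarize (a no-op on this toy tree)
    productions += t.productions()

grammar = induce_pcfg(Nonterminal("S"), productions)
parser = ViterbiParser(grammar)    # probabilistic CKY-style parsing
for parse in parser.parse("Ontario grew".split()):
    print(parse)
```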

Page 31

Treebank Format

Adapted from the Penn Treebank format
Rules simplified:
Removed traces and other null elements
Removed complex tags
Reformatted POS tags as non-terminals

Page 32

Reading the Parses

POS unary collapse:
(NP_NNP Ontario) was (NP (NNP Ontario))

Binarization:
VP -> VP' PP; VP' -> VB PP
was VP -> VB PP PP
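Both conventions correspond to nltk's standard tree transforms, so a sketch of producing and undoing them may help when reading the parses (note that nltk joins collapsed unaries with '+' by default, whereas the HW treebank shows '_').

```python
from nltk import Tree

t = Tree.fromstring(
    "(S (NP (NNP Ontario)) (VP (VB dumped) (NP (NNS sacks)) "
    "(PP (IN into) (NP (DT a) (NN bin)))))")

t.collapse_unary(collapsePOS=True)      # (NP (NNP Ontario)) -> (NP+NNP Ontario)
t.chomsky_normal_form(factor="right")   # VP -> VB NP PP becomes binary rules
print(t)

t.un_chomsky_normal_form()              # undo binarization (and unary collapse)
print(t)
```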

Page 33

Start Early!