26
CS 4705 1 Basic Parsing with Context- Free Grammars

CS 47051 Basic Parsing with Context-Free Grammars

  • View
    235

  • Download
    3

Embed Size (px)

Citation preview

Page 1: CS 47051 Basic Parsing with Context-Free Grammars

CS 4705 1

Basic Parsing with Context-Free Grammars

Page 2: CS 47051 Basic Parsing with Context-Free Grammars

2

Analyzing Linguistic Units

• Morphological parsing: – analyze words into morphemes and affixes

– rule-based, FSAs, FSTs

• Ngrams for Language Modeling• POS Tagging• Syntactic parsing:

– identify constituents and their relationships

– to see if a sentence is grammatical

– to assign an abstract representation of meaning

Page 3: CS 47051 Basic Parsing with Context-Free Grammars

3

Syntactic Parsing

• Declarative formalisms like CFGs, FSAs define the legal strings of a language -- but only tell you ‘this is a legal string of the language X’

• Parsing algorithms specify how to recognize the strings of a language and assign each string one (or more) syntactic analyses

• Parsing useful for grammar checking, semantic analysis, MT, QA, information extraction, speech recognition…almost every task in NLP…but…

Page 4: CS 47051 Basic Parsing with Context-Free Grammars

4

Parsing as a Form of Search

• Searching FSAs– Finding the right path through the automaton

– Search space defined by structure of FSA

• Searching CFGs– Finding the right parse tree among all possible parse

trees

– Search space defined by the grammar

• Constraints provided by the input sentence and the automaton or grammar

Page 5: CS 47051 Basic Parsing with Context-Free Grammars

5

CFG for Fragment of English

S NP VP VP V

S Aux NP VP PP -> Prep NP

S VP N book | flight | meal | money

NP Det Nom V book | include | prefer

NP PropN Aux does

Nom N Nom Prep from | to | on

Nom N PropN Houston | TWA

Nom Nom PP Det that | this | a

VP V NP

TopD BotUp E.g. LC’s

Page 6: CS 47051 Basic Parsing with Context-Free Grammars

6

S

VP

NP

Nom

V Det N

Book that flight

Parse Tree for ‘Book that flight’ for Prior CFG

Page 7: CS 47051 Basic Parsing with Context-Free Grammars

7

Rule Expansion

VP V NP (2)

Nom Nom PP

PropN Houston | TWANom N (4)

Prep from | to | onNom N Nom

Aux doesNP PropN

V book | include | preferNP Det Nom (3)

N book | flight | meal | moneyS VP (1)

PP -> Prep NPS Aux NP VP

VP VS NP VP

TopD BotUp E.g. LC’s

Det that | this | a

Page 8: CS 47051 Basic Parsing with Context-Free Grammars

8

Top-Down Parser

• Builds from the root S node to the leaves• Assuming we build all trees in parallel:

– Find all trees with root S (or all rules w/lhs S)

– Next expand all constituents in these trees/rules

– Continue until leaves are pos

– Candidate trees failing to match pos of input string are rejected (e.g. Book that flight matches only one subtree)

Page 9: CS 47051 Basic Parsing with Context-Free Grammars

9

Top-Down Search Space for CFG (expanding only leftmost leaves)

S S S

NP VP Aux NP VP VP

S S S S S S

NP VP NP VP Aux NP VP Aux NP VP VP VP

Det Nom PropN Det Nom PropN V NP V

Det Nom

N

Page 10: CS 47051 Basic Parsing with Context-Free Grammars

10

Bottom-Up Parsing

• Parser begins with words of input and builds up trees, applying grammar rules whose rhs match– Book that flight

N Det N V Det N

Book that flight Book that flight

– ‘Book’ ambiguous (2 pos appear in grammar)

– Parse continues until an S root node reached or no further node expansion possible

Page 11: CS 47051 Basic Parsing with Context-Free Grammars

11

Two Candidates: One Successful Parse

S

VP

VP NP NP

Nom Nom

V Det N V Det N

Book that flight Book that flight

S ~ VP NP

Page 12: CS 47051 Basic Parsing with Context-Free Grammars

12

What’s right/wrong with….

• Top-Down parsers – they never explore illegal parses (e.g. which can’t form an S) -- but waste time on trees that can never match the input

• Bottom-Up parsers – they never explore trees inconsistent with input -- but waste time exploring illegal parses (with no S root)

• For both: find a control strategy -- how explore search space efficiently?– Pursuing all parses in parallel or backtrack or …?

– Which rule to apply next?

– Which node to expand next?

Page 13: CS 47051 Basic Parsing with Context-Free Grammars

13

A Possible Top-Down Parsing Strategy

• Depth-first search: – Agenda of search states: expand search space

incrementally, exploring most recently generated state (tree) each time

– When you reach a state (tree) inconsistent with input, backtrack to most recent unexplored state (tree)

• Which node to expand?– Leftmost or rightmost

• Which grammar rule to use?– Order in the grammar? How?

Page 14: CS 47051 Basic Parsing with Context-Free Grammars

14

Top-Down, Depth-First, Left-Right Strategy

• Initialize agenda with ‘S’ tree and ptr to first word (cur)

• Loop: Until successful parse or empty agenda– Apply next applicable grammar rule to leftmost

unexpanded node (n) of current tree (t) on agenda and push resulting tree (t’) onto agenda

• If n is a POS category and matches the POS of cur, push new tree (t’’) onto agenda and increment cur

• Else pop t’ from agenda

– Final agenda contains history of successful parse

• Does this flight include a meal?

Page 15: CS 47051 Basic Parsing with Context-Free Grammars

15

Fig 10.7

CFG

Page 16: CS 47051 Basic Parsing with Context-Free Grammars

16

Left Corners: Top-Down Parsing with Bottom-Up Filtering

• We saw: Top-Down, depth-first, L2R parsing – Expands non-terminals along the tree’s left edge down

to leftmost leaf of tree

– Moves on to expand down to next leftmost leaf…

– Note: In successful parse, current input word will be first word in derivation of node the parser currently processing

– So….look ahead to left-corner of the tree

• B is a left-corner of A if A =*=> Bα

• Build table with left-corners of all non-terminals in grammar and consult before applying rule

Page 17: CS 47051 Basic Parsing with Context-Free Grammars

17

Left Corners

Page 18: CS 47051 Basic Parsing with Context-Free Grammars

18

Left-Corner Table for CFG

Category Left Corners (POS)

S Det, PropN, Aux, V

NP Det, PropN

Nom N

VP V

Page 19: CS 47051 Basic Parsing with Context-Free Grammars

19

Left Recursion vs. Right Recursion

• Depth-first search will never terminate if grammar is left recursive (e.g. NP --> NP PP)

),( **

Page 20: CS 47051 Basic Parsing with Context-Free Grammars

20

• Solutions:– Rewrite the grammar (automatically?) to a weakly

equivalent one which is not left-recursivee.g. The man {on the hill with the telescope…}NP NP PP (wanted: Nom plus a sequence of PPs)NP Nom PPNP NomNom Det N…becomes…NP Nom NP’Nom Det NNP’ PP NP’ (wanted: a sequence of PPs)NP’ e• Not so obvious what these rules mean…

Page 21: CS 47051 Basic Parsing with Context-Free Grammars

21

– Harder to detect and eliminate non-immediate left recursion

– NP --> Nom PP

– Nom --> NP

– Fix depth of search explicitly

– Rule ordering: non-recursive rules first

• NP --> Det Nom

• NP --> NP PP

Page 22: CS 47051 Basic Parsing with Context-Free Grammars

22

An Exercise: The city hall parking lot in town

• NP NP NP PP• NP Det Nom• NP Adj Nom• NP Nom Nom• Nom NP Nom• Nom N• PP Prep NP• N city | hall | lot | town• Adj parking• Prep to | for | in

Page 23: CS 47051 Basic Parsing with Context-Free Grammars

23

Another Problem: Structural ambiguity

• Multiple legal structures– Attachment (e.g. I saw a man on a hill with a telescope)

– Coordination (e.g. younger cats and dogs)

– NP bracketing (e.g. Spanish language teachers)

Page 24: CS 47051 Basic Parsing with Context-Free Grammars

24

NP vs. VP Attachment

Page 25: CS 47051 Basic Parsing with Context-Free Grammars

25

• Solution? – Return all possible parses and disambiguate using

“other methods”

Page 26: CS 47051 Basic Parsing with Context-Free Grammars

26

Summing Up

• Parsing is a search problem which may be implemented with many control strategies– Top-Down or Bottom-Up approaches each have

problems• Combining the two solves some but not all issues

– Left recursion– Syntactic ambiguity

• Next time: Making use of statistical information about syntactic constituents– Read Ch 12