
Lecture 8 Syntactic parsing

LTAT.01.001 – Natural Language Processing
Kairit Sirts (kairit.sirts@ut.ee)

03.04.2020

Plan for today

● Why syntax?
● Syntactic parsing
  ● Shallow parsing
  ● Constituency parsing
  ● Dependency parsing
  ● Transition-based dependency parsing
  ● Neural dependency parsers
  ● Evaluation of dependency parsing

Why syntax?


Syntax

● How words combine into phrases and sentences (in contrast to morphology, which studies the internal structure of words)


Syntactic ambiguity

The chicken is ready to eat.

(Ambiguous: either the chicken is about to eat something, or the chicken is ready to be eaten.)

Exercise

How many different meanings does the following sentence have?

Time flies like an arrow.


The role of syntax in NLP

● Text generation / summarization / machine translation
● Useful features for various information extraction tasks
● Syntactic structure also reflects the semantic relations between the words

Syntactic analysis/parsing

● Shallow parsing
● Phrase structure / constituency parsing
● Dependency parsing

Shallow parsing


Shallow parsing

● Also called chunking or light parsing
● Splits the sentence into non-overlapping syntactic phrases

[NP John] [VP hit] [NP the ball].

NP – Noun phrase
VP – Verb phrase

Shallow parsing

[NP The morning flight] [PP from] [NP Denver] [VP has arrived].

NP – Noun phrase
PP – Prepositional phrase
VP – Verb phrase

BIO tagging

A labelling scheme often used in information extraction problems that are treated as sequence tagging tasks.

The/B_NP morning/I_NP flight/I_NP from/B_PP Denver/B_NP has/B_VP arrived/I_VP

B_NP – beginning of a noun phrase
I_NP – inside a noun phrase
B_VP – beginning of a verb phrase, etc.

Sequence classifier

● Need annotated data for training: POS-tagged and phrase-annotated sentences
● Use a sequence classifier of your choice:
  ● CRF with engineered features
  ● Neural sequence tagger
● The predicted tag sequence can then be decoded back into phrases (see the sketch below)
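For illustration, a minimal sketch (plain Python; the function name is hypothetical) of decoding a BIO tag sequence back into phrase chunks:

```python
def bio_to_chunks(words, tags):
    """Convert BIO tags (e.g. B_NP, I_NP, O) into (label, phrase) chunks."""
    chunks, current = [], None          # current = (label, [words of the open chunk])
    for word, tag in zip(words, tags):
        if tag.startswith("B_"):        # a new chunk starts here
            if current:
                chunks.append((current[0], " ".join(current[1])))
            current = (tag[2:], [word])
        elif tag.startswith("I_") and current and tag[2:] == current[0]:
            current[1].append(word)     # continue the open chunk
        else:                           # O tag or an inconsistent I_ tag
            if current:
                chunks.append((current[0], " ".join(current[1])))
            current = None
    if current:
        chunks.append((current[0], " ".join(current[1])))
    return chunks

words = "The morning flight from Denver has arrived .".split()
tags  = ["B_NP", "I_NP", "I_NP", "B_PP", "B_NP", "B_VP", "I_VP", "O"]
print(bio_to_chunks(words, tags))
# [('NP', 'The morning flight'), ('PP', 'from'), ('NP', 'Denver'), ('VP', 'has arrived')]
```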


Constituency Parsing


Constituency parsing

● Full constituency parsing helps to resolve structural ambiguities


Bracketed style

● The trees can be represented linearly with brackets

(S (Pr I)
   (Aux will)
   (VP (V do)
       (NP (Det my)
           (N homework))))
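As a side note, such bracketed trees can be read and displayed with NLTK (a sketch assuming the nltk package is installed; this is not part of the lecture material):

```python
from nltk import Tree

# Parse a bracketed-style tree string into a tree object
t = Tree.fromstring("(S (Pr I) (Aux will) (VP (V do) (NP (Det my) (N homework))))")
t.pretty_print()      # draws the tree as ASCII art
print(t.leaves())     # ['I', 'will', 'do', 'my', 'homework']
```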

Context-free grammars

S → NP VP
VP → V NP
VP → V NP PP
NP → NP NP
NP → NP PP
NP → N
NP → e
PP → P NP

N → people
N → fish
N → tanks
N → rods
V → people
V → fish
V → tanks
P → with

G = (T, N, S, R)
● T – set of terminal symbols
● N – set of non-terminal symbols
● S – start symbol (S ∈ N)
● R – set of rules/productions of the form X → γ
  ● X ∈ N
  ● γ ∈ (N ∪ T)*

A grammar G generates a language L.

Probabilistic context-free grammars (PCFG)

G = (T, N, S, R, P)
● T – set of terminal symbols
● N – set of non-terminal symbols
● S – start symbol (S ∈ N)
● R – set of rules/productions of the form X → γ
● P – probability function
  ● P: R → [0, 1]
  ● the probabilities of each non-terminal's rules sum to one: $\forall X \in N: \sum_{X \to \gamma \in R} P(X \to \gamma) = 1$

A grammar G generates a language model L.

A PCFG

S → NP VP 1.0
VP → V NP 0.6
VP → V NP PP 0.4
NP → NP NP 0.1
NP → NP PP 0.2
NP → N 0.7
PP → P NP 1.0

N → people 0.5
N → fish 0.2
N → tanks 0.2
N → rods 0.1
V → people 0.1
V → fish 0.6
V → tanks 0.3
P → with 1.0

The probability of strings and trees

● P(t) – the probability of a tree t is the product of the probabilities of the rules used to generate it:

  $P(t) = \prod_{(X \to \gamma) \in t} P(X \to \gamma)$

● P(s) – the probability of a string s is the sum of the probabilities of the trees which have that string as their yield:

  $P(s) = \sum_i P(s, t_i) = \sum_i P(t_i)$, where each $t_i$ is a parse of s
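A small sketch (plain Python; the tree encoding is illustrative) computing P(t) as the product of rule probabilities, using rules from the example PCFG above:

```python
from functools import reduce

# A subset of the example PCFG: (lhs, rhs-tuple) -> probability
P = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.6,
    ("NP", ("N",)): 0.7,
    ("N", ("people",)): 0.5,
    ("N", ("fish",)): 0.2,
    ("V", ("fish",)): 0.6,
}

# A tree is (label, child, child, ...); leaves are plain strings
tree = ("S",
        ("NP", ("N", "people")),
        ("VP", ("V", "fish"),
               ("NP", ("N", "fish"))))

def tree_prob(t):
    """P(t) = product of the probabilities of all rules used in t."""
    if isinstance(t, str):            # a terminal leaf contributes no rule
        return 1.0
    label, children = t[0], t[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    return P[(label, rhs)] * reduce(lambda acc, c: acc * tree_prob(c), children, 1.0)

print(tree_prob(tree))  # 1.0 * 0.7 * 0.5 * 0.6 * 0.6 * 0.7 * 0.2 = 0.01764
```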

PCFG for efficient parsing

● For efficient parsing the rules should be unary or binary
● Chomsky normal form – all rules have the form:
  ● X → Y Z
  ● X → w
  ● X, Y, Z – non-terminal symbols
  ● w – terminal symbol
  ● No epsilon rules

Before and after binarization


Finding the most likely tree: CKY parsing

● Dynamic programming algorithm
● Proceeds bottom-up and performs Viterbi search over trees (a sketch follows below)
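A compact sketch of Viterbi CKY for a CNF grammar (plain Python; the toy grammar and all names are illustrative, with unary chains such as NP → N → fish collapsed into direct lexical rules). It fills the chart bottom-up, keeping the single best-scoring analysis per span and non-terminal:

```python
from collections import defaultdict

binary = {("S", "NP", "VP"): 1.0,       # X -> Y Z rules with probabilities
          ("VP", "V", "NP"): 0.6,
          ("NP", "NP", "NP"): 0.1}
lexical = {("NP", "people"): 0.35, ("NP", "fish"): 0.14,   # X -> w rules
           ("V", "people"): 0.1,   ("V", "fish"): 0.6}

def cky(words):
    """best[i][j][X] = (best probability, backpointer) for X spanning words[i:j]."""
    n = len(words)
    best = [[defaultdict(lambda: (0.0, None)) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                     # width-1 spans: lexical rules
        for (X, word), p in lexical.items():
            if word == w and p > best[i][i + 1][X][0]:
                best[i][i + 1][X] = (p, w)
    for width in range(2, n + 1):                     # widen spans bottom-up
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):                 # try every split point
                for (X, Y, Z), p in binary.items():
                    q = p * best[i][k][Y][0] * best[k][j][Z][0]
                    if q > best[i][j][X][0]:
                        best[i][j][X] = (q, (k, Y, Z))
    return best[0][n]["S"]                            # best parse rooted in S

print(cky("people fish fish".split()))   # (0.017639..., (1, 'NP', 'VP'))
```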

Dependency Parsing


Dependency parsing


● A dependency parse is a directed graph G = (V, A)
  ● V – the set of vertices, corresponding to words
  ● A – the set of arcs, corresponding to dependency relations


Dependency relations

● The arrows connect heads with their dependents
● The main verb is the head or root of the whole sentence
● The arrows are labelled with grammatical functions / dependency relations

(Figure: an example dependency parse, with one arrow highlighted as a labelled dependency relation between a head and its dependent, and the main verb marked as the root of the sentence.)

Properties of a dependency graph

A dependency tree is a directed graph that satisfies the following constraints:
1. There is a single designated root node that has no incoming arcs
   ● Typically the main verb of the sentence
2. Except for the root node, each node has exactly one incoming arc
   ● Each dependent has a single head
3. There is a unique path from the root node to each vertex in V
   ● The graph is acyclic and connected
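These constraints can be checked mechanically. A minimal sketch (plain Python; the encoding of heads is an assumption, not from the slides):

```python
def is_dependency_tree(heads):
    """Check the constraints above; heads[i] is the head of word i+1, 0 = ROOT."""
    roots = [i + 1 for i, h in enumerate(heads) if h == 0]
    if len(roots) != 1:              # constraint 1: exactly one root
        return False
    # constraint 2 holds by construction: each word has exactly one head entry
    for i in range(1, len(heads) + 1):
        seen, node = set(), i        # constraint 3: every word must reach ROOT
        while node != 0:
            if node in seen:         # a cycle means ROOT is unreachable
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

print(is_dependency_tree([2, 3, 0]))  # True:  The -> cat -> sat -> ROOT
print(is_dependency_tree([2, 1, 0]))  # False: words 1 and 2 form a cycle
```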

Projectivity

● Projective trees – no arcs cross in the dependency graph
● Non-projective trees – arcs may cross, e.g. due to free word order

Dependency relations

Clausal argument relations:
● NSUBJ – Nominal subject: United canceled the flight.
● DOBJ – Direct object: United canceled the flight.
● IOBJ – Indirect object: We booked her the flight to Paris.

Nominal modifier relations:
● NMOD – Nominal modifier: We took the morning flight.
● AMOD – Adjectival modifier: Book the cheapest flight.
● NUMMOD – Numeric modifier: Before the storm JetBlue canceled 1000 flights.
● APPOS – Appositional modifier: United, a unit of UAL, matched the fares.
● DET – Determiner: Which flight was delayed?
● CASE – Prepositions, postpositions: Book the flight through Houston.

Other notable relations:
● CONJ – Conjunct: We flew to Denver and drove to Steamboat.
● CC – Coordinating conjunction: We flew to Denver and drove to Steamboat.

Universal dependencies

● http://universaldependencies.org/
● Annotated treebanks in many languages
● Uniform annotation scheme across all UD languages:
  ● Universal POS tags
  ● Universal morphological features
  ● Universal dependency relations
● All data in the CoNLL-U tabular format


CoNLL-U format

● A standard tabular format for this kind of annotated data
● Each word is on a separate line
● 10 tab-separated columns on each line:
  1. Word index
  2. The word itself
  3. Lemma
  4. Universal POS
  5. Language-specific POS
  6. Morphological features
  7. Head of the current word
  8. Dependency relation to the head
  9. Enhanced dependencies
  10. Any other annotation

CoNLL-U format: example

# text = They buy and sell books.

1 They they PRON PRP Case=Nom|Number=Plur 2 nsubj

2 buy buy VERB VBP Number=Plur|Person=3|Tense=Pres 0 root

3 and and CONJ CC _ 4 cc

4 sell sell VERB VBP Number=Plur|Person=3|Tense=Pres 2 conj

5 books book NOUN NNS Number=Plur 2 obj

6 . . PUNCT . _ 2 punct

# text = I had no clue.

1 I I PRON PRP Case=Nom|Number=Sing|Person=1 2 nsubj

2 had have VERB VBD Number=Sing|Person=1|Tense=Past 0 root

3 no no DET DT PronType=Neg 4 det

4 clue clue NOUN NN Number=Sing 2 obj

5 . . PUNCT . _ 2 punct
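A minimal sketch (plain Python; no error handling, and only the eight columns shown above rather than the full ten) of reading such CoNLL-U text into per-sentence column lists:

```python
sample = """# text = I had no clue.
1\tI\tI\tPRON\tPRP\tCase=Nom|Number=Sing|Person=1\t2\tnsubj
2\thad\thave\tVERB\tVBD\tNumber=Sing|Person=1|Tense=Past\t0\troot
3\tno\tno\tDET\tDT\tPronType=Neg\t4\tdet
4\tclue\tclue\tNOUN\tNN\tNumber=Sing\t2\tobj
5\t.\t.\tPUNCT\t.\t_\t2\tpunct
"""

def read_conllu(text):
    """Split CoNLL-U text into sentences, each a list of column lists."""
    sentences, current = [], []
    for line in text.splitlines():
        if line.startswith("#"):      # comment lines carry metadata
            continue
        if not line.strip():          # a blank line separates sentences
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(line.split("\t"))
    if current:
        sentences.append(current)
    return sentences

for cols in read_conllu(sample)[0]:
    # columns: index, form, lemma, UPOS, XPOS, feats, head, deprel
    print(f"{cols[1]} <-{cols[7]}- head {cols[6]}")   # e.g. "I <-nsubj- head 2"
```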

Dependency parsing methods

● Transition-based parsing
  ● stack-based algorithms / shift-reduce parsing
  ● only generates projective trees
● Graph-based algorithms
  ● can also generate non-projective trees

Transition-based dependency parsing


Transition-based parsing

● Three main components:
  ● Stack
  ● Buffer
  ● Set of dependency relations
● A configuration is the current state of the stack, the buffer and the relation set

Arc-standard parsing system

● Initial configuration:

● Stack σ contains the ROOT symbol
● Buffer β contains all the words of the sentence
● Dependency relation set A is empty
● That is, the initial configuration is (σ, β, A) = ([ROOT], [w1, …, wn], ∅)

Arc-standard parsing system

At each step perform one of the following transitions:
● Shift – move the first word of the buffer onto the stack
● LeftArc – add a left arc between the top two words on the stack (the top word becomes the head), then pop the second word
● RightArc – add a right arc between the top two words on the stack (the second word becomes the head), then pop the top word

Oracle

● The annotated data is in the form of a treebank
  ● Each sentence is annotated with its dependency tree
● The task of the transition-based parser is to predict the correct parsing operation at each step:
  ● the input is a configuration
  ● the output is a parsing action: Shift, RightArc or LeftArc
● The role of the oracle is to return the correct parsing operation for each configuration in the training set
  ● It thus creates a training set for the parsing model

Oracle

● Choose LeftArc if it produces a correct head-dependent relation, given the reference parse and the current configuration
● Choose RightArc if:
  ● it produces a correct head-dependent relation, given the reference parse and the current configuration, and
  ● all of the dependents of the word at the top of the stack have already been assigned
● Otherwise choose Shift
A sketch of this rule follows below.
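A sketch of the oracle rule (plain Python; the data encoding and helper names are illustrative):

```python
def oracle(stack, gold_head, assigned):
    """Return the correct action for a configuration, given the reference parse.

    stack holds word indices (0 = ROOT); gold_head[d] = gold head of word d;
    assigned = set of dependents whose arcs have already been added.
    """
    if len(stack) >= 2:
        top, second = stack[-1], stack[-2]
        if gold_head.get(second) == top:          # LeftArc gives a correct arc
            return "LeftArc"
        if gold_head.get(top) == second:          # RightArc would give a correct arc...
            # ...but only once 'top' has collected all of its own dependents
            deps_of_top = {d for d, h in gold_head.items() if h == top}
            if deps_of_top <= assigned:
                return "RightArc"
    return "Shift"
```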

Example



Starting configuration and full derivation for "The cat sat on the mat":

Stack                      Buffer                          Action     Arc
[ROOT]                     [The, cat, sat, on, the, mat]   Shift
[ROOT, The]                [cat, sat, on, the, mat]        Shift
[ROOT, The, cat]           [sat, on, the, mat]             Left-Arc   det(The ← cat)
[ROOT, cat]                [sat, on, the, mat]             Shift
[ROOT, cat, sat]           [on, the, mat]                  Left-Arc   nsubj(cat ← sat)
[ROOT, sat]                [on, the, mat]                  Shift
[ROOT, sat, on]            [the, mat]                      Shift
[ROOT, sat, on, the]       [mat]                           Shift
[ROOT, sat, on, the, mat]  []                              Left-Arc   det(the ← mat)
[ROOT, sat, on, mat]       []                              Left-Arc   case(on ← mat)
[ROOT, sat, mat]           []                              Right-Arc  nmod(sat → mat)
[ROOT, sat]                []                              Right-Arc  root(ROOT → sat)
[ROOT]                     []                              Done
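A minimal arc-standard implementation (plain Python; the action sequence is hard-coded from the table above) that replays this derivation and collects the arc set A:

```python
def parse(words, actions):
    """Run an arc-standard derivation; return the (head, deprel, dependent) arcs."""
    stack, buffer, arcs = ["ROOT"], list(words), []
    for action, label in actions:
        if action == "Shift":
            stack.append(buffer.pop(0))           # move first buffer word to the stack
        elif action == "LeftArc":
            head, dep = stack[-1], stack.pop(-2)  # top is the head; pop the second word
            arcs.append((head, label, dep))
        elif action == "RightArc":
            dep, head = stack.pop(), stack[-1]    # second is the head; pop the top word
            arcs.append((head, label, dep))
    return arcs

actions = [("Shift", None), ("Shift", None), ("LeftArc", "det"),
           ("Shift", None), ("LeftArc", "nsubj"), ("Shift", None),
           ("Shift", None), ("Shift", None), ("LeftArc", "det"),
           ("LeftArc", "case"), ("RightArc", "nmod"), ("RightArc", "root")]

for arc in parse("The cat sat on the mat".split(), actions):
    print(arc)   # ('cat', 'det', 'The'), ..., ('ROOT', 'root', 'sat')
```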

Neural dependency parsers



Kiperwasser and Goldberg, 2016. Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations

● Features: the concatenation of the BiLSTM vectors of:
  ● the top three words on the stack
  ● the first word in the buffer
● Prediction:
  ● Score the features with a feed-forward NN (MLP)
  ● Train using a hinge loss: maximise the margin between the MLP scores of the highest-scoring correct action and the highest-scoring incorrect action
    ● G – the set of correct transitions from the current configuration
    ● A – the set of all transitions from the current configuration
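Spelled out, the hinge loss is (my reconstruction from the definitions of G and A above; x denotes the feature vector of the current configuration):

$$\max\Big(0,\ 1 - \max_{g \in G} \mathrm{MLP}(x)[g] + \max_{a \in A \setminus G} \mathrm{MLP}(x)[a]\Big)$$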

Dozat et al., 2017. Stanford’s Graph-Based Neural Dependency Parser at the CoNLL 2017 Shared Task.


From each LSTM output vector, create four vectors for classification:
● for the head word
● for the head relation
● for the dependent word
● for the dependent relation

● Use a biaffine classifier to score the dep vector of each word against the head vector of every other word
● Then use a biaffine classifier to score the relation between the head and the dependent, using the respective relation vectors
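A sketch of the biaffine arc scorer (NumPy; dimensions, random parameters and variable names are illustrative, not from the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 8                      # 5 words, 8-dimensional head/dep vectors

H = rng.normal(size=(n, d))      # head representations, one row per word
D = rng.normal(size=(n, d))      # dependent representations

# Biaffine parameters: a bilinear term plus a linear term on the head vector
U = rng.normal(size=(d, d))
u = rng.normal(size=(d,))

# scores[i, j] = D[i] @ U @ H[j] + u @ H[j]
#             = the score of word j being the head of word i
scores = D @ U @ H.T + H @ u     # shape (n, n); the linear term broadcasts over rows
pred_heads = scores.argmax(axis=1)   # greedy head choice for each dependent
print(pred_heads)
```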

Evaluating dependency parsing


Evaluation

Unlabelled attachment score (UAS):
● the proportion of words attached to the correct head

Labelled attachment score (LAS):
● the proportion of words attached to the correct head with the correct relation label

Label accuracy (LA):
● the proportion of words with the correct incoming relation label, ignoring the head

Evaluation: example

(Figure: a predicted dependency parse of a six-word sentence, compared against the gold-standard parse.)

UAS = 5/6, LAS = 4/6, LA = 4/6
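A small sketch (plain Python) computing the three scores; the toy gold and predicted values below are made up so that they reproduce the slide's numbers:

```python
def attachment_scores(gold_heads, gold_labels, pred_heads, pred_labels):
    """Return (UAS, LAS, LA) over a sequence of tokens."""
    n = len(gold_heads)
    uas = sum(g == p for g, p in zip(gold_heads, pred_heads)) / n
    las = sum(gh == ph and gl == pl
              for gh, gl, ph, pl in zip(gold_heads, gold_labels,
                                        pred_heads, pred_labels)) / n
    la  = sum(g == p for g, p in zip(gold_labels, pred_labels)) / n
    return uas, las, la

gold_heads = [2, 0, 2, 5, 3, 2]
gold_labels = ["nsubj", "root", "obj", "det", "nmod", "punct"]
pred_heads = [2, 0, 2, 5, 2, 2]                       # one wrong head
pred_labels = ["nsubj", "root", "iobj", "det", "obl", "punct"]  # two wrong labels

print(attachment_scores(gold_heads, gold_labels, pred_heads, pred_labels))
# (0.8333..., 0.6666..., 0.6666...)  i.e. UAS = 5/6, LAS = 4/6, LA = 4/6
```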

Recommended reading