47
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 28– Grammar; Constituency, Dependency) Pushpak Bhattacharyya CSE Dept., IIT Bombay 21 st March, 2011

Pushpak Bhattacharyya CSE Dept., IIT Bombay 21 st March, 2011

  • Upload
    peri

  • View
    42

  • Download
    0

Embed Size (px)

DESCRIPTION

CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 28– Grammar; Constituency, Dependency). Pushpak Bhattacharyya CSE Dept., IIT Bombay 21 st March, 2011. Grammar. A finite set of rules that generates only and all sentences of a language. - PowerPoint PPT Presentation

Citation preview

Page 1: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

CS460/626 : Natural Language Processing/Speech, NLP and the Web

(Lecture 28– Grammar; Constituency, Dependency)

Pushpak BhattacharyyaCSE Dept., IIT Bombay

21st March, 2011

Page 2: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Grammar A finite set of rules

that generates only and all sentences of a language.

that assigns an appropriate structural description to each one.

Page 3: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Grammatical Analysis Techniques

Two main devices

– Morphological– Categorial– Functional

– Sequential– Hierarchical– Transformational

Breaking up a String

Labeling the Constituents

Page 4: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Breaking up and Labeling Sequential Breaking up

Sequential Breaking up and Morphological Labeling

Sequential Breaking up and Categorial Labeling

Sequential Breaking up and Functional Labeling

Hierarchical Breaking up Hierarchical Breaking up and Categorial

Labeling Hierarchical Breaking up and Functional

Labeling

Page 5: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Sequential Breaking up

that

student

solve ed the problem s+ + + + + +

• That student solved the problems.

Page 6: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Sequential Breaking up and Morphological Labeling

That student solved the problems.

that student solve ed the problem s

word word stem affix word stem

affix

Page 7: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Sequential Breaking up and Categorial Labeling

This boy can solve the problem.

• They called her a taxi.

this boy can solve the problem

Det N Aux V Det N

They call ed taxi

Pron V Affix N

her

Pron

a

Det

Page 8: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Sequential Breaking up and Functional Labeling

They called taxi

Subject

Verbal IndirectObject

her

Direct Object

a

They called

Subject

Verbal

taxi

DirectObject

her

Indirect Object

a

Page 9: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Hierarchical Breaking up Old men and women

Old men and women

Old men and

women

Old men and women Old men and women

womenandmenmenOld

Page 10: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Hierarchical Breaking up and Categorial Labeling

S

VP

V Adv

ran away

NP

A N

Poor John

Poor John ran away.

Page 11: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Hierarchical Breaking up and Functional Labeling

• Immediate Constituent (IC) Analysis• Construction types in terms of the function of the

constituents:– Predication (subject + predicate)– Modification (modifier + head)– Complementation (verbal + complement)– Subordination (subordinator + dependent unit)– Coordination (independent unit + coordinator)

Page 12: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Predication [Birds]subject [fly]predicate

S

PredicateSubject

Birds fly

Page 13: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Modification [A]modifier [flower]head

John [slept]head [in the room]modifier

S

PredicateSubject

John Head Modifier

slept In the room

Page 14: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

19/12/2004

Complementation He [saw]verbal [a lake]complement

S

PredicateSubject

He Verbal Complement

saw a lake

Page 15: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Subordination John slept [in]subordinator [the room]dependent

unit S

PredicateSubject

John Head Modifier

slept

the room

Subordinator Dependent Unit

in

Page 16: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Coordination [John came in time] independent unit

[but]coordinator [Mary was not ready] independent unit S

CoordinatorIndependent Unit

John came in time but Mary was not ready

Independent Unit

Page 17: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

S

HeadModifier

In the morning, the sky looked much brighter

Subordinator DU PredicateSubject

Head

Head

Head Verbal ComplementModifier Modifier

Modifier

In the morning, the sky looked much brighter.

An Example

Page 18: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Hierarchical Breaking up and Categorial / Functional Labeling Hierarchical Breaking up coupled

with Categorial /Functional Labeling is a very powerful device.

But there are ambiguities which demand something more powerful.

E.g., Love of God Someone loves God God loves someone

Page 19: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Hierarchical Breaking up

Love of God Love of God

Noun Phrase

Prepositional Phrase

Head

DU

Modifier

Godoflove

Sub

love of God

Categorial Labeling Functional Labeling

Page 20: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Types of Generative Grammar

Finite State Model (sequential)

Phrase Structure Model (sequential + hierarchical) + (categorial)

Transformational Model (sequential + hierarchical + transformational) + (categorial + functional)

Page 21: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Phrase Structure Grammar (PSG)A phrase-structure grammar G consists of

a four tuple (V, T, S, P), where V is a finite set of alphabets (or

vocabulary) E.g., N, V, A, Adv, P, NP, VP, AP, AdvP, PP,

student, sing, etc. T is a finite set of terminal symbols: T

V E.g., student, sing, etc.

S is a distinguished non-terminal symbol, also called start symbol: S V

P is a set of productions.

Page 22: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Noun Phrases

John

NP

N

student

NP

N

the

Det

student

NP

N

the

Det

intelligent

AdjP

• John • the student • the intelligent student

Page 23: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Noun Phrase

five

NP

Quant

his

Det

first

Ord

students

N

PhD

N

• his first five PhD students

Page 24: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Noun Phrase

five

NP

Quant

the

Det

students

N

best

AP

of my class

PP

• The five best students of my class

Page 25: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Verb Phrases

sing

VP

V

can

Aux

the ball

VP

NP

can

Aux

hit

V

• can sing • can hit the ball

Page 26: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Verb Phrase

a flower

VP

NP

can

Aux

give

V

to Mary

PP

• Can give a flower to Mary

Page 27: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Verb Phrase

John

VP

NP

may

Aux

make

V

the chairman

NP

• may make John the chairman

Page 28: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Verb Phrase

the book

VP

NP

may

Aux

find

V

very interesting

AP

• may find the book very interesting

Page 29: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Prepositional Phrases• in the classroom

the river

PP

NP

near

P

the classroom

PP

NP

in

P

• near the river

Page 30: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Adjective Phrases

intelligent

AP

A

honest

AP

A

very

Degree

of sweets

AP

PP

fond

A

• intelligent • very honest • fond of sweets

Page 31: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Adjective Phrase• very worried that she might have done badly in the

assignment

that she might have done badly in the assignment

AP

S’

very

Degree

worried

A

Page 32: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Phrase Structure Rules

Rewrite Rules:(i) S NP VP(ii) NP Det N(iii) VP V NP(iv) Det the(v) N man, ball(v) V hit

We interpret each rule X Y as the instruction rewrite X as Y.

• The boy hit the ball.

Page 33: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Derivation

SentenceNP + VP (i)Det + N + VP (ii)Det + N + V + NP (iii)The + N + V + NP (iv)The + boy + V + NP (v)The + boy + hit + NP (vi)The + boy + hit + Det + N (ii)The + boy + hit + the + N (iv)The + boy + hit + the + ball (v)

• The boy hit the ball.

Page 34: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

PSG Parse Tree The boy hit the ball.

S

VPNP

VNDet

the

NP

Nboy Dethit

the ball

Page 35: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

PSG Parse Tree John wrote those words in the Book of

Proverbs.S

VPNP

VPropN NP

John wrote those words

PP

NP

in

P

the book

of proverbs

NP PP

Page 36: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Penn POS Tags

[John/NNP ]wrote/VBD [ those/DT words/NNS ]in/IN [ the/DT Book/NN ]of/IN [ Proverbs/NNS ]

• John wrote those words in the Book of Proverbs.

Page 37: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Penn Treebank

(S (NP-SBJ (NP John)) (VP wrote (NP those words) (PP-LOC in

(NP (NP-TTL (NP the Book)(PP of (NP Proverbs)))

• John wrote those words in the Book of Proverbs.

Page 38: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

PSG Parse Tree Official trading in the shares will start

in Paris on Nov 6. S

VP

NP

NAP

official

PP

trading will start on Nov 6

A

PP

NP

in

P

the shares

NP

PPVAux

in Paris

Page 39: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Penn POS Tags

[ Official/JJ trading/NN ]in/IN [ the/DT shares/NNS ]will/MD start/VB in/IN [ Paris/NNP ]on/IN [ Nov./NNP 6/CD ]

• Official trading in the shares will start in Paris on Nov 6.

Page 40: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Penn Treebank

( (S (NP-SBJ (NP Official trading) (PP in (NP the shares))) (VP will (VP start

(PP-LOC in (NP Paris))

(PP-TMP on (NP (NP Nov 6)

• Official trading in the shares will start in Paris on Nov 6.

Page 41: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Penn POS Tag Sset Adjective: JJ Adverb: RB Cardinal Number: CD Determiner: DT Preposition: IN Coordinating Conjunction CC Subordinating Conjunction: IN Singular Noun: NN Plural Noun: NNS Personal Pronoun: PP Proper Noun: NP Verb base form: VB Modal verb: MD Verb (3sg Pres): VBZ Wh-determiner: WDT Wh-pronoun: WP

Page 42: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Difference between constituency and dependency

Page 43: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Constituency Grammar Categorical Uses part of speech Context Free Grammar (CFG) Basic elements Phrases

Page 44: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Dependency Grammar Functional Context Free Grammar Basic elements Units of

Predication/ Modification/ Complementation/ Subordination/ Co-ordination

Page 45: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Bridge between Constituency and Dependency parse Constituency uses phrases Dependencies consist of Head-modifier combination This is a cricket bat.

Cricket (Category: N, Functional: Adj) Bat (Category: N, Functional: N)

For languages which are free word order we use dependency parser to uncover the relations between the words.

Raam ne Shaam ko dekha . (Ram saw Shyam) Shaam ko Ram ne dekha. (Ram saw Shyam) Case markers cling to the nouns they subordinate

Page 46: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Example of CG and DG output

Birds Fly.

S S

NP VP Subject

Predicate

N V

Birds

fly

Birds

fly

Page 47: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   21 st  March, 2011

Some probabilistic parsers and why they are used Stanford, Collins, Charniack, RASP Why Probabilistic parsers

For a single sentence we can have multiple parses.

Probability for the parse is calculated and then the parse with the highest probability is selected.

This is needed in many applications of NLP, that need parsing.