School of something FACULTY OF OTHER School of Computing FACULTY OF ENGINEERING Chunking: Shallow Parsing Eric Atwell, Language Research Group

School of Computing, FACULTY OF ENGINEERING

Chunking: Shallow Parsing

Eric Atwell, Language Research Group

Shallow Parsing

Break text up into non-overlapping contiguous subsets of tokens.

• Also called chunking, partial parsing, light parsing.

What is it useful for? – semantic patterns

• Finding key “meaning-elements”: Named Entity Recognition

• people, locations, organizations

• Studying linguistic patterns, e.g. semantic patterns of verbs

• gave NP

• gave up NP in NP

• gave NP NP

• gave NP to NP

• Can ignore complex structure when not relevant
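Once noun phrases are chunked, verb patterns like the ones above can be read off by collapsing each chunk to its label. A minimal sketch, assuming a chunked sentence is a list whose items are either plain words or ("NP", [words]) pairs; the helper name verb_pattern and the example sentences are hypothetical.

```python
def verb_pattern(chunked, verb):
    """Return the verb and everything after it, with each chunk collapsed to its label."""
    # A chunk is a (label, words) pair; anything else is a plain word.
    symbols = [item if isinstance(item, str) else item[0] for item in chunked]
    if verb not in symbols:
        return None
    return " ".join(symbols[symbols.index(verb):])

examples = [
    [("NP", ["Mary"]), "gave", ("NP", ["the", "book"]), "to", ("NP", ["her", "brother"])],
    [("NP", ["Mary"]), "gave", ("NP", ["him"]), ("NP", ["a", "book"])],
    [("NP", ["He"]), "gave", "up", ("NP", ["his", "job"]), "in", ("NP", ["May"])],
]
for sentence in examples:
    print(verb_pattern(sentence, "gave"))
# gave NP to NP
# gave NP NP
# gave up NP in NP
```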

A Relationship between Segmenting and Labeling

Tokenization segments the text

Tagging labels the text

Shallow parsing does both simultaneously.

Chunking vs. Full Syntactic Parsing

“G.K. Chesterton, author of The Man Who Was Thursday”

Representations for Chunks

IOB tags

• Inside, outside, and begin

• In English, the start of a phrase is often marked by a function-word
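A small sketch of the IOB encoding, assuming chunks are given as (label, words) pairs with label None for material outside any chunk, and assuming the B-/I- prefix convention (B-NP, I-NP). The function name chunks_to_iob is illustrative only.

```python
def chunks_to_iob(chunked):
    """Flatten [(label_or_None, [words]), ...] into [(word, iob_tag), ...]."""
    tagged = []
    for label, words in chunked:
        for i, word in enumerate(words):
            if label is None:
                tag = "O"              # outside any chunk
            elif i == 0:
                tag = "B-" + label     # first token of a chunk
            else:
                tag = "I-" + label     # later token inside the same chunk
            tagged.append((word, tag))
    return tagged

chunked = [("NP", ["the", "little", "cat"]), (None, ["sat", "on"]), ("NP", ["the", "mat"])]
for word, tag in chunks_to_iob(chunked):
    print(word, tag)
```

Note how both chunks here begin with the function word "the", which receives the B tag.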

Representations for Chunks

Trees

• Chunk structure is a two-level tree that spans the entire text, containing both chunks and non-chunks
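One way to picture the two-level tree is to build it from IOB-tagged tokens. The sketch below assumes tokens are (word, pos, iob_tag) triples and represents the tree with plain tuples and lists; this representation is an assumption for illustration, not a particular library's tree class.

```python
def iob_to_tree(iob_tagged, root="S"):
    """Build (root, children); a child is a (word, pos) token or a (label, [tokens]) chunk."""
    children = []
    for word, pos, tag in iob_tagged:
        if tag == "O":
            children.append((word, pos))          # non-chunk token sits directly under the root
            continue
        label = tag[2:]
        last = children[-1] if children else None
        continues = (tag.startswith("I-") and last is not None
                     and isinstance(last[1], list) and last[0] == label)
        if continues:
            last[1].append((word, pos))           # extend the open chunk
        else:
            children.append((label, [(word, pos)]))  # start a new chunk
    return (root, children)

tokens = [("the", "DT", "B-NP"), ("cat", "NN", "I-NP"), ("sat", "VBD", "O"),
          ("on", "IN", "O"), ("the", "DT", "B-NP"), ("mat", "NN", "I-NP")]
print(iob_to_tree(tokens))
```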

CoNLL Corpus: training data for Machine Learning of chunking

From the shared task (competition) of the Conference on Computational Natural Language Learning, CoNLL-2000

Goal: create machine learning methods to improve on the chunking task

CoNLL Corpus

Data in IOB format from the Wall Street Journal (WSJ):

• One token per line: word, POS-tag, IOB-tag

• Training set: 8936 sentences

• Test set: 2012 sentences
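A minimal reader for data in this three-column layout, assuming whitespace-separated columns and a blank line between sentences. The SAMPLE text is a short illustrative fragment written in that layout, not a verified extract from the corpus files, and read_conll is a hypothetical helper name.

```python
SAMPLE = """\
He PRP B-NP
reckons VBZ B-VP
the DT B-NP
current JJ I-NP
account NN I-NP
deficit NN I-NP
will MD B-VP
narrow VB I-VP
. . O

Prices NNS B-NP
rose VBD B-VP
. . O
"""

def read_conll(text):
    """Return a list of sentences, each a list of (word, pos, iob_tag) triples."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:                   # blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        word, pos, iob_tag = line.split()
        current.append((word, pos, iob_tag))
    if current:                        # last sentence may lack a trailing blank line
        sentences.append(current)
    return sentences

sentences = read_conll(SAMPLE)
print(len(sentences), sentences[0][:3])
```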

Tags from the Brill tagger

• Penn Treebank Tags

Evaluation measure: F-score

• 2*precision*recall / (recall+precision)

• Baseline: select the chunk tag that is most frequently associated with the POS tag, F = 77.07

• Best score in the contest was F = 94.13
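The baseline and the F-score can be sketched on toy data as follows. Assumptions: a predicted chunk counts as correct only if its start, end and label all match a gold chunk; unseen POS tags default to O; the training sentences are invented for illustration. The score here is a fraction between 0 and 1, whereas the figures above are percentages.

```python
from collections import Counter, defaultdict

def train_baseline(sentences):
    """Map each POS tag to the chunk tag it most often carries in the training data."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for _word, pos, chunk_tag in sentence:
            counts[pos][chunk_tag] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in counts.items()}

def chunk_spans(iob_tags):
    """Return the set of (start, end, label) chunks encoded by a list of IOB tags."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(iob_tags) + ["O"]):   # sentinel closes a final chunk
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, label))
            start, label = None, None
        if tag != "O" and not inside:
            start, label = i, tag[2:]
    return spans

def f_score(gold_tags, predicted_tags):
    gold, predicted = chunk_spans(gold_tags), chunk_spans(predicted_tags)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (recall + precision)

train = [
    [("the", "DT", "B-NP"), ("cat", "NN", "I-NP"), ("sat", "VBD", "B-VP")],
    [("a", "DT", "B-NP"), ("dog", "NN", "I-NP"), ("barked", "VBD", "B-VP")],
    [("rain", "NN", "B-NP"), ("fell", "VBD", "B-VP")],
]
model = train_baseline(train)

test = [("gave", "VBD", "B-VP"), ("the", "DT", "B-NP"), ("dog", "NN", "I-NP"), ("food", "NN", "B-NP")]
gold = [tag for _, _, tag in test]
predicted = [model.get(pos, "O") for _, pos, _ in test]
print(predicted, f_score(gold, predicted))
```

In this toy case the baseline merges "the dog" and "food" into one NP, so it finds 1 of 3 gold chunks with 2 predictions: precision 1/2, recall 1/3, F = 0.4.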

Chunking with Regular Expressions

This time we write regular expressions over TAGS rather than characters

• <DT><JJ>?<NN>

• <NN.*>

• <JJ|NN>+

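Compile them with parse.ChunkRule()

• rule = parse.ChunkRule('<DT|NN>+')

• chunkparser = parse.RegexpChunk([rule], chunk_node = 'NP')

• In NLTK 3 the corresponding classes are nltk.chunk.regexp.ChunkRule(tag_pattern, descr) and nltk.chunk.regexp.RegexpChunkParser(rules, chunk_label='NP')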

Resulting object is a (sort-of) parse tree

• Top-level node called S

• Chunks are labelled NP
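To show what "regular expressions over tags" means in practice, here is a standard-library sketch that does not use NLTK and is not NLTK's implementation. It writes the tag sequence as a string of angle-bracketed tags, translates the tag pattern into an ordinary regular expression, and maps matches back to token positions. The function names are hypothetical, and tags are assumed not to contain regex metacharacters.

```python
import re

def tag_pattern_to_regex(tag_pattern):
    """Turn e.g. '<DT>?<JJ>*<NN.*>+' into a regex over a string like '<DT><JJ><NN>'."""
    def convert(match):
        # Inside <...>, '.' must not run across a tag boundary.
        body = match.group(1).replace(".", "[^<>]")
        return "(?:<(?:" + body + ")>)"
    return re.compile(re.sub(r"<([^>]+)>", convert, tag_pattern))

def regexp_chunk(tagged, tag_pattern, label="NP"):
    """Return ('S', children): children are (word, tag) tokens or (label, [tokens]) chunks."""
    tags = [tag for _, tag in tagged]
    tag_string = "".join("<%s>" % t for t in tags)
    starts, pos = {}, 0                      # character offset -> token index
    for i, t in enumerate(tags):
        starts[pos] = i
        pos += len(t) + 2
    starts[pos] = len(tags)
    children, done = [], 0
    for m in tag_pattern_to_regex(tag_pattern).finditer(tag_string):
        if m.start() == m.end():             # ignore empty matches
            continue
        s, e = starts[m.start()], starts[m.end()]
        children.extend(tagged[done:s])      # tokens before the chunk stay unchunked
        children.append((label, tagged[s:e]))
        done = e
    children.extend(tagged[done:])
    return ("S", children)

sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]
print(regexp_chunk(sentence, "<DT>?<JJ>*<NN.*>+"))
print(regexp_chunk(sentence, "<DT|NN>+"))
```

With '<DT|NN>+' the adjective interrupts the first phrase, giving the chunks [the], [cat] and [the mat].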

Chunking with Regular Expressions

Rule application is sensitive to order
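A sketch of why order matters, under the assumption that a token claimed by an earlier rule is no longer available to later rules. Claimed tokens are rewritten from <TAG> to {TAG} so that later patterns cannot match them. This is a standard-library illustration with hypothetical function names, not a library implementation.

```python
import re

def to_regex(tag_pattern):
    def convert(match):
        return "(?:<(?:%s)>)" % match.group(1).replace(".", "[^<>{}]")
    return re.compile(re.sub(r"<([^>]+)>", convert, tag_pattern))

def chunk_with_rules(tags, tag_patterns):
    """Apply tag patterns in order; return the sorted (start, end) token spans chunked."""
    cells = ["<%s>" % t for t in tags]       # "{TAG}" marks a token already in a chunk
    spans = []
    for pattern in tag_patterns:
        string = "".join(cells)
        offsets, pos = {}, 0                 # character offset -> token index
        for i, cell in enumerate(cells):
            offsets[pos] = i
            pos += len(cell)
        offsets[pos] = len(cells)
        for m in to_regex(pattern).finditer(string):
            if m.start() == m.end():
                continue
            start, end = offsets[m.start()], offsets[m.end()]
            spans.append((start, end))
            for i in range(start, end):
                cells[i] = "{%s}" % tags[i]  # claim these tokens
    return sorted(spans)

tags = ["DT", "JJ", "NN"]                     # e.g. "the little cat"
print(chunk_with_rules(tags, ["<NN>", "<DT><JJ><NN>"]))   # [(2, 3)]: the little [cat]
print(chunk_with_rules(tags, ["<DT><JJ><NN>", "<NN>"]))   # [(0, 3)]: [the little cat]
```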

Chinking

Specify what does not go into a chunk.

• Analogous to defining punctuation as anything that is not alphanumeric or whitespace.

• Can be more difficult to think about.
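A minimal sketch of chinking: conceptually the whole sentence starts as one chunk, and every token whose tag matches the chink pattern is cut out, leaving the stretches in between as chunks. The chink pattern here is an ordinary regular expression matched against each tag, and the function name chink is hypothetical.

```python
import re

def chink(tagged, chink_tag_regex, label="NP"):
    """Return ('S', children) where tokens matching the chink pattern split the chunks."""
    chink_re = re.compile(chink_tag_regex)
    children, current = [], []
    for word, tag in tagged:
        if chink_re.fullmatch(tag):
            if current:                       # close the chunk built so far
                children.append((label, current))
                current = []
            children.append((word, tag))      # the chink itself stays outside any chunk
        else:
            current.append((word, tag))
    if current:
        children.append((label, current))
    return ("S", children)

sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]
print(chink(sentence, r"VB.*|IN"))
```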

Simple chink-chunk approach: function vs. content word-class

Regular expressions for chunks and chinks CAN get complex

BUT the whole point is to be simpler than full parsing!

SO: use a simple model which works “reasonably well”

(then tidy up afterwards…)

Chunk = nominal content-word (noun)

Chink = others (verb, pronoun, determiner, preposition, conjunction) (+adjective, adverb as a borderline category)

Example

Fruit flies like a banana

fruit\N flies\N like\V a\A banana\N

[fruit flies] like a [banana]

[S [NP fruit\N flies\N NP]

[VP like\V

[NP a\A banana\N NP]

VP]

S]

An alternative parse

This sentence is grammatically ambiguous:

Fruit flies like a banana

fruit\N flies\N like\V a\A banana\N → [fruit flies] like a [banana]

fruit\N flies\V like\I a\A banana\N → [fruit] flies like a [banana]

cf: “bank robbers like a chase” vs. “bread bakes in an oven”

[S [NP fruit\N NP]

[VP flies\V

[PP like\I [NP a\A banana\N NP] PP]

VP]

S]

Ambiguity leads to more rules

fruit\N flies\N like\V a\A banana\N → [fruit flies] like a [banana]

fruit\N flies\V like\I a\A banana\N → [fruit] flies like a [banana]

BUT what about: Time flies like an arrow? Here “time” can be time\N or time\V

time\N flies\N like\V an\A arrow\N → [time flies] like an [arrow]

time\N flies\V like\I an\A arrow\N → [time] flies like an [arrow]

time\V flies\N like\I an\A arrow\N → time [flies] like an [arrow]

The 3rd PoS-tagging gives an ambiguous parse
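The simple chink-chunk rule (noun = chunk, everything else = chink) can be sketched in a few lines and run over the three taggings. It assumes the word\TAG notation used on these slides, with N as the only chunk tag; the function name chink_chunk is hypothetical.

```python
def chink_chunk(tagged_text, chunk_tags=("N",)):
    """Bracket each maximal run of chunk-tagged words; leave chinks unbracketed."""
    out, current = [], []
    for token in tagged_text.split():
        word, tag = token.split("\\")
        if tag in chunk_tags:
            current.append(word)
        else:
            if current:                       # a chink closes the open chunk
                out.append("[" + " ".join(current) + "]")
                current = []
            out.append(word)
    if current:
        out.append("[" + " ".join(current) + "]")
    return " ".join(out)

taggings = [
    r"time\N flies\N like\V an\A arrow\N",
    r"time\N flies\V like\I an\A arrow\N",
    r"time\V flies\N like\I an\A arrow\N",
]
for tagging in taggings:
    print(chink_chunk(tagging))
# [time flies] like an [arrow]
# [time] flies like an [arrow]
# time [flies] like an [arrow]
```

The chunker itself is deterministic; the different bracketings come entirely from the different PoS-taggings it is given.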

Chunking can predict prosodic breaks

http://www.acm.org/crossroads/

An Approach for Detecting Prosodic Phrase Boundaries in Spoken English by Claire Brierley and Eric Atwell

Summary

Shallow parsing is useful for:

Entity recognition

• people, locations, organizations

Studying linguistic patterns

• gave NP

• gave up NP in NP

• gave NP NP

• gave NP to NP

Prosodic phrase breaks – pauses in speech

Can ignore complex structure when not relevant

Chink-chunk approach: “quick-and-dirty” chunking, content vs. function PoS

Chink-chunk parsing is simpler than context-free grammar parsing!