Automatic Noun Compound Interpretation
Stephen Tratz and Eduard Hovy
University of Southern California/Information Sciences Institute
Noun Compound Definition
A head noun with one or more preceding noun modifiers
Examples: uranium rod, mountain cave, terrorist attack
Other names
– noun-noun compound
– noun sequence
– compound nominal
– complex nominal
The Problems

Problem 1: Relation between modifier and head nouns
– uranium rod (n1 substance-of n2)
– mountain cave (n1 location-of n2)
– terrorist attack (n1 performer-of n2)
Problem 2: Structure of long noun compounds
– ((aluminum soup) (pot cover))
– ((aluminum (soup pot)) cover)
– (aluminum ((soup pot) cover))
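The number of possible bracketings grows with the Catalan numbers, which is what makes this structural problem hard for longer compounds. A minimal sketch (not the authors' method) that enumerates every binary bracketing of a compound:

def bracketings(words):
    # All binary bracketings of a word sequence (Catalan-many).
    if len(words) == 1:
        return [words[0]]
    results = []
    for i in range(1, len(words)):  # choose the top-level split point
        for left in bracketings(words[:i]):
            for right in bracketings(words[i:]):
                results.append("(" + left + " " + right + ")")
    return results

print(bracketings(["aluminum", "soup", "pot", "cover"]))
# prints 5 bracketings, including the three shown above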
The Need
Needed for natural language understanding
– Question answering
– Recognition of textual entailment
– Summarization
– Summarization evaluation
– etc.
Solution
A taxonomy of relations that occur between nouns in noun compounds
– Wide coverage of relations
– Good definitions
– High inter-annotator agreement
An automatic classification method
– For supervised approaches, this requires a sufficiently large annotated dataset
Weaknesses of earlier solutions
Relations of limited quality/usefulness
– Using an unlimited number of relations
– Using a few ambiguous relations such as prepositions and a handful of simple verbs (e.g., BE, HAVE, CAUSE), which still need to be disambiguated!
Definitions are sometimes not provided
Low inter-annotator agreement
Limited annotated data (a problem for supervised classification)
Dataset
Dataset of 17.5k noun compounds from:
– a large dataset extracted using mutual information plus part-of-speech tagging
– the WSJ portion of the Penn Treebank
Dataset Comparison
Size    Work
17509   Tratz and Hovy, 2010
2169    Kim and Baldwin, 2005
2031*   Girju, 2007
1660    Rosario and Hearst, 2001
1443    Ó Séaghdha and Copestake, 2007
600*    Nastase and Szpakowicz, 2003
505*    Barker and Szpakowicz, 1998
395     Vanderwende, 1994
385     Lauer, 1995
Our Semantic Relations
Relation taxonomy
– 43 relations
– Relatively fine-grained
– Defined using sentences
– Have rough mappings to relations used in well-known noun compound research papers
Relation examples
Substance/Material/Ingredient Of (uranium rod)
– n1 is one of the primary physical substances/materials/ingredients that n2 is made out of/from
Location Of (mountain cave)
– n1 is the location / geographic scope where n2 is at, near, from, generally found, or occurs.
All The Relations

Communicator of Communication
Performer of Act/Activity
Creator/Provider/Cause Of
Perform/Engage_In
Create/Provide/Sell
Obtain/Access/Seek
Modify/Process/Change
Mitigate/Oppose/Destroy
Organize/Supervise/Authority
Propel
Protect/Conserve
Transport/Transfer/Trade
Traverse/Visit
Possessor + Owned/Possessed
Experiencer + Cognition/Mental
Employer + Employee/Volunteer
Consumer + Consumed
User/Recipient + Used/Received
Owned/Possessed + Possessor
Experience + Experiencer
Thing Consumed + Consumer
Thing/Means Used + User
Time [Span] + X
X + Time [Span]
Location/Geographic Scope of X
Whole + Part/Member Of
Substance/Material/Ingredient + Whole
Part/Member + Collection/Configuration/Series
X + Spatial Container/Location/Bounds
Topic of Communication/Imagery/Info
Topic of Plan/Deal/Arrangement/Rules
Topic of Observation/Study/Evaluation
Topic of Cognition/Emotion
Topic of Expert
Topic of Situation
Topic of Event/Process
Topic/Thing + Attribute
Topic/Thing + Attribute Value Characteristic Of
Coreferential
Partial Attribute Transfer
Measure + Whole
Highly Lexicalized / Fixed Pair
Other
Taxonomy Creation
Taxonomy created by:
– Inspecting a large number of examples
– Comparing relations to those in the literature
– Refining relations using Mechanical Turk:
  1. Upload data for annotation
  2. Analyze annotations
  3. Make changes
  4. Repeat (5x)
Inter-annotator agreement study
What:
– Calculate the level of agreement between two or more sets of annotations
Why:
– Human agreement typically represents an upper bound on machine performance
How:
– Used Amazon's Mechanical Turk service to collect a set of annotations
Reasons for using Turk
• Inexpensive
– Paid $0.08 per decision
• Low startup time
– Don't have to wait long for people to start working
• Relatively fast turnaround
Problems with using Turk
• Mixed annotator quality
• No training for the annotators
• No guarantee of native English speakers
• Different number of annotations per Turker
– Can't force someone to annotate everything
– A problem for the multi-annotator agreement formula (Fleiss' kappa), which assumes the same number of annotations per item
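For reference, a minimal sketch of Fleiss' kappa that makes that assumption explicit (illustrative only, not the study's code):

def fleiss_kappa(table):
    # table[i][j] = number of raters assigning item i to category j.
    # Assumes every item received the same number of ratings n;
    # uneven Turker participation violates exactly this.
    N, k = len(table), len(table[0])
    n = sum(table[0])  # ratings per item, assumed constant
    p_j = [sum(row[j] for row in table) / (N * n) for j in range(k)]
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in table]
    P_bar = sum(P_i) / N
    P_e = sum(p * p for p in p_j)  # chance agreement
    return (P_bar - P_e) / (1 - P_e)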
Solution: Combine Turkers
• Requested 10 annotations per compound
• Calculated a weight for each Turker based upon his/her level of agreement with other Turkers
– Average percentage of annotations that agreed with the Turker
• Used the weights to create a single set of annotations
• Ignored Turkers who performed fewer than 3 annotations
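A minimal sketch of this combination scheme (the variable names and exact weighting formula are assumptions, not the authors' code):

from collections import Counter, defaultdict

def turker_weights(annotations, min_items=3):
    # annotations: {turker_id: {item_id: label}}
    # Weight = average fraction of other annotations on the same items
    # that agree with this Turker; Turkers with fewer than min_items
    # annotations are dropped.
    weights = {}
    for t, labels in annotations.items():
        if len(labels) < min_items:
            continue
        agree = total = 0
        for item, label in labels.items():
            for other, other_labels in annotations.items():
                if other != t and item in other_labels:
                    total += 1
                    agree += (other_labels[item] == label)
        weights[t] = agree / total if total else 0.0
    return weights

def combine(annotations, weights):
    # For each item, keep the label with the largest total weight.
    votes = defaultdict(Counter)
    for t, w in weights.items():
        for item, label in annotations[t].items():
            votes[item][label] += w
    return {item: c.most_common(1)[0][0] for item, c in votes.items()}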
Agreement Scores
• Calculated raw agreement
– # agreements / # decisions
• Cohen's Kappa
– Adjusts for chance agreement
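Both measures in a short sketch (two equal-length label sequences; illustrative only):

from collections import Counter

def raw_agreement(a, b):
    # number of agreements / number of decisions
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    # (p_o - p_e) / (1 - p_e): observed agreement adjusted for the
    # agreement expected by chance from each annotator's label distribution
    p_o = raw_agreement(a, b)
    ca, cb, n = Counter(a), Counter(b), len(a)
    p_e = sum(ca[l] * cb[l] for l in ca) / (n * n)
    return (p_o - p_e) / (1 - p_e)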
Agreement Results

Id        Agree%   κ      κ*     κ**
Combined  0.59     0.57   0.61   0.67
Auto      0.51     0.47   0.47   0.45
[Chart: Individual Turker Agreement (vs. author, N >= 15); y-axis: agreement from 0 to 0.8, with the Auto and Combined levels marked for comparison]

[Chart: Turker vote weight vs. Agreement; x-axis: Turker vote weight (0.0 to 0.7), y-axis: agreement vs. author (0.0 to 0.8)]
• Correlation between vote weight and agreement with the author, for Turkers who performed 15 or more annotations: 0.92
Comparison to other studies
Automatic Classification Method
Maximum Entropy classifier
– SVM-multiclass gave similar performance after optimizing the C parameter
Large number of boolean features extracted for each word from:
– WordNet
– Roget's Thesaurus
– Web 1T Corpus
– the spelling of the words
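A minimal sketch of this setup using scikit-learn, where logistic regression serves as the maximum entropy classifier (the feature names below are invented placeholders, not the system's real features):

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each compound becomes a dict of boolean features (hypothetical examples;
# the real system derives thousands from WordNet, Roget's, Web 1T, spelling).
train_X = [{"n1_hyp=metal": True, "n2_hyp=artifact": True},
           {"n1_hyp=person": True, "n2_hyp=act": True}]
train_y = ["substance-of", "performer-of"]

clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(train_X, train_y)
print(clf.predict([{"n1_hyp=metal": True, "n2_hyp=artifact": True}]))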
Features Used
– Synonyms
– Hypernyms
– Definition words
– Lexicographer Ids
– Link types (e.g., part-of)
– List of different types of part-of-speech entries
– All parts-of
– Prefixes, suffixes
– Roget's division information
– Last letters of the word
– Trigrams and 4-grams from Web 1T corpus
– Some combinations of features (e.g., shared hypernyms)
– A handful of others
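As one concrete illustration of the WordNet-derived features, a sketch using NLTK (an assumed toolkit; the slides do not name one):

from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def hypernym_features(word, prefix):
    # One boolean feature per hypernym of any noun sense of the word.
    feats = {}
    for synset in wn.synsets(word, pos=wn.NOUN):
        for path in synset.hypernym_paths():
            for hyp in path:
                feats[prefix + "_hyp=" + hyp.name()] = True
    return feats

print(sorted(hypernym_features("uranium", "n1")))
# features for hypernyms such as the metallic_element and substance synsets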
Cross-validation experiments
Performed one-feature-type-only and all-but-one experiments for the different types of features (sketched after the list below)
Most useful features
– Hypernyms
– Definition words
– Synonyms
– Web 1T trigrams and 4-grams
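A sketch of that ablation loop, with toy feature extractors and data standing in for the real ones:

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Toy stand-ins for the real feature types (hypernyms, suffixes, ...).
feature_groups = {
    "words":    lambda c: {"w=" + w: True for w in c.split()},
    "suffixes": lambda c: {"suf=" + w[-3:]: True for w in c.split()},
}
compounds = ["uranium rod", "mountain cave", "steel rod", "hillside cave"]
labels = ["substance-of", "location-of", "substance-of", "location-of"]

def score(groups):
    # Mean cross-validated accuracy using only the given feature groups.
    X = [{k: v for g in groups for k, v in feature_groups[g](c).items()}
         for c in compounds]
    clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    return cross_val_score(clf, X, labels, cv=2).mean()

for g in feature_groups:
    print(g, "only:", score([g]))
    print("all but", g + ":", score([x for x in feature_groups if x != g]))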
Conclusion
Novel taxonomy
– 43 fine-grained relations
– Defined using sentences with placeholders for the nouns
– Achieved relatively high inter-annotator agreement given the difficulty of the task
Largest annotated dataset
– Over 8 times larger than the next largest
Automatic classification method
– Achieves performance approximately 0.10 less than human inter-annotator agreement
Future Work
Address structural issues of longer (3+ word) compounds
Merge relation set with The Preposition Project (Litkowski, 2002) relations for prepositions
Integrate into a dependency parser
The End
Thank you for listening. Questions?