Automatic Noun Compound Interpretation
Stephen Tratz and Eduard Hovy
University of Southern California/Information Sciences Institute
Noun Compound Definition
A head noun with one or more preceding noun modifiers
Examples: uranium rod, mountain cave, terrorist attack
Other names
– noun-noun compound
– noun sequence
– compound nominal
– complex nominal
The Problems

Problem 1: Relation between modifier and head nouns
– uranium rod (n1 substance-of n2)
– mountain cave (n1 location-of n2)
– terrorist attack (n1 performer-of n2)
Problem 2: Structure of long noun compounds
– ((aluminum soup) (pot cover))
– ((aluminum (soup pot)) cover)
– (aluminum ((soup pot) cover))
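The number of possible bracketings grows with the Catalan numbers, which is what makes this structural problem hard for longer compounds. A minimal sketch (not the authors' method) that enumerates every binary bracketing of a compound:

def bracketings(words):
    # All binary bracketings of a word sequence (Catalan-many).
    if len(words) == 1:
        return [words[0]]
    results = []
    for i in range(1, len(words)):  # choose the top-level split point
        for left in bracketings(words[:i]):
            for right in bracketings(words[i:]):
                results.append("(" + left + " " + right + ")")
    return results

print(bracketings(["aluminum", "soup", "pot", "cover"]))
# prints 5 bracketings, including the three shown above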
The Need
Needed for natural language understanding
– Question answering
– Recognition of textual entailment
– Summarization
– Summarization evaluation
– etc.
Solution
A taxonomy of relations that occur between nouns in noun compounds
– Wide coverage of relations
– Good definitions
– High inter-annotator agreement
An automatic classification method
– For supervised approaches, this requires a sufficiently large annotated dataset
Weaknesses of earlier solutions
Relations of limited quality/usefulness
– Using an unlimited number of relations
– Using a few ambiguous relations such as prepositions and a handful of simple verbs (e.g., BE, HAVE, CAUSE), which still need to be disambiguated!
Definitions are sometimes not provided
Low inter-annotator agreement
Limited annotated data (a problem for supervised classification)
Dataset
Dataset of 17.5k noun compounds from:
– a large dataset extracted using mutual information plus part-of-speech tagging
– the WSJ portion of the Penn Treebank
Dataset Comparison
Size    Work
17509   Tratz and Hovy, 2010
2169    Kim and Baldwin, 2005
2031*   Girju, 2007
1660    Rosario and Hearst, 2001
1443    Ó Séaghdha and Copestake, 2007
600*    Nastase and Szpakowicz, 2003
505*    Barker and Szpakowicz, 1998
395     Vanderwende, 1994
385     Lauer, 1995
Our Semantic Relations
Relation taxonomy
– 43 relations
– Relatively fine-grained
– Defined using sentences
– Have rough mappings to relations used in well-known noun compound research papers
Relation examples
Substance/Material/Ingredient Of (uranium rod)
– n1 is one of the primary physical substances/materials/ingredients that n2 is made out of/from
Location Of (mountain cave)
– n1 is the location / geographic scope where n2 is at, near, from, generally found, or occurs.
All The Relations

Communicator of Communication
Performer of Act/Activity
Creator/Provider/Cause Of
Perform/Engage_In
Create/Provide/Sell
Obtain/Access/Seek
Modify/Process/Change
Mitigate/Oppose/Destroy
Organize/Supervise/Authority
Propel
Protect/Conserve
Transport/Transfer/Trade
Traverse/Visit
Possessor + Owned/Possessed
Experiencer + Cognition/Mental
Employer + Employee/Volunteer
Consumer + Consumed
User/Recipient + Used/Received
Owned/Possessed + Possessor
Experience + Experiencer
Thing Consumed + Consumer
Thing/Means Used + User
Time [Span] + X
X + Time [Span]
Location/Geographic Scope of X
Whole + Part/Member Of
Substance/Material/Ingredient + Whole
Part/Member + Collection/Configuration/Series
X + Spatial Container/Location/Bounds
Topic of Communication/Imagery/Info
Topic of Plan/Deal/Arrangement/Rules
Topic of Observation/Study/Evaluation
Topic of Cognition/Emotion
Topic of Expert
Topic of Situation
Topic of Event/Process
Topic/Thing + Attribute
Topic/Thing + Attribute Value Characteristic Of
Coreferential
Partial Attribute Transfer
Measure + Whole
Highly Lexicalized / Fixed Pair
Other
Taxonomy Creation
Taxonomy created by:
– Inspecting a large number of examples
– Comparing relations to those in the literature
– Refining relations using Mechanical Turk:
  1. Upload data for annotation
  2. Analyze annotations
  3. Make changes
  4. Repeat (5x)
Inter-annotator agreement study
What:
– Calculate the level of agreement between two or more sets of annotations
Why:
– Human agreement typically represents an upper bound on machine performance
How:
– Used Amazon's Mechanical Turk service to collect a set of annotations
Reasons for using Turk
• Inexpensive
– Paid $0.08 per decision
• Low startup time
– Don't have to wait long for people to start working
• Relatively fast turnaround
Problems with using Turk
• Mixed annotator quality
• No training for the annotators
• No guarantee of native English speakers
• Different number of annotations per Turker
– Can't force someone to annotate everything
– A problem for the multi-annotator agreement formula (Fleiss' kappa), which assumes the same number of annotations per item
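For reference, a minimal sketch of Fleiss' kappa that makes that assumption explicit (illustrative only, not the study's code):

def fleiss_kappa(table):
    # table[i][j] = number of raters assigning item i to category j.
    # Assumes every item received the same number of ratings n;
    # uneven Turker participation violates exactly this.
    N, k = len(table), len(table[0])
    n = sum(table[0])  # ratings per item, assumed constant
    p_j = [sum(row[j] for row in table) / (N * n) for j in range(k)]
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in table]
    P_bar = sum(P_i) / N
    P_e = sum(p * p for p in p_j)  # chance agreement
    return (P_bar - P_e) / (1 - P_e)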
Solution: Combine Turkers
• Requested 10 annotations per compound
• Calculated a weight for each Turker based upon his/her level of agreement with other Turkers
– Average percentage of annotations that agreed with the Turker
• Used the weights to create a single set of annotations
• Ignored Turkers who performed fewer than 3 annotations
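A minimal sketch of this combination scheme (the variable names and exact weighting formula are assumptions, not the authors' code):

from collections import Counter, defaultdict

def turker_weights(annotations, min_items=3):
    # annotations: {turker_id: {item_id: label}}
    # Weight = average fraction of other annotations on the same items
    # that agree with this Turker; Turkers with fewer than min_items
    # annotations are dropped.
    weights = {}
    for t, labels in annotations.items():
        if len(labels) < min_items:
            continue
        agree = total = 0
        for item, label in labels.items():
            for other, other_labels in annotations.items():
                if other != t and item in other_labels:
                    total += 1
                    agree += (other_labels[item] == label)
        weights[t] = agree / total if total else 0.0
    return weights

def combine(annotations, weights):
    # For each item, keep the label with the largest total weight.
    votes = defaultdict(Counter)
    for t, w in weights.items():
        for item, label in annotations[t].items():
            votes[item][label] += w
    return {item: c.most_common(1)[0][0] for item, c in votes.items()}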
Agreement Scores
• Calculated raw agreement
– # agreements / # decisions
• Cohen's Kappa
– Adjusts for chance agreement
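Both measures in a short sketch (two equal-length label sequences; illustrative only):

from collections import Counter

def raw_agreement(a, b):
    # number of agreements / number of decisions
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    # (p_o - p_e) / (1 - p_e): observed agreement adjusted for the
    # agreement expected by chance from each annotator's label distribution
    p_o = raw_agreement(a, b)
    ca, cb, n = Counter(a), Counter(b), len(a)
    p_e = sum(ca[l] * cb[l] for l in ca) / (n * n)
    return (p_o - p_e) / (1 - p_e)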
Agreement Results

Id        Agree%   κ      κ*     κ**
Combined  0.59     0.57   0.61   0.67
Auto      0.51     0.47   0.47   0.45
[Chart: Individual Turker Agreement (vs. author, N >= 15); y-axis: agreement from 0 to 0.8, with the Auto and Combined levels marked for comparison]

[Chart: Turker vote weight vs. Agreement; x-axis: Turker vote weight (0.0 to 0.7), y-axis: agreement vs. author (0.0 to 0.8)]
• Correlation between vote weight and agreement with the author, for Turkers who performed 15 or more annotations: 0.92
Comparison to other studies
Automatic Classification Method
Maximum Entropy classifier
– SVM-multiclass gave similar performance after optimizing the C parameter
Large number of boolean features extracted for each word from:
– WordNet
– Roget's Thesaurus
– Web 1T Corpus
– the spelling of the words
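A minimal sketch of this setup using scikit-learn, where logistic regression serves as the maximum entropy classifier (the feature names below are invented placeholders, not the system's real features):

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each compound becomes a dict of boolean features (hypothetical examples;
# the real system derives thousands from WordNet, Roget's, Web 1T, spelling).
train_X = [{"n1_hyp=metal": True, "n2_hyp=artifact": True},
           {"n1_hyp=person": True, "n2_hyp=act": True}]
train_y = ["substance-of", "performer-of"]

clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(train_X, train_y)
print(clf.predict([{"n1_hyp=metal": True, "n2_hyp=artifact": True}]))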
Features Used
– Synonyms
– Hypernyms
– Definition words
– Lexicographer Ids
– Link types (e.g., part-of)
– List of different types of part-of-speech entries
– All parts-of
– Prefixes, suffixes
– Roget's division information
– Last letters of the word
– Trigrams and 4-grams from Web 1T corpus
– Some combinations of features (e.g., shared hypernyms)
– A handful of others
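As one concrete illustration of the WordNet-derived features, a sketch using NLTK (an assumed toolkit; the slides do not name one):

from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def hypernym_features(word, prefix):
    # One boolean feature per hypernym of any noun sense of the word.
    feats = {}
    for synset in wn.synsets(word, pos=wn.NOUN):
        for path in synset.hypernym_paths():
            for hyp in path:
                feats[prefix + "_hyp=" + hyp.name()] = True
    return feats

print(sorted(hypernym_features("uranium", "n1")))
# features for hypernyms such as the metallic_element and substance synsets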
Cross-validation experiments
Performed one-feature-type-only and all-but-one experiments for the different types of features (sketched after the list below)
Most useful features
– Hypernyms
– Definition words
– Synonyms
– Web 1T trigrams and 4-grams
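A sketch of that ablation loop, with toy feature extractors and data standing in for the real ones:

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Toy stand-ins for the real feature types (hypernyms, suffixes, ...).
feature_groups = {
    "words":    lambda c: {"w=" + w: True for w in c.split()},
    "suffixes": lambda c: {"suf=" + w[-3:]: True for w in c.split()},
}
compounds = ["uranium rod", "mountain cave", "steel rod", "hillside cave"]
labels = ["substance-of", "location-of", "substance-of", "location-of"]

def score(groups):
    # Mean cross-validated accuracy using only the given feature groups.
    X = [{k: v for g in groups for k, v in feature_groups[g](c).items()}
         for c in compounds]
    clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    return cross_val_score(clf, X, labels, cv=2).mean()

for g in feature_groups:
    print(g, "only:", score([g]))
    print("all but", g + ":", score([x for x in feature_groups if x != g]))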
Conclusion
Novel taxonomy
– 43 fine-grained relations
– Defined using sentences with placeholders for the nouns
– Achieved relatively high inter-annotator agreement given the difficulty of the task
Largest annotated dataset
– Over 8 times larger than the next largest
Automatic classification method
– Achieves performance approximately 0.10 less than human inter-annotator agreement
Future Work
Address structural issues of longer (3+ word) compounds
Merge relation set with The Preposition Project (Litkowski, 2002) relations for prepositions
Integrate into a dependency parser
The End
Thank you for listening. Questions?