Retrieving Correct Semantic Boundaries in Dependency Structure
Jinho D. Choi and Martha Palmer (University of Colorado at Boulder)
The 4th Linguistic Annotation Workshop at ACL'10, July 15th, 2010
Dependency Structure for SRL
• What is dependency?
- Syntactic or semantic relation between a pair of words.
• Why dependency structure for semantic role labeling?
- Dependency relations often correlate with semantic roles.
- Simpler structure
[Examples: "places in this city" with LOC, NMOD, and PMOD arcs; "events … year" with a TMP arc]
→ faster annotation → more gold-standard data → more applications
→ faster parsing: Dep (Choi) vs. Phrase (Charniak) → 0.0025 vs. 0.5 sec
Phrase vs. Dependency Structure
• Constituent vs. Dependency
[Example: "The results appear in today 's news", shown as a constituent tree with -SBJ and -LOC function tags and as a dependency tree with SBJ, LOC, NMOD, PMOD, and NMOD arcs]
• 10/15 (66.67%) of the parsing papers at ACL'10 are on dependency parsing.
PropBank in Phrase Structure
• A corpus annotated with verbal propositions and arguments.
• Arguments are annotated on phrases.
[Example: a phrase-structure tree with ARG0 and ARGM-LOC annotated on phrases]
• But there is no phrase in dependency structure.
PropBank in Dependency Structure
• Arguments are annotated on head words instead.
[Example: "The results appear in today 's news" as a dependency tree (ROOT, SBJ, LOC, NMOD, PMOD, and NMOD arcs); ARG0 is the subtree of "results", ARGM-LOC the subtree of "in"]
• Phrase = subtree of the head word.
PropBank in Dependency Structure
• Phrase ≠ subtree of the head word.
[Example: "The plant owned by Mark" (NMOD, NMOD, LGS, and PMOD arcs); the subtree of the ARG1 head word includes the predicate "owned"]
Tasks
• Tasks
- Convert phrase structure (PS) to dependency structure (DS).
- Find correct head words in DS.
- Retrieve correct semantic boundaries from DS.
• Conversion
- Pennconverter, by Richard Johansson
• Used for CoNLL 2007 - 2009.
- Penn Treebank (Wall Street Journal)
• 49,208 trees were converted.
• 292,073 PropBank arguments exist.
System Overview
[Pipeline: the Penn Treebank is converted to dependency trees by Pennconverter; heuristics map PropBank arguments onto head words in those trees; an automatic SRL system produces a set of head words; a second set of heuristics turns the head words into chunks (phrases)]
Finding correct head words
• Get the word-set Sp of each argument in PS.
• For each word in Sp, find the word wmax with the maximum subtree in DS.
• Add wmax to the head-list Sd.
• Remove the subtree of wmax from Sp.
• Repeat the search until Sp becomes empty (a sketch of this search follows the example below).
[Example: "Yields on mutual funds continued to slide" (SBJ, NMOD, PMOD, NMOD, OPRD, and IM arcs); starting from Sp = {Yields, on, mutual, funds, to, slide}, the search yields Sd = [Yields, to]: the subtree of "Yields" covers "Yields on mutual funds", and the subtree of "to" covers "to slide"]
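A minimal Python sketch of this greedy search, assuming the dependency tree is given as a head-index array (heads[i] is the index of token i's head, -1 for the root); the representation and names are illustrative, not the paper's:

def subtree(heads, i):
    """Collect all token indices in the subtree rooted at token i."""
    nodes = {i}
    changed = True
    while changed:
        changed = False
        for tok, head in enumerate(heads):
            if head in nodes and tok not in nodes:
                nodes.add(tok)
                changed = True
    return nodes

def find_head_words(heads, sp):
    """Greedily cover the argument word-set sp with subtrees; return the head-list sd."""
    sp, sd = set(sp), []
    while sp:
        # the word whose subtree covers the largest part of the remaining set
        w_max = max(sp, key=lambda w: len(subtree(heads, w) & sp))
        sd.append(w_max)
        sp -= subtree(heads, w_max)
    return sd

# "Yields on mutual funds continued to slide" (tokens 0-6)
heads = [4, 0, 3, 1, -1, 4, 5]
print(find_head_words(heads, {0, 1, 2, 3, 5, 6}))  # [0, 5] = [Yields, to]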
Retrieving correct semantic boundaries
• Retrieving the subtrees of head words (a sketch follows the heuristics below)
- 92.51% precision, 100% recall, 96.11% F1-score.
- What does this mean?
• The state-of-the-art SRL system using DS performs at about 86%.
• If your application requires the actual argument phrases instead of head words, the performance becomes lower than 86%.
• Improve the precision by applying heuristics on:
- Modals, negations
- Verb chains, relative clauses
- Gerunds, past participles
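A sketch of the baseline retrieval (Model I in the evaluation below), reusing subtree() from the sketch above; the heuristics listed here would prune this span before it is returned:

def argument_boundary(heads, head_words):
    """Baseline: an argument's boundary is the union of the subtrees of its head words, in word order."""
    covered = set()
    for w in head_words:
        covered |= subtree(heads, w)
    return sorted(covered)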
Verb Predicates whose Semantic Arguments are their Syntactic Heads
• Semantic arguments of verb predicates can be the syntactic heads of the verbs.
• General solution
- For each head word, retrieve the subtree of the head word, excluding the subtree of the verb predicate (see the sketch after the example below).
[Example: "The plant owned by Mark" (NMOD, NMOD, LGS, and PMOD arcs); the ARG1 head word "plant" dominates the predicate "owned", so the subtree of "owned" is excluded]
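A sketch of this general solution, reusing subtree() from above; the token indices are illustrative:

def boundary_excluding_predicate(heads, head_word, predicate):
    """Drop the predicate's subtree when the argument's head word dominates the predicate."""
    return sorted(subtree(heads, head_word) - subtree(heads, predicate))

# "The plant owned by Mark": "plant" (1) is the ARG1 head word,
# but its subtree contains the predicate "owned" (2).
heads = [1, -1, 1, 2, 3]
print(boundary_excluding_predicate(heads, 1, 2))  # [0, 1] = "The plant"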
Examples
• Modals are the heads of the main verbs in DS.
[Example: "He may or may not read the book" (ROOT, SBJ, COORD, CONJ, ADV, OBJ, and NMOD arcs)]
• Conjunctions
[Example: "people who meet or exceed the expectation" (NMOD, DEP, COORD, CONJ, OBJ, and NMOD arcs)]
• Past participles
[Example: "correspondence mailed about incomplete 8300s" (NMOD, NMOD, PMOD, and NMOD arcs)]
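One way the modal case might be handled, sketched under the assumption that a predicate's arguments attach to the top of its verb chain; the slide does not give the exact conditions, so the modal list and climbing rule below are assumptions:

MODALS = {"may", "might", "can", "could", "will", "would", "shall", "should", "must"}

def verb_chain_top(heads, forms, predicate):
    """Climb from the main verb through modal heads to the token where its arguments attach in DS."""
    top = predicate
    while heads[top] != -1 and forms[heads[top]].lower() in MODALS:
        top = heads[top]
    return top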
Evaluations
• Models
- Model I : retrieving all words in the subtrees (baseline).
- Model II : using all heuristics.
- Model III : II + excluding punctuation.
• Measurements
- Accuracy : exact match
- Precision
- Recall
- F1-score
• Results
- Baseline: 88.00% accuracy, 92.51% precision, 100% recall, 96.11% F1.
- Final model: 98.20% accuracy, 99.14% precision, 99.95% recall, 99.54% F1.
- Statistically significant (t = 149, p < .0001).
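The measurements above can be reproduced in a few lines; a plain re-implementation over gold and predicted argument spans (lists of token-index lists), not the authors' evaluation script:

def boundary_scores(gold_spans, pred_spans):
    """Exact-match accuracy plus word-level precision, recall, and F1."""
    exact = sum(set(g) == set(p) for g, p in zip(gold_spans, pred_spans))
    tp = sum(len(set(g) & set(p)) for g, p in zip(gold_spans, pred_spans))
    prec = tp / sum(len(p) for p in pred_spans)
    rec = tp / sum(len(g) for g in gold_spans)
    f1 = 2 * prec * rec / (prec + rec)
    return exact / len(gold_spans), prec, rec, f1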
Evaluations
[Chart: accuracy, precision, recall, and F1 (y-axis 88-100%) for Models I, II, and III]
Error Analysis
• Overlapping arguments
[Example: "share burdens in the region" (OBJ, LOC, NMOD, and PMOD arcs); in one annotation ARG1 and ARGM-LOC are separate spans, in the other a single ARG1 covers both]
Error Analysis
• PP attachment
[Example: "the investors showed enthusiasm for stocks" (SBJ, NMOD, and PMOD arcs), parsed once with "for stocks" attached to "enthusiasm" (NMOD) and once attached to "showed" (ADV), giving two different ARG1 boundaries]
Conclusion
• Conclusion
- Find correct head words (the minimal set with maximal coverage).
- Retrieve correct semantic boundaries (99.54% F1-score).
- Suggest ways of reconstructing dependency structure so that it fits better with semantic roles.
- The approach can be used to fix some of the inconsistencies in both Treebank and PropBank annotations.
• Future work
- Apply to different corpora.
- Find ways of automatically adding empty categories.
Acknowledgements
• Special thanks are due to Professor Joakim Nivre of Uppsala University (Sweden) for his helpful insights.
• National Science Foundation: CISE-CRI-0551615, Towards a Comprehensive Linguistic Annotation, and CISE-CRI-0709167, Collaborative: A Multi-Representational and Multi-Layered Treebank for Hindi/Urdu.
• Defense Advanced Research Projects Agency (DARPA/IPTO) under the GALE program, DARPA/CMO Contract No. HR0011-06-C-0022, subcontract from BBN, Inc.