Survey of Annotation Work Joint session Thursday afternoon, April 14 Chair: Eduard Hovy, ISI

Survey of Annotation Work

Joint session

Thursday afternoon, April 14

Chair: Eduard Hovy, ISI

Phenomena (from OntoBank)

Level  Who                                       Phenomenon
L1     Penn Treebank                             bracketing/grouping of predications
L1     Propbank                                  verb sense creation and annotation (including copula)
L1     Propbank, Framenet, Verbnet, LCS, ILIT    verb sense frames & predicate structure (what labels?)
L1     Propbank+Omega, IAMTC+Omega, ILIT, Scone  semantic term repository: conversion of senses to concepts(/clusters), axiom creation, insertion into ontology
L1,L2  NomBank, ACE                              noun senses, NP structure, propositions, (genitives, …)
L1     Gazetteers                                repository of instances (people, places, events…)
L1     BBN, (ACE)                                co-reference links (including events)
L2     -                                         pronoun (and empty trace?) classification (ref, bound, event, generic, other) (proposition vs. event?)
L2     Propbank II, ILIT                         event identification

Level  Who                                          Phenomenon
L1     -                                            direct quotation and reported speech
L1     -                                            simple quantifier phrases and numerical exprs
L1,L2  TimeBank, TIMEX, ISI (Hobbs), ILIT           inter-predicate relations: temporal, spatial, manner, etc. (incl. effects from discourse and aspect)
L2+    WordNetPlus, Pantel, CYC                     entailments
L2+    -                                            comparatives
L2     -                                            coordination
L2/L3  Penn Discourse Treebank, RST Treebank, ILIT  discourse structure
L2/L3  U Pitt, ISI                                  opinions
L3     -                                            identifying propositions and simple modality
L3/L4  -                                            other adverbials (epistemic modals, evidentials)
L3/L4  -                                            polarity (more advanced than plain “neg” in L1)
L3+    Steedman, Hajicova, Sgall                    information structure (theme/rheme), focus
L4     ILIT                                         pragmatics/speech acts, style
L4     -                                            presuppositions
?      CYC, Scone                                   axioms and reasoning
?      Framenet                                     metaphor

Notional goal

phenomenon     annot speed    annot reliability  functionality  funder need
noun senses    25 wph         86/90%             IE,MT,QA...    high
verb senses    70 wph         ~87%               MT,QA,WSD      high
verb frames    80 w/week      87%                MT,QA,IE…      high
time exprs     18 wpm         96%                QA,IR,Summ     med-hi
discourse      100K in 400h   ~90/80%            Summ,QA        med
gazetteers     ?              ~95/90%            QA,IE          high
opinions       100K in 400h   ~76%               QA,Summ        med-hi
number exprs   ?              ?                  IE,QA,Summ     med
hypotheticals  ?              ?                  QA,Summ        low?
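
The annotation speeds above are given in mixed units, which makes them hard to compare at a glance. The sketch below is one way to put them on a common words-per-hour scale. It is an illustration only, and rests on assumptions that the slide does not state: that "wph" and "wpm" mean words per hour and words per minute, and that "100K in 400h" means 100,000 words annotated in 400 hours. The function name `words_per_hour` is hypothetical. Entries such as "80 w/week" and "?" are left unconverted, because the slide gives no hours-per-week figure.

```python
import re
from typing import Optional


def words_per_hour(speed: str) -> Optional[float]:
    """Return the speed in words per hour, or None if it cannot be converted."""
    text = speed.strip()

    # "25 wph": assumed to mean words per hour.
    match = re.fullmatch(r"(\d+)\s*wph", text)
    if match:
        return float(match.group(1))

    # "18 wpm": assumed to mean words per minute.
    match = re.fullmatch(r"(\d+)\s*wpm", text)
    if match:
        return float(match.group(1)) * 60

    # "100K in 400h": assumed to mean 100,000 words in 400 hours.
    match = re.fullmatch(r"(\d+)K in (\d+)\s*h", text)
    if match:
        return int(match.group(1)) * 1000 / int(match.group(2))

    # Anything else ("80 w/week", "?") is left unconverted.
    return None


# Speed entries copied from the table above.
table_speeds = {
    "noun senses": "25 wph",
    "verb senses": "70 wph",
    "verb frames": "80 w/week",
    "time exprs": "18 wpm",
    "discourse": "100K in 400h",
    "gazetteers": "?",
    "opinions": "100K in 400h",
}

for phenomenon, speed in table_speeds.items():
    rate = words_per_hour(speed)
    shown = f"{rate:.0f} words/hour" if rate is not None else "not comparable"
    print(f"{phenomenon:12s} {speed:13s} -> {shown}")
```

Under these assumptions, "18 wpm" corresponds to 1080 words per hour and "100K in 400h" to 250 words per hour.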

Agenda I

• Predicate/verb level:
  – PropBank I and II: Martha Palmer, UPenn
  – OntoBank corefs: Lance Ramshaw, BBN
  – IAMTC consortium: Steve Helmreich, NMSU
  – FrameNet: Charles Fillmore, UC Berkeley
  – Extended LCS: Bonnie Dorr, U Maryland

• Nominal level:
  – NomBank: Adam Meyers, NYU
  – ACE: Ralph Grishman, NYU

• Terminology banks:
  – WordNet: Christiane Fellbaum, Princeton
  – Omega: Eduard Hovy, USC/ISI

Agenda II

• Discourse level:
  – RST treebank: Lynn Carlson, DoD
  – Penn discourse treebank: Aravind Joshi, UPenn

• Specific semantic phenomena:
  – TIMEX: Lisa Ferro, MITRE & Beth Sundheim, SPAWAR
  – ILIT: Sergei Nirenburg, UMBC
  – Opinions: Jan Wiebe, U Pitt
  – Gazetteers: Beth Sundheim, SPAWAR

• Inference and reasoning:
  – WN Entailments: Christiane Fellbaum, Princeton
  – CYC: Dave Schneider
  – Scone: Scott Fahlman

Summary of annot work

phenomenon  who              task                accuracy       speed 1 / speed 2  annotated corpus size            # individuals annotated  uses
pred-arg    Propbank         V frame annot       ~85%           70/h               1M+ Eng, 250K Chi                135K preds               IE,QA,MT
                             frame creation      -              100/week
            IAMTC            V/N senses          .83/.66 kappa  60/h               3Kw (10x) Eng                    -                        MT
                             frame roles         .52 kappa
            ACE              inter-N relations   ~77%           -                  -                                -                        IE,QA,MT
            FrameNet         -                   -              -                  23Kw Eng                         130K verbs in sents      various
entities    Nombank          N sense annot       86%            25/h               -                                150K noun tokens         IE,IR,QA,Summ
            ACE              N types             90%            -                  190Kw Eng, 190Kw Chi, 190Kw Ara
                             event parts         ~57%
coref       BBN              coref               84–90%         100K in 60 h       300Kw Eng                        -                        IE,Summ
                             apposition          ~92%
ILIT        Nirenburg        numerous phenomena  -              100K in 2080 h     2.5Kw Eng?                       -                        MT,Summ,IE
gazetteer   Sundheim         link? Y/N           95%            -                  -                                -                        QA,IE
                             same gaz entry      87–99%
time exprs  Ferro, Sundheim  value of expr       96%            100K in 93 h       350Kw Eng, 220Kw Chi             -                        QA,IR,Summ
discourse   Joshi            explicit            92%            100K in 450 h      16Kw Eng                         -                        Summ,QA
                             implicit args       90%            -                  8Kw Eng
                             impl rel type       80%            -                  8Kw Eng
opinions    Wiebe et al.     opinion frame       ~76%-95%       100K in 400 h      15Ksent Eng                      -                        QA,Summ
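
Some accuracy figures in this summary are reported as kappa (for example the IAMTC rows) rather than as percentage agreement. The slide does not say which kappa variant was used. The sketch below assumes Cohen's kappa for two annotators, purely to illustrate how a chance-corrected score differs from raw agreement. The function name `cohen_kappa` and the toy sense labels are hypothetical and are not data from any of the projects listed.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label sequences")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected chance agreement from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy example: two annotators assign senses to four tokens.
annotator_1 = ["s1", "s1", "s2", "s2"]
annotator_2 = ["s1", "s1", "s2", "s1"]
print(cohen_kappa(annotator_1, annotator_2))
```

In this toy case the annotators agree on three of four items (75% raw agreement), but kappa is 0.5 once chance agreement is subtracted, which is why kappa values and percentage figures in the table are not directly comparable.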