Broad-Coverage Transition-Based UCCA Parsing
Daniel Hershcovich (1,2), Omri Abend (2), Ari Rappoport (2)
(1) Edmond and Lily Safra Center for Brain Sciences
(2) School of Computer Science and Engineering
Hebrew University of Jerusalem
Learning Club, November 17, 2016
Daniel Hershcovich Broad-Coverage Transition-Based UCCA Parsing 1 / 26
Semantic Annotation Schemes
Outline
1 Semantic Annotation Schemes
2 Transition-based UCCA Parsing
3 Experiments
4 Future Work
5 Conclusions
Semantic Annotation Schemes
Represent semantic structure of text as a graph.
Used by NLP applications for features and structure, providing information such as "who did what to whom?"
Examples:
  Semantic Role Labeling
  Semantic Dependencies
  Abstract Meaning Representation
  Universal Conceptual Cognitive Annotation
Semantic Role Labeling (SRL)
Annotate predicates and their arguments as a flat structure. Examples:
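Sentence: After graduation , John moved to Paris
PropBank (roleset move.01): AM-TMP = "After graduation", A1 = "John", A2 = "to Paris"
FrameNet (frame Motion): Time = "After graduation", Theme = "John", Goal = "to Paris"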
Semantic Dependency Parsing (SDP)
Graph on the text tokens, including internal structure of arguments.
Examples:
DELPH-IN MRS-derived bi-lexical dependencies (DM)
[Figure: DM graph over "After graduation , John moved to Paris", with edge labels top, ARG1 and ARG2.]
Prague Dependency Treebank tectogrammatical layer (PSD)
[Figure: PSD graph over the same sentence, with edge labels top, TWHEN, ACT-arg and DIR3-arg.]
Abstract Meaning Representation (AMR)
Graph on knowledge resource entries inferred from the tokens.
[Figure: AMR graph for "After graduation, John moved to Paris". Concepts: move-01, person, city, name, after, graduate-01. Edge labels: ARG0, ARG2, time, name, op1.]
Universal Conceptual Cognitive Annotation (UCCA)
Cross-linguistically applicable semantic representation scheme.
Builds on typological [Dixon, 2012] and Cognitive Linguistics literature [Croft and Cruse, 2004].
Demonstrated applicability to English, French, German & Czech.
Support for rapid annotation.
Semantic stability in translation [Sulem et al., 2015].
Proven useful for machine translation evaluation [Birch et al., 2016].
Applicability has been so far limited by the absence of a parser.
Structural Properties
(1) non-terminal nodes, (2) reentrancy, (3) discontinuity
[Figure: three UCCA graphs illustrating these properties, for the sentences "John and Mary went home", "John gave everything up" and "John decided to take a quick shower". Edge labels shown: A, P, C, N, F, D.]
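Two of these properties can be checked mechanically. The sketch below is only an illustration: the edge-tuple encoding, the helper names and the node ids (`s`, `p`, `s1`, `s2`) are assumptions, not the UCCA toolkit's API, and the second graph is a simplified stand-in for the slide's example. A unit is treated as discontinuous when the terminals it dominates through primary edges do not form a contiguous span, and a node as reentrant when it has more than one parent.

```python
from collections import defaultdict

# An edge is (parent, child, label, is_remote); terminals map token -> position.

def yields(edges, terminals):
    """Return, for every node, the sorted terminal positions it dominates via primary edges."""
    children = defaultdict(list)
    for parent, child, _label, is_remote in edges:
        if not is_remote:
            children[parent].append(child)

    def collect(node):
        if node in terminals:
            return {terminals[node]}
        positions = set()
        for child in children[node]:
            positions |= collect(child)
        return positions

    nodes = {e[0] for e in edges} | {e[1] for e in edges}
    return {node: sorted(collect(node)) for node in nodes}

def is_discontinuous(positions):
    """True if the sorted positions have a gap."""
    return bool(positions) and positions[-1] - positions[0] + 1 != len(positions)

def reentrant_nodes(edges):
    """Nodes with more than one parent (counting remote edges)."""
    parents = defaultdict(set)
    for parent, child, _label, _is_remote in edges:
        parents[child].add(parent)
    return {node for node, ps in parents.items() if len(ps) > 1}

# "John gave everything up": the process unit p covers "gave" and "up" only.
terminals1 = {"John": 0, "gave": 1, "everything": 2, "up": 3}
edges1 = [("s", "John", "A", False), ("s", "p", "P", False),
          ("p", "gave", "C", False), ("p", "up", "C", False),
          ("s", "everything", "A", False)]
spans = yields(edges1, terminals1)

# Simplified "John decided to take ...": John is also a remote participant of s2.
edges2 = [("s1", "John", "A", False), ("s1", "decided", "P", False),
          ("s1", "s2", "A", False), ("s2", "John", "A", True),
          ("s2", "take", "P", False)]

print(is_discontinuous(spans["p"]), reentrant_nodes(edges2))
```

Here the unit `p` is discontinuous because "everything" intervenes between its two terminals, while `John` is reentrant because of the remote edge from `s2`.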
UCCA Corpora
                Wiki                           20K Leagues
                Train      Dev       Test
# passages      300        34        34        154
# sentences     4267       453       518       506
# nodes         298,665    33,263    37,262    29,315
% terminal      42.95      43.62     42.89     42.09
% non-term.     58.30      57.46     58.31     60.01
% discont.      0.53       0.51      0.47      0.81
% reentrant     2.31       1.76      2.18      2.03
# edges         287,381    32,015    35,846    27,749
% primary       98.29      98.81     98.75     97.73
% remote        1.71       1.19      1.25      2.27
Average per non-terminal node
# children      1.67       1.68      1.66      1.61

Statistics exclude the root node, implicit nodes, and linkage nodes and edges.
Corpora are available at http://www.cs.huji.ac.il/~oabend/ucca.html

P  process
S  state
A  participant
L  linker
H  linked scene
C  center
E  elaborator
D  adverbial
R  relator
N  connector
U  punctuation
F  function unit
G  ground
Table: Edge labels.
Transition-based UCCA Parsing
Transition-Based Parsing
Parse a sentence $w_1, \ldots, w_n$ into a graph $G = (V, E, \ell)$ incrementally, using a buffer $B$ and a stack $S$.
Transition-based parsers work by applying a transition at each step to the parser state, defined using a buffer $B$ of tokens and nodes to be processed, a stack $S$ of nodes currently being processed, and a graph $G = (V, E, \ell)$ of constructed nodes and edges. A classifier selects the next transition based on the current state's features. It is trained by an oracle based on gold-standard annotations.
[Figure: parser state (stack S, buffer B, graph G) while parsing "After graduation, John moved to Paris".]
Transitions for UCCA parsing: Shift, Reduce, Node_X, Left-Edge_X, Right-Edge_X, Left-Remote_X, Right-Remote_X, Swap, Finish
Stack S: John
Buffer B: moved to Paris
Graph G: After (L), graduation (P), and a parent unit (H)
Transitions applied so far: Shift, Right-Edge_L, Reduce, Shift, Node_P, Reduce, Shift, Right-Edge_H, Shift
Figure: Example of an intermediate state during transition-based parsing.
TUPA (Transition-Based UCCA Parser)
Our parser supports the structural properties of UCCA. Parser code is available at https://github.com/danielhers/ucca.

Before Transition                  Transition       After Transition
Stack     Buffer  Nodes  Edges                      Stack     Buffer  Nodes     Edges
S         x | B   V      E         Shift            S | x     B       V         E
S | x     B       V      E         Reduce           S         B       V         E
S | x     B       V      E         Node_X           S | x     y | B   V ∪ {y}   E ∪ {(y, x)_X}
S | y, x  B       V      E         Left-Edge_X      S | y, x  B       V         E ∪ {(x, y)_X}
S | x, y  B       V      E         Right-Edge_X     S | x, y  B       V         E ∪ {(x, y)_X}
S | y, x  B       V      E         Left-Remote_X    S | y, x  B       V         E ∪ {(x, y)*_X}
S | x, y  B       V      E         Right-Remote_X   S | x, y  B       V         E ∪ {(x, y)*_X}
S | x, y  B       V      E         Swap             S | y     x | B   V         E
[root]    ∅       V      E         Finish           ∅         ∅       V         E

Table: TUPA transitions. (·, ·)_X denotes a primary X-labeled edge, and (·, ·)*_X a remote X-labeled edge.
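The transition table can be read as a small state machine. The following is a minimal sketch of it, not the code from the repository: the `State` class, the `apply` function and the generated node names (`n1`, ...) are hypothetical, the stack is assumed to start with a single `root` node, and the comma is omitted from the input, as in the intermediate-state example.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    stack: list
    buffer: list
    edges: set = field(default_factory=set)  # (parent, child, label, is_remote)
    new_nodes: int = 0                       # counter used to name created nodes

def apply(state, name, label=None):
    """Apply one transition in place, following the rows of the transition table."""
    s, b = state.stack, state.buffer
    if name == "Shift":                           # S, x|B -> S|x, B
        s.append(b.pop(0))
    elif name == "Reduce":                        # S|x -> S
        s.pop()
    elif name == "Node":                          # create parent y of x, put y on the buffer
        state.new_nodes += 1
        y = f"n{state.new_nodes}"
        state.edges.add((y, s[-1], label, False))
        b.insert(0, y)
    elif name in ("Left-Edge", "Left-Remote"):    # S|y,x: edge from x (top) to y
        state.edges.add((s[-1], s[-2], label, name == "Left-Remote"))
    elif name in ("Right-Edge", "Right-Remote"):  # S|x,y: edge from x to y (top)
        state.edges.add((s[-2], s[-1], label, name == "Right-Remote"))
    elif name == "Swap":                          # S|x,y -> S|y, x|B
        b.insert(0, s.pop(-2))
    elif name == "Finish":                        # [root] with empty buffer -> empty stack
        if s != ["root"] or b:
            raise ValueError("Finish requires stack [root] and an empty buffer")
        s.clear()
    else:
        raise ValueError(f"unknown transition: {name}")
    return state

# Replay the transition sequence from the intermediate-state example.
state = State(stack=["root"], buffer="After graduation John moved to Paris".split())
trace = [("Shift", None), ("Right-Edge", "L"), ("Reduce", None), ("Shift", None),
         ("Node", "P"), ("Reduce", None), ("Shift", None), ("Right-Edge", "H"),
         ("Shift", None)]
for name, label in trace:
    apply(state, name, label)
print(state.stack, state.buffer, sorted(state.edges))
```

After the replay, "John" is on top of the stack and "moved to Paris" remains in the buffer, which matches the intermediate state shown earlier. Swap is what allows discontinuous units: it moves the second stack item back to the buffer so that non-adjacent nodes can meet on the stack.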
TUPA Classifiers
We experiment with three classifiers:
TUPA_sparse: Perceptron, sparse features: words, POS tags & edge label combinations.
TUPA_dense: Perceptron, dense embedding features: word2vec [Mikolov et al., 2013] for words, else random.
TUPA_NN: 2-layer MLP, learned embedding features, logistic activation + dropout.
For all classifiers, inference is performed greedily, i.e., without beam search.
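As a rough illustration of a sparse-feature perceptron with greedy transition selection, consider the sketch below. It is a plain multiclass perceptron under stated assumptions: the class name, the feature strings (such as `s0.word=After`) and the toy training pairs are hypothetical, and details of the actual classifiers (feature templates, weight averaging, training schedule) are not given in the slides.

```python
from collections import defaultdict

class SparsePerceptron:
    """Multiclass perceptron over sparse features: one weight vector per transition."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.weights = {a: defaultdict(float) for a in self.actions}

    def score(self, features, action):
        w = self.weights[action]
        return sum(w[f] * v for f, v in features.items() if f in w)

    def predict(self, features, allowed=None):
        # Greedy choice: the single highest-scoring transition, no beam.
        candidates = allowed or self.actions
        return max(candidates, key=lambda a: self.score(features, a))

    def update(self, features, gold, predicted):
        # Standard perceptron update, applied only on mistakes.
        if gold == predicted:
            return
        for f, v in features.items():
            self.weights[gold][f] += v
            self.weights[predicted][f] -= v

# Toy oracle data: (features of a parser state, gold transition).
data = [
    ({"s0.word=After": 1.0, "b0.word=graduation": 1.0}, "Right-Edge_L"),
    ({"s0.word=graduation": 1.0, "b0.word=John": 1.0}, "Node_P"),
]
model = SparsePerceptron(["Shift", "Reduce", "Node_P", "Right-Edge_L"])
for _ in range(3):  # a few passes over the toy data
    for features, gold in data:
        model.update(features, gold, model.predict(features))

print([model.predict(features) for features, _ in data])
```

The `allowed` argument is one way to restrict the choice to transitions that are valid in the current state, for example forbidding Reduce on an empty stack.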
Experiments
Experimental Setup
We conduct our main experiment on the UCCA Wikipedia corpus, and use the English part of the UCCA Twenty Thousand Leagues Under the Sea English-French parallel corpus as out-of-domain data.
Evaluation
We report two variants of labeled precision, recall and F-score: one where we consider only primary edges, and another for remote edges. Given graphs $G_p = (V_p, E_p, \ell_p)$ and $G_g = (V_g, E_g, \ell_g)$ over terminals $W = \{w_1, \ldots, w_n\}$, the yield $y(e) \subseteq W$ of an edge $e = (u, v)$ in either graph is the set of terminals in $W$ that are descendants of $v$. The mutual edges between the graphs are:

$$M(G_p, G_g) = \{(e_1, e_2) \in E_p \times E_g \mid y(e_1) = y(e_2) \wedge \ell_p(e_1) = \ell_g(e_2)\}$$

and we define

$$LP = \frac{|M(G_p, G_g)|}{|E_p|}, \qquad LR = \frac{|M(G_p, G_g)|}{|E_g|}, \qquad LF = \frac{2 \cdot LP \cdot LR}{LP + LR}.$$
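The definitions above translate directly into code. The sketch below assumes that each edge has already been reduced to a (yield, label) pair, with the yield given as a frozenset of terminal positions, and that the subscripts p and g stand for the predicted and gold graphs. The function name and the toy edges are hypothetical. Primary and remote scores would be obtained by calling the function separately on the primary and remote edges.

```python
def labeled_scores(pred_edges, gold_edges):
    """Return (LP, LR, LF) for two lists of (yield, label) edges."""
    # |M|: pairs of edges with the same yield and the same label.
    matches = sum(
        1
        for p_yield, p_label in pred_edges
        for g_yield, g_label in gold_edges
        if p_yield == g_yield and p_label == g_label
    )
    lp = matches / len(pred_edges) if pred_edges else 0.0
    lr = matches / len(gold_edges) if gold_edges else 0.0
    lf = 2 * lp * lr / (lp + lr) if lp + lr else 0.0
    return lp, lr, lf

# Toy example: 4 predicted edges, 5 gold edges, 3 of them mutual.
gold = [(frozenset({0}), "L"), (frozenset({1}), "H"), (frozenset({3}), "A"),
        (frozenset({4}), "P"), (frozenset({5, 6}), "A")]
pred = [(frozenset({0}), "L"), (frozenset({1}), "H"), (frozenset({3}), "A"),
        (frozenset({4, 5}), "P")]

lp, lr, lf = labeled_scores(pred, gold)
print(f"LP={lp:.3f} LR={lr:.3f} LF={lf:.3f}")
```

Because edges are matched by yield rather than by node identity, the two graphs do not need to share non-terminal nodes, only the terminals.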
Baselines
[Figure: bilexical dependency graphs for "After graduation , John moved to Paris", "John gave everything up" and "John and Mary went home", with edges labeled by UCCA categories.]
Figure: Bilexical approximation for UCCA graphs.
Since there are no existing UCCA parsers, we use bilexical DAG parsers:
1 Convert UCCA into bilexical dependencies.
2 Train parsers on the resulting training set.
3 Apply trained parsers to the test set.
4 Reconstruct UCCA graphs.
5 Compare with gold standard.
Upper bounds are computed by applying the conversion both ways to the gold standard graphs and comparing to the original.
Results
TUPA_NN obtains the highest primary-edge F-score in both the in-domain and out-of-domain settings; remote-edge results are mixed across the TUPA variants:
                                          Wiki (in-domain)                         20K Leagues (out-of-domain)
                                          Primary            Remote                Primary            Remote
                                          LP    LR    LF     LP    LR    LF        LP    LR    LF     LP    LR    LF
Bilexical Approximation
Upper Bound                               93.4  83.7  88.3   73.9  49.5  59.3      93.5  83.5  88.2   66.7  31.6  42.9
DAGParser [Ribeyre et al., 2014]          63.7  56.1  59.5   0.8   9.5   1.4       58    49.8  53.4   –     0     0
TurboParser [Almeida and Martins, 2015]   60.2  47.4  52.9   2.2   7.8   3.4       52.6  39    44.7   100   0.3   0.6
Direct Approach
TUPA_sparse                               64    55.6  59.5   16    11.6  13.4      60.6  53.9  57.1   20.2  10.3  13.6
TUPA_dense                                55    54.8  54.9   15.2  16.9  16        54.8  55.2  55     6     3     4
TUPA_NN                                   65    62.5  63.7   20.7  11.3  14.6      58.3  56.4  57.3   15.2  3.8   6
Tree Approximation
For completeness, we also explore lossily converting UCCA into trees. This simplifies the task for the underlying parser and lets us benefit from the maturity of tree-based parsers.
Although remote edges are of pivotal importance, exploring tree approximation methods can inform the future development of DAG parsers in general and of UCCA parsers in particular.
                                    LP    LR    LF
Constituency Tree Approximation
Upper Bound                         100   100   100
uparse [Maier and Lichte, 2016]     63    64.7  63.7
Dependency Tree Approximation
Upper Bound                         93.7  83.6  88.4
MaltParser [Nivre et al., 2007]     64.9  57.9  61
LSTM Parser [Dyer et al., 2015]     74.9  66.4  70.2
Direct Tree Parsing
TUPA_sparse − Remote                65.5  57.5  61.3
TUPA_dense − Remote                 57.2  57.3  57.2
TUPA_NN − Remote                    66.3  64.4  65.3
The performance is encouraging in light of UCCA's inter-annotator agreement of 80–85% F-score on primary edges [Abend and Rappoport, 2013].
Future Work
UCCA-Based Distributed Representation
Vector representation for sentences and documents, based on recursive composition on the UCCA graph.
Impact:
  General automatic semantic feature extractor for text.
  Accurate measure for text similarity.
  Understand the semantic contribution of different elements.
[Figure: UCCA graph for "After graduation, John moved to Paris", with edge labels L, P, H, U, A, R, C, LR and LA.]
Conclusions
We present TUPA, the first parser for UCCA, and evaluate it in both in-domain and out-of-domain settings, showing it surpasses bilexical DAG parsers on the task of UCCA parsing.
Future work will incorporate LSTMs into TUPA, and apply the parser to more languages such as German, demonstrating the importance of broad-coverage parsing. We will also improve the conversion-based methods and explore different target representations. A UCCA parser will enable using the scheme for representation in NLP tasks.

UCCA exhibits formal properties important for semantic representation.
We present the first parser for UCCA and the first to support these properties.
References
Abend, O. and Rappoport, A. (2013). Universal Conceptual Cognitive Annotation (UCCA). In Proc. of ACL, pages 228–238.
Almeida, M. S. C. and Martins, A. F. T. (2015). Lisbon: Evaluating TurboSemanticParser on multiple languages and out-of-domain data. In Proc. of SemEval, pages 970–973, Denver, Colorado. Association for Computational Linguistics.
Birch, A., Haddow, B., Bojar, O., and Abend, O. (2016). Hume: Human UCCA-based evaluation of machine translation. arXiv preprint arXiv:1607.00030.
Croft, W. and Cruse, D. A. (2004). Cognitive linguistics. Cambridge University Press.
Dixon, R. M. W. (2010/2012). Basic linguistic theory.
Dyer, C., Ballesteros, M., Ling, W., Matthews, A., and Smith, N. A. (2015). Transition-based dependency parsing with stack long short-term memory. In Proc. of ACL, pages 334–343.
Maier, W. and Lichte, T. (2016). Discontinuous parsing with continuous trees. In Proc. of Workshop on Discontinuous Structures in NLP, pages 47–57, San Diego, California. Association for Computational Linguistics.
Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Nivre, J., Hall, J., Nilsson, J., Chanev, A., Eryiğit, G., Kübler, S., Marinov, S., and Marsi, E. (2007). MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(02):95–135.
Ribeyre, C., Villemonte de la Clergerie, E., and Seddah, D. (2014). Alpage: Transition-based semantic graph parsing with syntactic features. In Proc. of SemEval, pages 97–103, Dublin, Ireland.
Sulem, E., Abend, O., and Rappoport, A. (2015). Conceptual annotations preserve structure across translations: A French-English case study. In Proc. of S2MT, pages 11–22.