Recent Advances in Dependency Parsing
Tutorial, EACL, April 27th, 2014
Ryan McDonald (1)   Joakim Nivre (2)
(1) Google Inc., USA/UK. E-mail: [email protected]
(2) Uppsala University, Sweden. E-mail: [email protected]
Recent Advances in Dependency Parsing 1(42)
Introduction

I Pre-2008
  I What we will briefly cover in the first quarter of the tutorial
  I Textbook: Dependency Parsing [Kubler et al. 2009]
  I ESSLLI 2007: http://www.ryanmcd.com/courses/esslli2007/
  I ACL 2006: http://stp.lingfil.uu.se/~nivre/docs/ACLslides.pdf
I Post-2008
  I What we will mainly cover today
  I NAACL 2010: http://naaclhlt2010.isi.edu/tutorials/t7.html
Overview of the Tutorial

I Introduction to Dependency Parsing (Joakim)
  I Formulation, definitions, evaluation, etc.
  I Graph-based and transition-based parsing
  I Contrastive error analysis
I Graph-based parsing post-2008 (Ryan)
I Transition-based parsing post-2008 (Joakim)
I Summary and final thoughts (Ryan)
Introduction: Outline

I Dependency syntax:
  I Basic concepts
  I Terminology and notation
  I Dependency graphs
I Data-driven dependency parsing
I Paradigms:
  I Graph-based parsing
  I Transition-based parsing
  I Alternatives (brief)
I Contrastive error analysis [McDonald and Nivre 2007]
Dependency Syntax

Dependency Syntax

I The basic idea:
  I Syntactic structure consists of lexical items, linked by binary asymmetric relations called dependencies.
I In the words of Lucien Tesnière [Tesnière 1959] (translated from the French):
  I "The sentence is an organized whole, the constituent elements of which are words. [1.2] Every word that belongs to a sentence ceases by itself to be isolated as in the dictionary. Between the word and its neighbors, the mind perceives connections, the totality of which forms the structure of the sentence. [1.3] The structural connections establish dependency relations between the words. Each connection in principle unites a superior term and an inferior term. [2.1] The superior term receives the name governor. The inferior term receives the name subordinate. Thus, in the sentence Alfred parle [. . . ], parle is the governor and Alfred the subordinate. [2.2]"
Dependency Structure

Economic news had little effect on financial markets .
(adj noun verb adj noun prep adj noun .)

[Figure: dependency tree over the sentence, with arcs amod(news, Economic), nsubj(had, news), dobj(had, effect), amod(effect, little), prep(effect, on), pobj(on, markets), amod(markets, financial), p(had, .).]
Terminology

Superior    Inferior
Head        Dependent
Governor    Modifier
Regent      Subordinate
...         ...
Phrase Structure

[Figure: phrase-structure tree for the same sentence:
[S [NP [JJ Economic] [NN news]] [VP [VBD had] [NP [NP [JJ little] [NN effect]] [PP [IN on] [NP [JJ financial] [NNS markets]]]]] [PU .]]]
Comparison

I Dependency structures explicitly represent
  I head-dependent relations (directed arcs),
  I functional categories (arc labels),
  I possibly some structural categories (parts of speech).
I Phrase structures explicitly represent
  I phrases (nonterminal nodes),
  I structural categories (nonterminal labels),
  I possibly some functional categories (grammatical functions).
I Hybrid representations may combine all elements.
Some Theoretical Frameworks

I Word Grammar (WG) [Hudson 1984, Hudson 1990, Hudson 2007]
I Functional Generative Description (FGD) [Sgall et al. 1986]
I Dependency Unification Grammar (DUG) [Hellwig 1986, Hellwig 2003]
I Meaning-Text Theory (MTT) [Mel'čuk 1988, Milićević 2006]
I (Weighted) Constraint Dependency Grammar ([W]CDG) [Maruyama 1990, Menzel and Schröder 1998, Schröder 2002]
I Functional Dependency Grammar (FDG) [Tapanainen and Järvinen 1997, Järvinen and Tapanainen 1998]
I Topological/Extensible Dependency Grammar ([T/X]DG) [Duchier and Debusmann 2001, Debusmann et al. 2004]
Some Theoretical Issues

I Dependency structure sufficient as well as necessary?
I Mono-stratal or multi-stratal syntactic representations?
I What is the nature of lexical elements (nodes)?
  I Morphemes?
  I Word forms?
  I Multi-word units?
I What is the nature of dependency types (arc labels)?
  I Grammatical functions?
  I Semantic roles?
I What are the criteria for identifying heads and dependents?
I What are the formal properties of dependency structures?

More at: http://www.ryanmcd.com/courses/esslli2007/
Dependency Graphs

I A dependency structure can be defined as a directed graph G, consisting of
  I a set V of nodes (vertices),
  I a set A of arcs (directed edges),
  I a linear precedence order < on V (word order).
I Labeled graphs:
  I Nodes in V are labeled with word forms (and annotation).
  I Arcs in A are labeled with dependency types:
    I L = {l1, ..., l|L|} is the set of permissible arc labels.
    I Every arc in A is a triple (i, j, k), representing a dependency from wi to wj with label lk.
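The graph definition above can be sketched as a small data structure. This is a minimal illustration, not code from the tutorial; the class and method names are invented for the example.

```python
# Minimal sketch of a labeled dependency graph G = (V, A),
# following the definition above (names are illustrative).
class DependencyGraph:
    def __init__(self, words, labels):
        self.words = words               # w_0 .. w_n, position = node id
        self.V = set(range(len(words)))  # vertex set
        self.L = list(labels)            # permissible arc labels l_1 .. l_|L|
        self.A = set()                   # arcs as triples (i, j, k)

    def add_arc(self, i, j, label):
        # (i, j, k): dependency from head w_i to dependent w_j with label l_k
        self.A.add((i, j, self.L.index(label)))

g = DependencyGraph(["ROOT", "Economic", "news", "had"], ["amod", "nsubj", "root"])
g.add_arc(2, 1, "amod")   # news -> Economic
g.add_arc(3, 2, "nsubj")  # had -> news
g.add_arc(0, 3, "root")   # ROOT -> had
```

Representing arcs as (head, dependent, label-index) triples mirrors the (i, j, k) notation used on the following slides.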
Dependency Graph Notation

I For a dependency graph G = (V, A) with label set L = {l1, ..., l|L|}:
  I i → j ≡ ∃k : (i, j, k) ∈ A
  I i ↔ j ≡ i → j ∨ j → i
  I i →* j ≡ i = j ∨ ∃i′ : i → i′, i′ →* j
  I i ↔* j ≡ i = j ∨ ∃i′ : i ↔ i′, i′ ↔* j
Formal Conditions on Dependency Graphs

I G is (weakly) connected:
  I If i, j ∈ V, then i ↔* j.
I G is acyclic:
  I If i → j, then not j →* i.
I G obeys the single-head constraint:
  I If i → j, then not i′ → j, for any i′ ≠ i.
I G is projective:
  I If i → j, then i →* i′, for any i′ such that i < i′ < j or j < i′ < i.
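The four conditions can be checked directly on an arc set. A minimal sketch over unlabeled arcs (helper names are invented for the example):

```python
# Sketch: checking the formal conditions on a graph given as a set of
# unlabeled arcs {(head, dependent)} over nodes 0..n.
def single_head(arcs):
    deps = [j for _, j in arcs]
    return len(deps) == len(set(deps))  # no dependent has two heads

def acyclic(arcs):
    # follow head links upward from every node; a cycle revisits a node
    h = {j: i for i, j in arcs}
    for start in h:
        seen, j = set(), start
        while j in h:
            if j in seen:
                return False
            seen.add(j)
            j = h[j]
    return True

def connected(arcs, n):
    # weak connectivity: treat arcs as undirected, flood-fill from node 0
    adj = {v: set() for v in range(n + 1)}
    for i, j in arcs:
        adj[i].add(j)
        adj[j].add(i)
    seen, stack = set(), [0]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(adj[v])
    return len(seen) == n + 1

def projective(arcs):
    # i -> j is projective iff every word strictly between i and j
    # is a descendant of i (reachable by following head links up to i)
    h = {j: i for i, j in arcs}
    def dominates(i, j):
        while j in h:
            j = h[j]
            if j == i:
                return True
        return False
    return all(dominates(i, k)
               for i, j in arcs
               for k in range(min(i, j) + 1, max(i, j)))
```

For the tree ROOT→had→news→Economic (arcs {(0,3), (3,2), (2,1)}) all four checks succeed, whereas crossing arcs such as {(0,2), (2,1), (1,3)} fail the projectivity check.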
Connectedness, Acyclicity and Single-Head

I Intuitions:
  I Syntactic structure is complete (Connectedness).
  I Syntactic structure is hierarchical (Acyclicity).
  I Every word has at most one syntactic head (Single-Head).
I Connectedness can be enforced by adding a special root node.

ROOT Economic news had little effect on financial markets .
(ROOT adj noun verb adj noun prep adj noun .)

[Figure: the dependency tree above, extended with the arc root(ROOT, had); the remaining arcs are labeled amod, nsubj, dobj, amod, prep, pmod, amod, p.]
Projectivity

I Most theoretical frameworks do not assume projectivity.
I Non-projective structures are needed to account for
  I long-distance dependencies,
  I free word order.

ROOT What did economic news have little effect on ?
(ROOT pron verb adj noun verb adj noun prep .)

[Figure: non-projective dependency tree in which the arc pobj(on, What) crosses other arcs; the remaining arcs are labeled aux, nsubj, amod, dobj, prep, amod, p, root.]
Dependency Parsing

Dependency Parsing

I The problem:
  I Input: Sentence x = w0, w1, ..., wn with w0 = ROOT
  I Output: Dependency graph G = (V, A) for x where:
    I V = {0, 1, ..., n} is the vertex set,
    I A is the arc set, i.e., (i, j, k) ∈ A represents a dependency from wi to wj with label lk ∈ L.
I Two main approaches:
  I Grammar-based parsing
    I Context-free dependency grammar
    I Lexicalized context-free grammars
    I Constraint dependency grammar
  I Data-driven parsing
    I Graph-based models
    I Transition-based models
    I Easy-first parsing
    I Hybrids: grammar + data-driven, ensembles, etc.
Data-Driven Dependency Parsing

I Need to define a function f : X → G
  I from sentences x ∈ X to valid dependency graphs G ∈ G.
I Most common approach is to learn from training data T,
  I where T = {(x1, G1), (x2, G2), ..., (xn, Gn)},
  I and (xi, Gi) are labeled sentence-dependency graph pairs that make up the treebank.
I Supervised learning: fully annotated training examples
I Semi-supervised learning: annotated data plus constraints and features drawn from unlabeled resources
I Weakly-supervised learning: constraints drawn from ontologies, structural and lexical resources
I Unsupervised learning: learning only from unlabeled data
Evaluation Metrics

I Standard setup:
  I Test set E = {(x1, G1), (x2, G2), ..., (xn, Gn)}
  I Parser predictions P = {(x1, G′1), (x2, G′2), ..., (xn, G′n)}
I Evaluation on the word (arc) level:
  I Labeled attachment score (LAS) = head and label
  I Unlabeled attachment score (UAS) = head
  I Label accuracy (LA) = label
I Evaluation on the sentence (graph) level:
  I Exact match (labeled or unlabeled) = complete graph
I NB: Evaluation metrics may or may not include punctuation.
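A minimal sketch of the word-level metrics, assuming (for the example only) that gold and predicted trees are given as dependent → (head, label) maps:

```python
# Sketch: LAS, UAS and LA over one sentence.
# gold, pred: {dependent: (head, label)} (illustrative format).
def attachment_scores(gold, pred):
    n = len(gold)
    uas = sum(pred[d][0] == h for d, (h, _) in gold.items()) / n      # head only
    las = sum(pred[d] == (h, l) for d, (h, l) in gold.items()) / n    # head and label
    la = sum(pred[d][1] == l for d, (h, l) in gold.items()) / n       # label only
    return las, uas, la

gold = {1: (2, "amod"), 2: (3, "nsubj"), 3: (0, "root")}
pred = {1: (2, "amod"), 2: (3, "dobj"), 3: (0, "root")}
las, uas, la = attachment_scores(gold, pred)
# all heads correct, one label wrong: UAS = 3/3, LAS = LA = 2/3
```

In shared-task practice these counts are accumulated over all words in the test set (optionally excluding punctuation) rather than averaged per sentence.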
Graph-Based Parsing

Graph-Based Parsing (Pre-2008)

I Basic idea:
  I Define a space of candidate dependency graphs for a sentence.
  I Learning: Induce a model for scoring an entire dependency graph for a sentence.
  I Parsing: Find the highest-scoring dependency graph, given the induced model.
I Characteristics:
  I Global training of a model for optimal dependency graphs
  I Exhaustive search/inference
Graph-Based Parsing

I For input sentence x define a graph Gx = (Vx, Ax), where
  I Vx = {0, 1, ..., n}
  I Ax = {(i, j, k) | i, j ∈ V and lk ∈ L}
I Key observation:
  I Valid dependency trees for x = directed spanning trees of Gx
I Score of dependency tree T factors by subgraphs G1, ..., Gm:
  I s(T) = Σ_{i=1..m} s(Gi)
I Learning:
  I Scoring function s(Gi) for subgraphs Gi ∈ G
I Inference:
  I Search for maximum spanning tree T* of Gx given s(Gi)
Parameterizing Graph-Based Parsing

I First-order (arc-factored) model:
  I s(T = (V, A)) = Σ_{(i,j,k) ∈ A} s(i, j, k)
I Exact inference in O(n²) time for non-projective trees using the Chu-Liu-Edmonds algorithm [McDonald et al. 2005b]

[Figure: fully connected score graph over ROOT, John, saw, Mary (e.g. ROOT→saw = 10, saw→John = 30, saw→Mary = 30, John→saw = 20); the maximum spanning tree keeps ROOT→saw, saw→John and saw→Mary.]
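For a toy sentence the arc-factored objective can be made concrete by brute force: enumerate every head assignment, keep the valid trees, and take the one with the highest arc-score sum. This is only feasible for tiny n; real parsers use the Chu-Liu-Edmonds algorithm instead. The scores below follow the John/saw/Mary example (the exact mapping of scores to arcs is an assumption read off the figure residue).

```python
from itertools import product

# Arc-factored parsing by brute force for ROOT(0) John(1) saw(2) Mary(3).
score = {  # score[(head, dependent)]
    (0, 1): 9, (0, 2): 10, (0, 3): 9,
    (1, 2): 20, (1, 3): 3,
    (2, 1): 30, (2, 3): 30,
    (3, 1): 11, (3, 2): 0,
}

def is_tree(heads):
    # every non-root node must reach ROOT (node 0) via head links, no cycles
    for d in heads:
        seen, j = set(), d
        while j != 0:
            if j in seen:
                return False
            seen.add(j)
            j = heads[j]
    return True

best = max(
    (dict(zip([1, 2, 3], hs))
     for hs in product(range(4), repeat=3)
     if all(h != d for d, h in zip([1, 2, 3], hs))),
    key=lambda heads: (sum(score[(h, d)] for d, h in heads.items())
                       if is_tree(heads) else float("-inf")),
)
# best = {1: 2, 2: 0, 3: 2}: ROOT->saw, saw->John, saw->Mary (score 70)
```

The winning tree matches the maximum spanning tree in the figure: s(T) = 10 + 30 + 30 = 70.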
Parameterizing Graph-Based Parsing

I Higher-order models [McDonald and Pereira 2006, Carreras 2007]:
  I Subgraphs Gi involving a (small) number of arcs
  I Intractable in the non-projective case [McDonald and Satta 2007]
  I Exact inference in O(n³) time for projective trees with a second-order model using Eisner's algorithm [Eisner 1996]
I Efficient parsing requires that scores factor by small subgraphs.
Learning Graph-Based Models

I Typical scoring function:
  I s(Gi) = w · f(Gi), where
    I f(Gi) = high-dimensional feature vector over subgraphs
    I w = weight vector [wj = weight of feature fj(Gi)]
I Structured learning [McDonald et al. 2005a]:
  I Learn weights that maximize the score of the correct dependency tree for every sentence in the training set
I Learning is global (over trees), but features are local (over subgraphs).
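The global training idea can be sketched as a single structured-perceptron step, w ← w + f(gold tree) − f(predicted tree), with features summed over arcs. The head/dependent word-pair feature template below is a deliberately tiny stand-in for the high-dimensional feature vectors used in practice.

```python
from collections import Counter

# Sketch: one structured-perceptron update for an arc-factored model,
# s(G_i) = w . f(G_i), with f summed over arcs (toy feature template).
def features(tree, words):
    # tree: {dependent: head}; one indicator feature per head/dep word pair
    return Counter((words[h], words[d]) for d, h in tree.items())

def perceptron_update(w, gold_tree, pred_tree, words, lr=1.0):
    # raise weights of gold arcs, lower weights of wrongly predicted ones
    for feat, c in features(gold_tree, words).items():
        w[feat] = w.get(feat, 0.0) + lr * c
    for feat, c in features(pred_tree, words).items():
        w[feat] = w.get(feat, 0.0) - lr * c
    return w

words = ["ROOT", "John", "saw", "Mary"]
w = perceptron_update({}, {1: 2, 2: 0, 3: 2}, {1: 0, 2: 0, 3: 2}, words)
# arcs shared by gold and prediction cancel; only the disagreement
# about John's head changes the weights
```

[McDonald et al. 2005a] use the large-margin MIRA update rather than the plain perceptron step shown here, but the structure of the update is the same.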
Transition-Based Parsing

Transition-Based Parsing (Pre-2008)

I Basic idea:
  I Define a transition system (state machine) for mapping a sentence to its dependency graph.
  I Learning: Induce a model for predicting the next state transition, given the transition history.
  I Parsing: Construct the optimal transition sequence, given the induced model.
I Characteristics:
  I Local training of a model for optimal transitions
  I Greedy search/inference
Transition-Based Parsing (Pre-2008)

I A transition system for dependency parsing defines
  I a set C of parser configurations,
  I a set T of transitions, each a function t : C → C,
  I an initial configuration and terminal configurations for sentence x.
I Key idea:
  I Valid dependency trees for x are defined by terminating transition sequences C0,m = t1(c0), ..., tm(c_{m−1})
I Score of C0,m factors by configuration-transition pairs (c_{i−1}, ti):
  I s(C0,m) = Σ_{i=1..m} s(c_{i−1}, ti)
I Learning:
  I Scoring function s(c_{i−1}, ti) for ti(c_{i−1}) ∈ C0,m
I Inference:
  I Search for the highest-scoring sequence C*0,m given s(c_{i−1}, ti)
Example: Arc-Eager Projective Parsing

Configuration: (S, B, A) [S = stack, B = buffer, A = arcs]
Initial: ([ ], [0, 1, ..., n], { })
Terminal: (S, [ ], A)

Shift: (S, i|B, A) ⇒ (S|i, B, A)
Reduce: (S|i, B, A) ⇒ (S, B, A)    precondition: h(i, A)
Right-Arc(k): (S|i, j|B, A) ⇒ (S|i|j, B, A ∪ {(i, j, k)})
Left-Arc(k): (S|i, j|B, A) ⇒ (S, j|B, A ∪ {(j, i, k)})    precondition: ¬h(i, A) ∧ i ≠ 0
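The four transitions above can be written out directly. This sketch represents a configuration as a (stack, buffer, arcs) triple, with arcs as (head, dependent, label) triples, and replays one derivation for "ROOT Economic news had" by hand (function names are illustrative).

```python
# Sketch of the arc-eager transition system on the slide.
def has_head(i, arcs):
    return any(dep == i for _, dep, _ in arcs)

def shift(c):
    s, b, a = c
    return s + [b[0]], b[1:], a

def reduce_(c):
    s, b, a = c
    assert has_head(s[-1], a)             # precondition: h(i, A)
    return s[:-1], b, a

def right_arc(c, k):
    s, b, a = c
    return s + [b[0]], b[1:], a | {(s[-1], b[0], k)}

def left_arc(c, k):
    s, b, a = c
    i = s[-1]
    assert i != 0 and not has_head(i, a)  # precondition: not h(i, A), i != 0
    return s[:-1], b, a | {(b[0], i, k)}

# Derive amod(news, Economic), nsubj(had, news), root(ROOT, had):
c = ([], [0, 1, 2, 3], set())  # initial configuration
c = shift(c)                   # push ROOT
c = shift(c)                   # push Economic
c = left_arc(c, "amod")        # news -> Economic, pop Economic
c = shift(c)                   # push news
c = left_arc(c, "nsubj")       # had -> news, pop news
c = right_arc(c, "root")       # ROOT -> had, push had
# terminal: buffer empty, arcs hold the full tree
```

In an actual parser the next transition is chosen by the learned scoring function s(c, t) rather than scripted, but the state updates are exactly these.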
Inference for Transition-Based Parsing

I Exact inference intractable for standard transition systems
I Standard approach:
  I Greedy (pseudo-deterministic) inference [Yamada and Matsumoto 2003, Nivre et al. 2004]
  I Complexity given by upper bound on transition sequence length
I Transition systems:
  I Projective: O(n) [Yamada and Matsumoto 2003, Nivre 2003]
  I Limited non-projective: O(n) [Attardi 2006, Nivre 2007]
  I Unrestricted non-projective: O(n²) [Covington 2001, Nivre 2008]
I Efficient parsing requires approximate inference.
Learning for Transition-Based Parsing

I Typical scoring function:
  I s(c, t) = w · f(c, t), where
    I f(c, t) = feature vector over configuration c and transition t
    I w = weight vector [wj = weight of feature fj(c, t)]
I Simple classification problem:
  I Learn weights that maximize the score of the correct transition out of every configuration in the training set
  I Configurations represent the derivation history, including the partially built dependency tree
I Learning is local, but features are global.
McDonald & Nivre (2007)

CoNLL 2006

I CoNLL 2006: Shared Task on Dependency Parsing
  I Evaluation on 13 different languages
  I Top 2 systems statistically identical: one graph-based (MSTParser) and one transition-based (MaltParser)
I Question: Do the systems learn the same things?
MSTParser and MaltParser

Language     MSTParser  MaltParser
Arabic         66.91      66.71
Bulgarian      87.57      87.41
Chinese        85.90      86.92
Czech          80.18      78.42
Danish         84.79      84.77
Dutch          79.19      78.59
German         87.34      85.82
Japanese       90.71      91.65
Portuguese     86.82      87.60
Slovene        73.44      70.30
Spanish        82.25      81.29
Swedish        82.55      84.58
Turkish        63.19      65.68
Overall        80.83      80.75
Comparing the Models

I Inference:
  I Exhaustive (MSTParser)
  I Greedy (MaltParser)
I Training:
  I Global structure learning (MSTParser)
  I Local decision learning (MaltParser)
I Features:
  I Local features (MSTParser)
  I Rich decision history (MaltParser)
I Fundamental trade-off:
  I Global learning and inference vs. rich feature space
Error Analysis [McDonald and Nivre 2007]

I Aim:
  I Relate parsing errors to linguistic and structural properties of the input and the predicted/gold-standard dependency graphs
I Three types of factors:
  I Length factors: sentence length, dependency length
  I Graph factors: tree depth, branching factor, non-projectivity
  I Linguistic factors: part of speech, dependency type
I Statistics:
  I Labeled accuracy, precision and recall
  I Computed over the test sets for all 13 languages
Sentence Length

[Figure: dependency accuracy vs. sentence length (bins of size 10) for MSTParser and MaltParser.]

I MaltParser is more accurate than MSTParser for short sentences (1-10 words), but its performance degrades more with increasing sentence length.
Dependency Length

[Figure: dependency precision and recall vs. dependency length for MSTParser and MaltParser.]

I MaltParser is more precise than MSTParser for short dependencies (1-3 words), but its performance degrades drastically with increasing dependency length (> 10 words).
I MSTParser has more or less constant precision for dependencies longer than 3 words.
I Recall is very similar across systems.
Tree Depth (Distance to Root)

[Figure: dependency precision and recall vs. distance to root for MSTParser and MaltParser.]

I MSTParser is much more precise than MaltParser for dependents of the root and has roughly constant precision for depth > 1, while MaltParser's precision improves with increasing depth (up to 7 arcs).
I Recall is very similar across systems.
Degrees of Non-Projectivity

[Figure: dependency precision and recall by non-projective arc degree (0, 1, 2+) for MSTParser and MaltParser.]

I Degree of a dependency arc (i, j, k) = the number of words in the span min(i, j), ..., max(i, j) that are not descendants of i and have their head outside the span.
I MaltParser has slightly higher precision, and MSTParser slightly higher recall, for non-projective arcs (degree > 0).
I Neither system predicts arcs with a degree higher than 2.
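The degree definition lends itself to a direct implementation. A sketch, assuming (for the example) that the tree is given as a dependent → head map:

```python
# Sketch: degree of a dependency arc (i, j), per the definition above.
# heads: {dependent: head} over nodes 0..n.
def descends_from(i, k, heads):
    # follow head links upward from k; True if we reach i
    while k in heads:
        k = heads[k]
        if k == i:
            return True
    return False

def arc_degree(i, j, heads):
    lo, hi = min(i, j), max(i, j)
    # count words strictly inside the span that are not descendants of i
    # and whose head lies outside the span
    return sum(1 for k in range(lo + 1, hi)
               if not descends_from(i, k, heads)
               and not (lo <= heads.get(k, -1) <= hi))

# projective chain ROOT->3->2->1: the arc (0, 3) has degree 0
# crossing arcs 0->2, 2->1, 1->3: the arc (1, 3) has degree 1
```

Arcs of degree 0 are projective; the analysis above bins arcs by this degree.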
Part of Speech

[Figure: labeled attachment score (LAS) by part of speech (Verb, Noun, Pron, Adj, Adv, Adpos, Conj) for MSTParser and MaltParser.]

I MSTParser is more accurate for verbs, adjectives, adverbs, adpositions, and conjunctions.
I MaltParser is more accurate for nouns and pronouns.
Dependency Type: Root, Subject, Object

[Figure: dependency precision and recall for the Root, Subj and Obj dependency types, MSTParser vs. MaltParser.]

I MSTParser has higher precision (and recall) for roots.
I MSTParser has higher recall (and precision) for subjects.
Discussion

I Many of the results are indicative of the fundamental trade-off: global learning/inference versus rich features.
I Global inference improves decisions for long sentences and those near the top of graphs.
I Rich features improve decisions for short sentences and those near the leaves of the graphs.
I Dependency parsing post-2008:
  I How do we use this to improve parser performance?
Voting and Stacking

I Early improvements were based on system combination.
I Voting:
  I Let parsers vote for heads [Zeman and Žabokrtský 2005]
  I Use the MST algorithm to enforce the tree constraint [Sagae and Lavie 2006]
I Stacking:
  I Use the output of one parser as features for the other [Nivre and McDonald 2008, Torres Martins et al. 2008]
I Focus today:
  I Recent work evolving the approaches themselves
  I Richer feature representations in graph-based parsing
  I Improved learning and inference in transition-based parsing
Conclusion

Summary

I Dependency syntax: basic concepts
I Dependency parsing: graph-based and transition-based
I Empirical trade-offs point the way forward
References and Further Reading
I Giuseppe Attardi. 2006. Experiments with a multilanguage non-projective dependency parser. In Proceedings of the 10th Conference on Computational Natural Language Learning (CoNLL), pages 166–170.
I Xavier Carreras. 2007. Experiments with a higher-order projective dependency parser. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 957–961.
I Michael A. Covington. 2001. A fundamental algorithm for dependency parsing. In Proceedings of the 39th Annual ACM Southeast Conference, pages 95–102.
I Ralph Debusmann, Denys Duchier, and Geert-Jan M. Kruijff. 2004. Extensible dependency grammar: A new methodology. In Proceedings of the Workshop on Recent Advances in Dependency Grammar, pages 78–85.
I Denys Duchier and Ralph Debusmann. 2001. Topological dependency trees: A constraint-based account of linear precedence. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics (ACL), pages 180–187.
I Jason M. Eisner. 1996. Three new probabilistic models for dependency parsing: An exploration. In Proceedings of the 16th International Conference on Computational Linguistics (COLING), pages 340–345.
I Peter Hellwig. 1986. Dependency unification grammar. In Proceedings of the 11th International Conference on Computational Linguistics (COLING), pages 195–198.
I Peter Hellwig. 2003. Dependency unification grammar. In Vilmos Ágel, Ludwig M. Eichinger, Hans-Werner Eroms, Peter Hellwig, Hans Jürgen Heringer, and Henning Lobin, editors, Dependency and Valency, pages 593–635. Walter de Gruyter.
I Richard A. Hudson. 1984. Word Grammar. Blackwell.
I Richard A. Hudson. 1990. English Word Grammar. Blackwell.
I Richard Hudson. 2007. Language Networks: The New Word Grammar. Oxford University Press.
I Timo Järvinen and Pasi Tapanainen. 1998. Towards an implementable dependency grammar. In Sylvain Kahane and Alain Polguère, editors, Proceedings of the Workshop on Processing of Dependency-Based Grammars, pages 1–10.
I Sandra Kübler, Joakim Nivre, and Ryan McDonald. 2009. Dependency Parsing. Morgan & Claypool Publishers.
I Hiroshi Maruyama. 1990. Structural disambiguation with constraint propagation. In Proceedings of the 28th Meeting of the Association for Computational Linguistics (ACL), pages 31–38.
I Ryan McDonald and Joakim Nivre. 2007. Characterizing the errors of data-driven dependency parsing models. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and the Conference on Computational Natural Language Learning (EMNLP-CoNLL).
I Ryan McDonald and Fernando Pereira. 2006. Online learning of approximate dependency parsing algorithms. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 81–88.
I Ryan McDonald and Giorgio Satta. 2007. On the complexity of non-projective data-driven dependency parsing. In Proceedings of the 10th International Conference on Parsing Technologies (IWPT), pages 122–131.
I Ryan McDonald, Koby Crammer, and Fernando Pereira. 2005a. Online large-margin training of dependency parsers. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), pages 91–98.
I Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Hajič. 2005b. Non-projective dependency parsing using spanning tree algorithms. In Proceedings of the Human Language Technology Conference and the Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pages 523–530.
I Igor Mel'čuk. 1988. Dependency Syntax: Theory and Practice. State University of New York Press.
I Wolfgang Menzel and Ingo Schröder. 1998. Decision procedures for dependency parsing using graded constraints. In Sylvain Kahane and Alain Polguère, editors, Proceedings of the Workshop on Processing of Dependency-Based Grammars, pages 78–87.
I Jasmina Milićević. 2006. A short guide to the Meaning-Text Theory. Journal of Koralex, 8:187–233.
I Joakim Nivre and Ryan McDonald. 2008. Integrating graph-based and transition-based dependency parsers. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL), pages 950–958.
I Joakim Nivre, Johan Hall, and Jens Nilsson. 2004. Memory-based dependency parsing. In Hwee Tou Ng and Ellen Riloff, editors, Proceedings of the 8th Conference on Computational Natural Language Learning (CoNLL), pages 49–56.
I Joakim Nivre. 2003. An efficient algorithm for projective dependency parsing. In Gertjan Van Noord, editor, Proceedings of the 8th International Workshop on Parsing Technologies (IWPT), pages 149–160.
I Joakim Nivre. 2007. Incremental non-projective dependency parsing. In Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), pages 396–403.
I Joakim Nivre. 2008. Algorithms for deterministic incremental dependency parsing. Computational Linguistics, 34:513–553.
I Kenji Sagae and Alon Lavie. 2006. Parser combination by reparsing. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, pages 129–132.
I Ingo Schröder. 2002. Natural Language Parsing with Graded Constraints. Ph.D. thesis, Hamburg University.
I Petr Sgall, Eva Hajičová, and Jarmila Panevová. 1986. The Meaning of the Sentence in Its Semantic and Pragmatic Aspects. Reidel.
I Pasi Tapanainen and Timo Järvinen. 1997. A non-projective dependency parser. In Proceedings of the 5th Conference on Applied Natural Language Processing, pages 64–71.
I Lucien Tesnière. 1959. Éléments de syntaxe structurale. Éditions Klincksieck.
I André Filipe Torres Martins, Dipanjan Das, Noah A. Smith, and Eric P. Xing. 2008. Stacking dependency parsers. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 157–166.
I Hiroyasu Yamada and Yuji Matsumoto. 2003. Statistical dependency analysis with support vector machines. In Gertjan Van Noord, editor, Proceedings of the 8th International Workshop on Parsing Technologies (IWPT), pages 195–206.