
ACTA UNIVERSITATIS UPSALIENSIS Studia Linguistica Upsaliensia

24

Linguistically Informed Neural Dependency Parsing for Typologically Diverse Languages

Miryam de Lhoneux


A dependency tree for a sentence of n words w_{1:n} is a graph G = (V, A), where V is a set of nodes and A a set of labelled arcs (h, r, d), with h the head, r the dependency relation, and d the dependent. For the sentence "I love syntax", V = {I, love, syntax} and A = {(love, nsubj, I), (love, obj, syntax)}. The notation x_{1:n} abbreviates the sequence x_1, ..., x_n.

An arc (w_i, r, w_j) ∈ A in G = (V, A) is projective if and only if w_i →* w_k for every word w_k between w_i and w_j (i < k < j if i < j, or j < k < i if j < i), where →* denotes the reflexive transitive closure of the arc relation in A.

The arc-hybrid transition system uses configurations consisting of a stack Σ (written σ), a buffer B (written β), and a set of arcs A. For a sentence x = (w_1, ..., w_n), the initial and terminal configurations and the transitions are:

c_0(x = (w_1, ..., w_n)) = ([ ], [1, ..., n, 0], ∅)
C_t = {c ∈ C | c = ([ ], [0], A)}

LEFT-ARC_r: (σ|i, j|β, A) ⇒ (σ, j|β, A ∪ {(j, r, i)})
RIGHT-ARC_r: (σ|i|j, β, A) ⇒ (σ|i, β, A ∪ {(i, r, j)})
SHIFT: (σ, i|β, A) ⇒ (σ|i, β, A)   (i ≠ 0)

[Figure: transition-based parsing of "the brown fox jumped root", showing the stack, the buffer, and the transitions LEFT-ARC, RIGHT-ARC and SHIFT.]

Feature templates are defined over the word form (w) and POS tag (t) of items on the stack (s_i) and in the buffer (b_i), as well as the leftmost and rightmost children of stack items (lc_1(s_i), rc_1(s_i)); ◦ denotes feature conjunction. A typical template set includes:

Unigrams: s1.w; s1.t; s1.wt; s2.w; s2.t; s2.wt; b1.w; b1.t; b1.wt
Pairs: s1.wt ◦ s2.wt; s1.wt ◦ s2.w; s1.wt ◦ s2.t; s1.w ◦ s2.wt; s1.t ◦ s2.wt; s1.w ◦ s2.w; s1.t ◦ s2.t; s1.t ◦ b1.t
Triples: s2.t ◦ s1.t ◦ b1.t; s2.t ◦ s1.t ◦ lc1(s1).t; s2.t ◦ s1.t ◦ rc1(s1).t; s2.t ◦ s1.t ◦ lc1(s2).t; s2.t ◦ s1.t ◦ rc1(s2).t; s2.t ◦ s1.w ◦ rc1(s2).t; s2.t ◦ s1.w ◦ lc1(s1).t; s2.t ◦ s1.w ◦ b1.t

LEFT-ARC_r and RIGHT-ARC_r are parameterised by the dependency relation r: LEFT-ARC_r adds the arc (j, r, i) from the first buffer item j to the item i on top of the stack and pops the stack; RIGHT-ARC_r adds the arc (i, r, j) from the second stack item i to the top item j and pops the stack; SHIFT moves the first buffer item onto the stack.

Training with a static oracle o and a feature function φ(·) over configurations c proceeds as follows:

for i = 1 to ITERATIONS do
    for each sentence s with gold tree T do
        c ← c_s(s)
        while c is not terminal do
            t_p ← argmax_t w · φ(c, t)
            t_o ← o(c, T)
            if t_p ≠ t_o then
                UPDATE(w, φ(c, t_o), φ(c, t_p))
            c ← t_o(c)

Here c_s(s) is the initial configuration for s, t_p is the transition predicted by the model, t_o is the transition prescribed by the oracle o(c, T) for the gold tree T, and the weights w are updated towards φ(c, t_o) and away from φ(c, t_p).
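To make the training loop concrete, the following is a minimal Python sketch of online training with a static oracle. The helper objects (a transition_system with initial/is_terminal/legal/apply, a linear model with score/update, and the oracle function) are assumptions of the sketch, not the thesis's implementation.

def train_static_oracle(corpus, model, transition_system, oracle, iterations=10):
    """Online training with a static oracle (sketch).

    corpus            -- iterable of (sentence, gold_tree) pairs
    model             -- object with score(config, transition) and
                         update(config, good_transition, bad_transition)
    transition_system -- object with initial(sentence), is_terminal(config),
                         legal(config) and apply(transition, config)
    oracle            -- function oracle(config, gold_tree) -> gold transition
    """
    for _ in range(iterations):
        for sentence, gold_tree in corpus:
            c = transition_system.initial(sentence)
            while not transition_system.is_terminal(c):
                # transition preferred by the model vs. transition prescribed by the oracle
                t_p = max(transition_system.legal(c), key=lambda t: model.score(c, t))
                t_o = oracle(c, gold_tree)
                if t_p != t_o:
                    # perceptron-style update: promote phi(c, t_o), demote phi(c, t_p)
                    model.update(c, t_o, t_p)
                # always follow the oracle transition (no exploration)
                c = transition_system.apply(t_o, c)
    return model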

A feed-forward network (MLP) maps an input x to an output y through a hidden layer h, applying an affine transformation (Wx + b) followed by a non-linear activation f:

h = f(W_1 x + b_1)
y = softmax(W_2 h)

In the parser of Chen and Manning, the input consists of embeddings of words (x^w), POS tags (x^t) and dependency labels (x^l) extracted from the configuration. The hidden layer uses a cube activation, and the output p is a softmax over transitions:

h = (W_1^w x^w + W_1^t x^t + W_1^l x^l + b_1)^3
p = softmax(W_2 h)

[Figure: an RNN unrolled over an input sequence x_1, ..., x_t, producing hidden states h_1, ..., h_t.]

An RNN encodes a sequence x_{1:n} into hidden states h_t, where each h_t is a function of the input x_t and the previous state h_{t−1}, and can thus be seen as encoding the prefix x_{1:t}:

h_{1:n} = RNN(x_{1:n})
h_t = f(h_{t−1}, x_t)

f_t = σ(W_f x_t + U_f h_{t−1} + b_f)
i_t = σ(W_i x_t + U_i h_{t−1} + b_i)
o_t = σ(W_o x_t + U_o h_{t−1} + b_o)
c_t = f_t ⊙ c_{t−1} + i_t ⊙ tanh(W_c x_t + U_c h_{t−1} + b_c)
h_t = o_t ⊙ tanh(c_t)

At each time step t, the sigmoid function σ produces the forget gate f_t, the input gate i_t and the output gate o_t. The forget gate scales the previous cell state c_{t−1} and the input gate scales the candidate update tanh(W_c x_t + U_c h_{t−1} + b_c); their sum is the new cell state c_t, from which the hidden state h_t is obtained through tanh and the output gate.
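As an illustration, here is a minimal NumPy sketch of one LSTM step implementing the gate equations above; the dictionary-of-matrices parameterisation is an assumption of the sketch.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b are dicts with keys 'f', 'i', 'o', 'c' holding the
    input weights, recurrent weights and biases of the corresponding gates."""
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])       # forget gate
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])       # input gate
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])       # output gate
    c_tilde = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])   # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde                           # new cell state
    h_t = o_t * np.tanh(c_t)                                     # new hidden state
    return h_t, c_t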

A BiLSTM runs a forward LSTM (LSTM^f) over x_{1:n} and a backward LSTM (LSTM^b) over the reversed sequence x_{n:1}. Writing [v_1; v_2] for vector concatenation, the BiLSTM vector of position i is:

BiLSTM(x_{1:n}, i) = [LSTM^f(x_{1:i}); LSTM^b(x_{n:i})]

In the parser of Kiperwasser and Goldberg, the feature function φ(·) over a configuration c is built from BiLSTM vectors. For a sentence of n words w_{1:n}, each word w_i with POS tag t_i is represented by

x_i = [e(w_i); e(t_i)]

and encoded in context as v_i = BiLSTM(x_{1:n}, i). The feature function concatenates the vectors of the top three stack items s_2, s_1, s_0 and the first buffer item b_0, and an MLP scores each transition:

φ(c) = [v_{s2}; v_{s1}; v_{s0}; v_{b0}]
MLP(φ(c)) = W_2 tanh(W_1 φ(c) + b_1) + b_2
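A small NumPy sketch of this feature extraction and scoring step is given below; padding empty stack or buffer positions with a zero vector is an assumption of the sketch, and W1, b1, W2, b2 are the MLP weights.

import numpy as np

def score_transitions(v, stack, buffer, W1, b1, W2, b2):
    """Score all transitions for a configuration, Kiperwasser & Goldberg style.

    v      -- matrix of BiLSTM output vectors, one row per word (v[i] = BiLSTM(x_1:n, i))
    stack  -- list of word indices, top of the stack last
    buffer -- list of word indices, front of the buffer first
    """
    pad = np.zeros(v.shape[1])
    s2 = v[stack[-3]] if len(stack) > 2 else pad
    s1 = v[stack[-2]] if len(stack) > 1 else pad
    s0 = v[stack[-1]] if len(stack) > 0 else pad
    b0 = v[buffer[0]] if buffer else pad
    phi = np.concatenate([s2, s1, s0, b0])    # phi(c) = [v_s2; v_s1; v_s0; v_b0]
    h = np.tanh(W1 @ phi + b1)
    return W2 @ h + b2                        # one score per transition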

[Figure: the BiLSTM-based parser over "the brown fox jumped root". Each word vector x_i is fed to forward and backward LSTMs whose outputs are concatenated into v_i; the vectors of selected stack and buffer items are concatenated and passed to an MLP that outputs score(LEFT-ARC), score(RIGHT-ARC) and score(SHIFT).]

Multilingual parsing can be categorised by how models relate to languages: polymonolingual parsing trains multiple models for multiple languages (one per language); polyglot parsing trains one model for multiple languages, where the languages have equal status; and cross-lingual parsing distinguishes source and target languages, in single-source or multi-source settings.

[Figure: multi-task learning architecture with shared layers at the bottom and task-specific layers for Task A, Task B and Task C on top.]

A character-based vector ce(w_i) for a word w_i with characters ch_j, 1 ≤ j ≤ m, is obtained by running a BiLSTM over the character sequence:

ce(w_i) = BiLSTM(ch_{1:m})

The input vector x_i of w_i then concatenates the word embedding e(w_i), the character vector ce(w_i) and, where used, the POS tag embedding e(t_i) or a pretrained embedding pe(w_i):

x_i = [e(w_i); ce(w_i); e(t_i)]
x_i = [e(w_i); ce(w_i)]

The arc-hybrid system is extended with a SWAP transition to allow non-projective trees:

c_0(x = (w_1, ..., w_n)) = ([ ], [1, ..., n, 0], ∅)
C_t = {c ∈ C | c = ([ ], [0], A)}

LEFT-ARC_r: (σ|i, j|β, A) ⇒ (σ, j|β, A ∪ {(j, r, i)})   (j ≠ 0 ∨ σ = [ ])
RIGHT-ARC_r: (σ|i|j, β, A) ⇒ (σ|i, β, A ∪ {(i, r, j)})
SHIFT: (σ, i|β, A) ⇒ (σ|i, β, A)   (i ≠ 0)
SWAP: (σ|s_0, b|β, A) ⇒ (σ, b|s_0|β, A)   (β ≠ [ ] ∧ s_0 < b)

SWAP moves the top of the stack s_0 back into the buffer, behind the first buffer item b, which allows words to be parsed in an order different from the one in which they occur in the sentence.
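The following Python sketch implements these four transitions on a configuration represented as a stack list, a buffer list and a set of arcs (word indices as in c_0, with the root 0 at the end of the buffer). Preconditions are omitted, and the trace at the end is only illustrative.

def shift(stack, buf, arcs):
    """SHIFT: move the first buffer item onto the stack."""
    return stack + [buf[0]], buf[1:], arcs

def left_arc(stack, buf, arcs, r):
    """LEFT-ARC_r: attach the stack top to the first buffer item and pop it."""
    return stack[:-1], buf, arcs | {(buf[0], r, stack[-1])}

def right_arc(stack, buf, arcs, r):
    """RIGHT-ARC_r: attach the stack top to the item below it and pop it."""
    return stack[:-1], buf, arcs | {(stack[-2], r, stack[-1])}

def swap(stack, buf, arcs):
    """SWAP: move the stack top back into the buffer, behind the first buffer item."""
    return stack[:-1], [buf[0], stack[-1]] + buf[1:], arcs

# Parsing "the fox jumped" (1=the, 2=fox, 3=jumped, 0=root):
c = ([], [1, 2, 3, 0], set())
c = shift(*c)                  # [1]  [2, 3, 0]
c = left_arc(*c, "det")        # []   [2, 3, 0]  arc (2, det, 1)
c = shift(*c)                  # [2]  [3, 0]
c = left_arc(*c, "nsubj")      # []   [3, 0]     arc (3, nsubj, 2)
c = shift(*c)                  # [3]  [0]
c = left_arc(*c, "root")       # []   [0]        arc (0, root, 3) -> terminal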

[Figure: transition sequence using SWAP. Starting from the buffer [1 2 4 3], SWAP reorders the words so that they are processed in the order 1 2 3 4: [ ]_Σ [1 2 4 3]_B → [1]_Σ [2 4 3]_B → [1 2]_Σ [4 3]_B → [1 2 4]_Σ [3]_B → [1 2]_Σ [3 4]_B → [1 2 3]_Σ [4]_B → [1 2]_Σ [4]_B → [1]_Σ [4]_B → [1 4]_Σ [ ]_B → [1]_Σ [ ]_B.]

In the worst case, a sentence of n words requires O(n²) transitions with SWAP, compared to O(n) for the projective system; in practice the number of swaps is small.

Training with a static oracle o(c, T) follows the algorithm above: the parser always follows the single transition returned by the oracle and updates its weights whenever its prediction differs. A dynamic oracle instead defines a boolean function o(t; c, T) for every transition t and configuration c, indicating whether t is optimal in c with respect to the gold tree T. This makes it possible to also train on configurations that do not lie on the gold transition sequence.


Training with a dynamic oracle and error exploration proceeds as follows:

for i = 1 to ITERATIONS do
    for each sentence s with gold tree T do
        c ← c_s(s)
        while c is not terminal do
            ZERO_COST ← {t | o(t; c, T) = true}
            t_p ← argmax_{t ∈ LEGAL(c)} w · φ(c, t)
            t_o ← argmax_{t ∈ ZERO_COST} w · φ(c, t)
            if t_p ∉ ZERO_COST then
                UPDATE(w, φ(c, t_o), φ(c, t_p))
            t_n ← CHOOSE_NEXT_{k,p}(c, t_o, t_p, i)
            c ← t_n(c)

CHOOSE_NEXT_{k,p}(c, t_o, t_p, i) returns the predicted transition t_p if i > k and RAND() < p (exploration), and the oracle transition t_o otherwise.
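The exploration step can be sketched in Python as follows, reusing the same hypothetical helper objects as in the static-oracle sketch; the cost function, the exploration parameters k and p, and the helper names are assumptions of the sketch.

import random

def train_dynamic_oracle(corpus, model, transition_system, cost, iterations=10, k=2, p=0.9):
    """Online training with a dynamic oracle and error exploration (sketch).

    cost(t, config, gold_tree) returns the cost of transition t; the zero-cost
    transitions are those the oracle allows. After k warm-up iterations the parser
    follows its own prediction with probability p, even when it is wrong.
    """
    for i in range(1, iterations + 1):
        for sentence, gold_tree in corpus:
            c = transition_system.initial(sentence)
            while not transition_system.is_terminal(c):
                legal = transition_system.legal(c)
                zero_cost = [t for t in legal if cost(t, c, gold_tree) == 0]
                t_p = max(legal, key=lambda t: model.score(c, t))
                t_o = max(zero_cost, key=lambda t: model.score(c, t))
                if t_p not in zero_cost:
                    model.update(c, t_o, t_p)   # promote the best zero-cost transition
                # CHOOSE_NEXT: explore the model's own prediction after k iterations
                t_next = t_p if (i > k and random.random() < p) else t_o
                c = transition_system.apply(t_next, c)
    return model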

The cost of each transition in the arc-hybrid system, given a configuration c = (σ|s_1|s_0, b|β, A) and gold tree T:

C(LEFT-ARC; c, T): adding the arc (b, s_0) and popping s_0 means that s_0 can no longer acquire a head in H = {s_1} ∪ β or dependents in D = {b} ∪ β; the cost is the number of gold arcs in T of the form (s_0, d) and (h, s_0) with h ∈ H and d ∈ D.

C(RIGHT-ARC; c, T): adding the arc (s_1, s_0) and popping s_0 means that s_0 can no longer acquire a head or dependents in B = {b} ∪ β; the cost is the number of gold arcs in T of the form (s_0, d) and (h, s_0) with h, d ∈ B.

C(SHIFT; c, T): pushing b onto the stack means that b can no longer acquire a head in H = {s_1} ∪ σ or dependents in D = {s_0, s_1} ∪ σ; the cost is the number of gold arcs in T of the form (b, d) and (h, b) with h ∈ H and d ∈ D.

For the system extended with SWAP, a static-dynamic oracle is used: decisions that affect word order (SHIFT and SWAP) are taken statically, following the projective order π(i) of the words (SWAP is prescribed when π(s_0) > π(b)), while the arc transitions are costed dynamically. RDEPS(i) denotes the gold dependents of i that have not yet been attached, and h(i) the gold head of i. The costs are:

C(LEFT-ARC) = |RDEPS(s_0)| + [[h(s_0) ≠ b ∧ s_0 ∈ RDEPS(h(s_0))]]
after which RDEPS(s_0) is set to [ ] and s_0 is removed from RDEPS(h(s_0)).

C(RIGHT-ARC) = |RDEPS(s_0)| + [[h(s_0) ≠ s_1 ∧ s_0 ∈ RDEPS(h(s_0))]]
with the same update.

C(SHIFT) = 0 if there is some i ∈ B − b such that b < i and π(b) > π(i) (i.e., b still has to be swapped back into the buffer); otherwise

C(SHIFT) = |{d ∈ RDEPS(b) | d ∈ Σ}| + [[h(b) ∈ Σ − s_0 ∧ b ∈ RDEPS(h(b))]]
after which all d ∈ Σ are removed from RDEPS(b) and, if h(b) ∈ Σ − s_0, b is removed from RDEPS(h(b)).

Here [[Φ]] is 1 if Φ is true and 0 otherwise; s_0 and s_1 are the top two items of the stack Σ, b is the first item of the buffer B, Σ − s_0 is the stack excluding s_0, and B − b the buffer excluding b; RDEPS(s_0) and h(s_0) are the remaining gold dependents and the gold head of s_0.


The input vector x_i concatenates the word embedding e(w_i), a pretrained embedding pe(w_i) and the character vector ce(w_i):

x_i = [e(w_i); pe(w_i); ce(w_i)]


[Figure: recursive composition (following the syntactic structure) versus recurrent composition (following the linear order) of the phrase "The largest city in Minnesota".]

In the stack LSTM parser, the vector x_i of word w_i is built from its word embedding e(w_i), a pretrained embedding pe(w_i) and a POS tag embedding e(t_i), passed through a rectified linear layer:

x_i = max{0, W[e(w_i); pe(w_i); e(t_i)] + b}

[Figure: the stack LSTM parser processing "an overhasty decision was made ROOT". The stack, the buffer and the history of actions (e.g. SHIFT, REDUCE-LEFT(amod)) are each encoded by an LSTM whose TOP states are concatenated and fed to a softmax over actions.]

When a dependent is attached, a composed representation c of the new subtree is computed from the head vector h, the dependent vector d and a relation embedding r:

c = tanh(W[h; d; r] + b)

For example, attaching "the" to "fox" with a left det arc gives C_fox = tanh(W[C_fox; C_the; left-det] + b), which replaces the previous C_fox.

[Figure: the BiLSTM parser with recursive composition over "the brown fox jumped root". Each word has a composed subtree vector (C_the, C_brown, C_fox, C_jumped, C_root) that is updated as dependents such as "the" (det) are attached.]

In the parser with recursive composition, the vector v_i of word i concatenates its BiLSTM encoding with a composed vector c_i^t that represents the subtree headed by i after its first t dependents have been attached:

v_i = [BiLSTM(x_{1:n}, i); c_i^t]
c_i^0 = BiLSTM(x_{1:n}, i)

When a dependent with vector d is attached to head i by a relation with embedding r, the composed vector is updated either with a recurrent cell (+rc) or with an LSTM cell (+lc):

c_i^t = tanh(W[c_i^{t−1}; d; r] + b)      (+rc)
c_i^t = LSTM([c_i^{t−1}; d; r])           (+lc)

The input vector is x_i = [e(w_i); e(t_i); ce(w_i)]. The feature extractor f(x, i) that produces v_i = f(x, i) is varied between a backward LSTM, a forward LSTM and the full BiLSTM:

bw(x, i) = LSTM^b(x_{n:1}, i)
fw(x, i) = LSTM^f(x_{1:n}, i)
bi(x, i) = [bw(x, i); fw(x, i)]


AVCs are identified from arcs of the form w_aux ←aux− w_mv: for each such arc, the main verb w_mv and its auxiliaries are recorded, in UD with the main verb as head of the construction and in MS with the outermost auxiliary as head.

Figure 5.5. Finite main verb (FMV) in a UD tree: "I did this", where the verb carries VerbForm=Fin and has nsubj and obj dependents.

Figure 5.6. Example sentence "I could easily have done this" with an AVC annotated in UD (top), where the non-finite main verb (NFMV) is the head, and in MS (bottom), where the main auxiliary (MAUX) is the head. AVC subtree in thick blue.

therefore represent the word in the context of the sentence) to the corresponding vectors for verb types (the vectors at the input of the BiLSTM) in order to better understand what part of the representation is context-dependent. We finally compare the verb type vectors learned by the parser to verb type vectors learned with a language modelling objective. We expect the vectors learned by the parser to encode information about agreement and transitivity to a greater extent than the vectors learned using a language modelling objective. I explain this in more detail in Section 5.3.3.

Collecting FMVs such as the one in Figure 5.5 from UD treebanks is straightforward: verbs are annotated with a feature called VerbForm, which has the value Fin if the verb is finite. We find candidates using the feature VerbForm=Fin and only keep those that are not involved in a copula or auxiliary dependency relation, to make sure they are not part of a larger verbal construction.
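As an illustration, here is a simplified Python sketch of this FMV collection step over a CoNLL-U file. The minimal reader and the exact filtering conditions are assumptions of the sketch and only approximate the selection criteria described above.

def read_conllu(path):
    """Yield sentences from a CoNLL-U file as lists of token dicts (simplified reader)."""
    sent = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                if sent:
                    yield sent
                sent = []
            elif not line.startswith("#"):
                cols = line.split("\t")
                if "-" in cols[0] or "." in cols[0]:
                    continue  # skip multiword tokens and empty nodes
                sent.append({"id": int(cols[0]), "form": cols[1], "feats": cols[5],
                             "head": int(cols[6]), "deprel": cols[7]})
    if sent:
        yield sent

def finite_main_verbs(sentence):
    """Candidates are tokens with VerbForm=Fin; keep those not involved in an aux or
    cop relation, so that they are not part of a larger verbal construction."""
    fmvs = []
    for tok in sentence:
        if "VerbForm=Fin" not in tok["feats"].split("|"):
            continue
        if tok["deprel"].split(":")[0] in ("aux", "cop"):
            continue
        if any(d["head"] == tok["id"] and d["deprel"].split(":")[0] in ("aux", "cop")
               for d in sentence):
            continue
        fmvs.append(tok)
    return fmvs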

We can collect AVCs in the UD and in the MS training data using the first part of the transformation and backtransformation algorithms, respectively, which I defined in Section 5.2.1. We scan the sentence left to right, looking for auxiliary dependency relations and collecting information about which word is the outermost auxiliary and which is the main verb. When we have our sets of FMVs and AVCs, we can create our task data sets.
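A matching sketch for AVC collection is shown below, assuming the same token-dict format as in the FMV sketch above; taking the first auxiliary found in the left-to-right scan as the outermost one is an assumption of this sketch.

def collect_avcs(sentence):
    """Scan the sentence left to right for aux relations and group auxiliaries with
    their main verb (sketch; UD-style annotation assumed)."""
    avcs = {}  # main verb id -> list of auxiliary ids
    for tok in sentence:  # tokens are in left-to-right order
        if tok["deprel"].split(":")[0] == "aux":
            avcs.setdefault(tok["head"], []).append(tok["id"])
    return [{"main_verb": mv,
             "auxiliaries": auxs,
             "outermost_aux": auxs[0]}   # first aux found in the scan
            for mv, auxs in avcs.items()]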


The token vector v_i of word w_i is its BiLSTM encoding of the input vector x_i, which here concatenates the word embedding e(w_i) and the character vector ce(w_i):

v_i = BiLSTM(x_{1:n}, i)
x_i = [e(w_i); ce(w_i)]

It seems interesting to find out whether or not a composition function can be useful to model the specific case of transfer relations such as those found in AVCs. For this reason, we also investigate the composed representation of AVCs.

We are therefore interested in finding out whether or not an LSTM trained with a parsing objective can learn the notion of dissociated nucleus, as well as whether or not a recursive composition function can help to learn this notion. As just mentioned, the head of an AVC in UD is a non-finite main verb, which we will refer to as NFMV, as depicted in Figure 5.6. The head of an AVC in MS is the outermost auxiliary, which we will refer to as the main auxiliary (MAUX), as also depicted in that figure. We therefore look at NFMV and MAUX token vectors for the respective representation schemes and consider two definitions of these: one where we use the BiLSTM encoding of the main verb token v_i, and the other where we construct a subtree vector c_i^t by recursively composing the representation of AVCs as auxiliaries get attached to their main verb, as in Equation 4.5, repeated in Equation 5.3. When training the parser, we concatenate this composed vector to a vector of the head of the subtree to form v_i. In Chapter 4, we used two different composition functions, one using a simple recurrent cell and one using an LSTM cell. We saw that the one using an LSTM cell performed better. However, in this set of experiments, we only do recursive composition over a limited part of the subtree: only between auxiliaries and NFMVs. This means that the LSTM would only pass through two states in most cases, and never more than 4.⁸ This does not allow us to learn proper weights for the input, output and forget gates. An RNN cell seems more appropriate here and we only use that.

c_i^t = tanh(W[c_i^{t−1}; d; r] + b)    (5.3)
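A minimal NumPy sketch of this composition step is given below; the dimensions, the random initialisation and the example AVC are placeholders, not values from the thesis.

import numpy as np

def compose(c_prev, d, r, W, b):
    """One recursive composition step (Equation 5.3):
    c_i^t = tanh(W [c_i^{t-1}; d; r] + b)."""
    return np.tanh(W @ np.concatenate([c_prev, d, r]) + b)

# Composing the AVC "could have done": the subtree vector of the NFMV "done" starts
# from its own vector and is updated once per attached auxiliary.
dim, rdim = 4, 2
rng = np.random.default_rng(0)
W = rng.normal(size=(dim, 2 * dim + rdim))
b = np.zeros(dim)
r_aux = rng.normal(size=rdim)                  # embedding of the aux relation
c_done = rng.normal(size=dim)                  # c^0 of the main verb "done"
for v_aux in (rng.normal(size=dim), rng.normal(size=dim)):  # "have", "could"
    c_done = compose(c_done, v_aux, r_aux, W, b)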

Figure 5.7. Example AVC ("I have done that") with vectors of interest: the character (char), type, and token (tok) vectors of the NFMV ("done") and of the MAUX ("have").

⁸ The maximum number of auxiliaries in one AVC in our dataset is 3.


[Figure: example FMV sentence "I did that !" with vectors of interest: the character (char), type, and token (tok) vectors of the FMV ("did").]


A language vector l is passed through a tanh layer to obtain the language embedding l_e:

l_e = tanh(Wl + b)

A treebank embedding te(w_i) for the treebank that w_i comes from can likewise be concatenated to the input vector:

x_i = [pe(w_i); ce(w_i); e(t_i); te(w_i)]

[Plot legend: bars show the mean μ ± the standard error σ/√N; whiskers show μ ± σ.]

[Figure: the BiLSTM parser over "the brown fox jumped root", where the MLP outputs (score(LEFT-ARC), score(RIGHT-ARC), score(SHIFT), score(SWAP)).]

A treebank embedding te(w_i) can be added at three levels of the model: at the word level, at the character level (where C^f and C^b are the forward and backward character LSTMs), and at the state (configuration) level:

Word:      x_i = [e(w_i); ce(w_i); te(w_i)]
Character: ch_j = [e(ch_j); te(w_i)]
State:     φ(c) = [v_{s1}; v_{s0}; v_{b0}; te(s_1, s_0, b_0)]
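A tiny NumPy sketch of the word-level variant is shown below; the treebank names and vector dimensions are placeholders, and the same lookup-and-concatenate pattern applies at the character and state levels.

import numpy as np

def word_input(word_emb, char_emb, tb_emb):
    """Word-level input with a treebank embedding: x_i = [e(w_i); ce(w_i); te(w_i)]."""
    return np.concatenate([word_emb, char_emb, tb_emb])

# One embedding per treebank, shared by all words coming from that treebank.
treebank_embeddings = {"sv_talbanken": np.zeros(12), "en_ewt": np.ones(12)}
x_i = word_input(np.zeros(100),                        # e(w_i), here a dummy vector
                 np.zeros(50),                         # ce(w_i) from the character model
                 treebank_embeddings["sv_talbanken"])  # te(w_i)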


The baseline input vector is compared to one extended with a treebank embedding tbe(w_i):

x_i = [pe(w_i); ce(w_i); e(t_i)]
x_i = [pe(w_i); ce(w_i); e(t_i); tbe(w_i)]

ACTA UNIVERSITATIS UPSALIENSIS
Studia Linguistica Upsaliensia
Editors: Michael Dunn and Joakim Nivre

1. Jörg Tiedemann, Recycling translations. Extraction of lexical data from parallel corpora and their application in natural language processing. 2003.
2. Agnes Edling, Abstraction and authority in textbooks. The textual paths towards specialized language. 2006.
3. Åsa af Geijerstam, Att skriva i naturorienterande ämnen i skolan. 2006.
4. Gustav Öquist, Evaluating Readability on Mobile Devices. 2006.
5. Jenny Wiksten Folkeryd, Writing with an Attitude. Appraisal and student texts in the school subject of Swedish. 2006.
6. Ingrid Björk, Relativizing linguistic relativity. Investigating underlying assumptions about language in the neo-Whorfian literature. 2008.
7. Joakim Nivre, Mats Dahllöf and Beáta Megyesi, Resourceful Language Technology. Festschrift in Honor of Anna Sågvall Hein. 2008.
8. Anju Saxena & Åke Viberg, Multilingualism. Proceedings of the 23rd Scandinavian Conference of Linguistics. 2009.
9. Markus Saers, Translation as Linear Transduction. Models and Algorithms for Efficient Learning in Statistical Machine Translation. 2011.
10. Ulrika Serrander, Bilingual lexical processing in single word production. Swedish learners of Spanish and the effects of L2 immersion. 2011.
11. Mattias Nilsson, Computational Models of Eye Movements in Reading: A Data-Driven Approach to the Eye-Mind Link. 2012.
12. Luying Wang, Second Language Acquisition of Mandarin Aspect Markers by Native Swedish Adults. 2012.
13. Farideh Okati, The Vowel Systems of Five Iranian Balochi Dialects. 2012.
14. Oscar Täckström, Predicting Linguistic Structure with Incomplete and Cross-Lingual Supervision. 2013.
15. Christian Hardmeier, Discourse in Statistical Machine Translation. 2014.
16. Mojgan Seraji, Morphosyntactic Corpora and Tools for Persian. 2015.
17. Eva Pettersson, Spelling Normalisation and Linguistic Analysis of Historical Text for Information Extraction. 2016.
18. Marie Dubremetz, Detecting Rhetorical Figures Based on Repetition of Words: Chiasmus, Epanaphora, Epiphora. 2017.
19. Josefin Lindgren, Developing narrative competence: Swedish, Swedish-German and Swedish-Turkish children aged 4–6. 2018.
20. Vera Wilhelmsen, A Linguistic Description of Mbugwe with Focus on Tone and Verbal Morphology. 2018.
21. Yan Shao, Segmenting and Tagging Text with Neural Networks. 2018.
22. Ali Basirat, Principal Word Vectors. 2018.
23. Marc Tang, A typology of classifiers and gender. 2018.
24. Miryam de Lhoneux, Linguistically Informed Neural Dependency Parsing for Typologically Diverse Languages. 2019.
