
Evaluating Discourse in Structured Text Representations

Elisa Ferracane, Greg Durrett, Jessy Li, Katrin Erk
[email protected], [email protected], [email protected], [email protected]

Structured Text Representations

Structured text representations, such as the trees generated by Rhetorical Structure Theory (RST), are helpful for NLP end tasks including sentiment analysis.

Problem: Although RST trees are linguistically defined and motivated, they are hard to exploit because annotations are scarce and limited by genre.

Solution...? Liu & Lapata (2018) train on text classification tasks and use structured attention to induce dependency trees over the text, akin to RST discourse dependency trees.

Q: Does structured attention help?
Q: Can we learn better structure?
Q: Do the induced trees learn discourse?

Model: embed sentences s_1 ... s_t; a biLSTM over sentences yields, for each sentence i, a semantic vector e_i and a discourse vector d_i.

Structured Attention:
  score_{ij} = bilinear(d_i, d_j)
  a_{ij} = MatrixTree(score_{ij})   ...can be interpreted as edge marginals of a dependency structure (parent i, child j)
  c_i = \sum_{k=1}^{n} a_{ik} e_k   context for the possible immediate children of parent i
  e'_i = \tanh(W_r [e_i, c_i])      updated semantic vector

The updated semantic vectors e'_1 ... e'_t are composed (max) into a document representation that predicts the label y.
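The MatrixTree step computes these marginals with the matrix-tree theorem (Koo et al., 2007). Below is a minimal numpy sketch of that computation, assuming single-root non-projective trees; the function name, shapes, and lack of batching are our illustration, not the authors' code.

```python
import numpy as np

def tree_marginals(scores, root_scores):
    """Edge marginals of non-projective dependency trees via the
    matrix-tree theorem (Koo et al., 2007).

    scores      : (n, n), scores[i, j] scores sentence i as parent of j
                  (here, bilinear(d_i, d_j)).
    root_scores : (n,), score for each sentence being the root.
    Returns (a, a_root): (n, n) edge marginals and (n,) root marginals.
    """
    n = scores.shape[0]
    A = np.exp(scores) * (1.0 - np.eye(n))   # edge potentials, no self-loops
    r = np.exp(root_scores)

    # Laplacian: L[j, j] = sum_i A[i, j]; L[i, j] = -A[i, j] otherwise.
    L = np.diag(A.sum(axis=0)) - A
    L_hat = L.copy()
    L_hat[0, :] = r                           # first row holds root potentials
    L_inv = np.linalg.inv(L_hat)

    # Marginal that i is the parent of j (Koo et al.'s single-root formula).
    not0 = 1.0 - np.eye(n)[0]                 # indicator: index != 0
    a = A * (not0[None, :] * np.diag(L_inv)[None, :] - not0[:, None] * L_inv.T)
    a_root = r * L_inv[:, 0]
    return a, a_root

# Downstream use, matching the poster's equations:
#   c = a @ e                                # c_i = sum_k a_ik e_k
#   e_new = np.tanh(np.concatenate([e, c], axis=1) @ W_r)
```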

Experiments

Setup: train 4x with different random seeds and report the mean (see paper for std dev and max).

Dataset                          Label   Task
Yelp                             1-5     Review sentiment
Debates                          0/1     Vote prediction
Writing Quality (WQ)             0/1     Good writing
WQ Topic-Controlled (WQTC)       0/1     Good writing (topic-controlled)
WSJ Sentence Ordering (WSJSO)    -       Sentence order discrimination

Q: Does structured attention help?
A: Generally, no. It significantly helps only one task.

§ Performance gain with structured attention: adding structured attention helps only WQTC. On Yelp, WQ, and WSJSO there is no difference; on Debates, the attention hurts.
[Figure: accuracy difference with vs. without structured attention, per dataset; y-axis from -4 to 4, higher is better.]

Induced Trees

Q: Do the induced trees learn discourse?
A: No. The model focuses on lexical cues.

§ Tree statistics: trees are very shallow; WQTC is the least shallow. Vacuous trees are frequent, except in WQTC.
[Figures: tree height per dataset (y-axis 0 to 5); % vacuous trees per dataset: Yelp 73%, Debates 38%, WQ 42%, WQTC 14%, WSJSO 100%.]

Vacuous tree: a flat, uninformative discourse structure. The root is one sentence at the beginning (or end) of the text, and all other sentences are its children.
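To make these statistics concrete, here is a small sketch of computing tree height and the vacuous-tree check from a decoded parent array. The greedy argmax decoding is a simplification we assume for illustration (it can create cycles; a proper decoder would run Chu-Liu-Edmonds on the marginals), and the paper's exact procedure may differ.

```python
import numpy as np

def decode_tree(a, a_root):
    """Pick each sentence's highest-marginal head (illustrative only).
    Returns parents[j] = head of j, with -1 marking the root."""
    root = int(np.argmax(a_root))
    parents = np.argmax(a, axis=0)
    parents[root] = -1
    return parents, root

def tree_height(parents):
    """Longest root-to-leaf path, counted in edges (assumes no cycles)."""
    def depth(j):
        d = 0
        while parents[j] != -1:
            j, d = parents[j], d + 1
        return d
    return max(depth(j) for j in range(len(parents)))

def is_vacuous(parents, root):
    """Vacuous per the poster: root is the first (or last) sentence and
    every other sentence attaches directly to it."""
    n = len(parents)
    flat = all(parents[j] == root for j in range(n) if j != root)
    return flat and root in (0, n - 1)
```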

§ Root sentence analysis: top PPMI words in the root sentence are indicative of the label: rating or sentiment (Yelp), stance or politeness (Debates), topic (WQ).

Dataset   Top PPMI words in root sentence
Yelp      uuu, sterne, star, rating, deduct, 0, edit
Debates   oppose, republican, majority, thank
WQ        valley, mp3, firm, capital, universal
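As a rough illustration of this analysis, a word's PPMI with "appears in a root sentence" can be computed as max(0, log p(w | root) / p(w)). The sketch below is our assumed formulation (the names top_ppmi_words, root_sents, all_sents are hypothetical); see the paper for the exact counts used.

```python
import math
from collections import Counter

def top_ppmi_words(root_sents, all_sents, k=10):
    """Rank words by PPMI between the word and root-sentence membership:
    ppmi(w) = max(0, log [p(w | root) / p(w)]).
    root_sents / all_sents: lists of tokenized sentences."""
    root_counts = Counter(w for s in root_sents for w in s)
    all_counts = Counter(w for s in all_sents for w in s)
    n_root = sum(root_counts.values())
    n_all = sum(all_counts.values())
    ppmi = {w: max(0.0, math.log((c / n_root) / (all_counts[w] / n_all)))
            for w, c in root_counts.items()}
    return sorted(ppmi, key=ppmi.get, reverse=True)[:k]
```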

§ Tree analysis:
[Figure: a 'gold' parsed RST dependency tree vs. the induced tree for the same Yelp review. Sentences include "On the upside they have a lot of classes.", "They offer it at a great price.", "however the classes aren't early enough.", "The big downside [...]", and a sentence carrying the Yelp rating; the RST tree carries ELABORATION and CONTRAST relations, while the induced tree appears to root at the rating sentence, with incorrect attachments marked ✘.]

Q: Can we learn better structure?
A: No. Model changes induce better trees, but they are still far from discourse.

Model modifications to increase reliance on structure (sketched below):
a. remove the biLSTM over sentences (-biLSTM)
b. percolate context from children of subtrees n levels down (+nperc)
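The poster does not spell out the +nperc recurrence, so the following is only one plausible reading: reapply the child-context aggregation n times, so that context percolates up from descendants n levels deep rather than from immediate children only. All names here (percolate_context, W_r as a learned projection) are assumptions for illustration, not the authors' definition.

```python
import numpy as np

def percolate_context(a, e, W_r, n_levels):
    """Hypothetical +nperc: repeat the context-gathering step so each
    parent sees context from subtrees up to n_levels deep.
    a: (n, n) edge marginals; e: (n, d) semantic vectors; W_r: (2d, d)."""
    h = e
    for _ in range(n_levels):
        c = a @ h                              # children's current states
        h = np.tanh(np.concatenate([e, c], axis=1) @ W_r)
    return h
```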

§ Performance: removing the biLSTM and adding 4 levels of children percolation yields performance similar to the Full model.
[Figure: accuracy by model. Full 81.1, -biLSTM 77.8, -biLSTM +1perc 77.1, -biLSTM +4perc 81.7; higher is better.]

§ Tree statistics: the best model produces less vacuous and deeper trees, but these are still far from 'gold' parsed RST discourse dependency trees.
[Figures: tree height and % vacuous trees by model. % vacuous: Full 14%, -biLSTM 4%, -biLSTM +1perc 3%, -biLSTM +4perc 3%, parsed RST 0%; the parsed RST trees are far deeper than any induced trees (height axis up to 30).]

• Yang Liu and Mirella Lapata. 2018. "Learning Structured Text Representations". Transactions of the Association for Computational Linguistics, 6:63-75.
• Design changes and a bug fix were made to the L&L code; results are also confirmed without these modifications (see paper for details).
• Code and data at https://github.com/elisaf/structured