
Headline Generation Track at Dialogue’2019

Organizers (VK.com):

● Valentin Malykh, Pavel Kalaidin
● Ivan Karabakin, Irina Shubina

With the help of:

● Ivan Smurov, ABBYY
● Ekaterina Artemova, HSE

vk.com/deepvk

Summarization Task
● Sentence Summarization
○ to produce more concise sentences
● Text Summarization
○ to produce shorter texts

Summarization Task
● Extractive Summarization
○ to take some phrases from a text
● Abstractive Summarization
○ to generate a new text based on a bigger one

Extractive Summarization
● To take some phrases from a text

Supervised and Unsupervised:
● Supervised: we have some gold markup of which phrases to take.
● Unsupervised: there is no such markup.

Common Approaches to Ext. Sum.
● TextRank (sketched below)
● LexRank
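TextRank treats sentences as graph nodes and ranks them with PageRank. Below is a minimal sketch of a TextRank-style extractive summarizer; using TF-IDF cosine similarity for edge weights is an assumption here (the original Mihalcea & Tarau formulation scores sentence pairs by word overlap).

```python
# A TextRank-style extractive summarizer (sketch): sentences are graph nodes,
# TF-IDF cosine similarities are edge weights, PageRank scores the sentences.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def textrank_summary(sentences, top_k=2):
    tfidf = TfidfVectorizer().fit_transform(sentences)
    graph = nx.from_numpy_array(cosine_similarity(tfidf))
    scores = nx.pagerank(graph)
    # Keep the top-ranked sentences, restoring their original order.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:top_k]))


print(textrank_summary([
    "The agency published one million news articles.",
    "Participants must generate a headline for each article.",
    "Headlines are scored with ROUGE against the original ones.",
]))
```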

Common Approaches: Supervised Approaches

Wong, K.F., Wu, M. and Li, W., 2008. Extractive summarization using supervised and semi-supervised learning. In Proceedings of the 22nd International Conference on Computational Linguistics, Volume 1, pp. 985-992. Association for Computational Linguistics.
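As a rough illustration of the supervised setting, the sketch below casts extractive summarization as binary sentence classification: a sentence gets label 1 if the gold markup says it belongs to the summary. The features and the classifier are illustrative assumptions, not the exact setup of Wong et al. (2008).

```python
# Supervised extractive summarization as sentence classification (sketch):
# label 1 = the sentence is in the gold markup, 0 = it is not.
import numpy as np
from sklearn.linear_model import LogisticRegression


def sentence_features(sentences):
    n = len(sentences)
    return np.array([
        [i / n,                                     # relative position
         len(s.split()),                            # length in tokens
         sum(w[:1].isupper() for w in s.split())]   # crude proper-noun count
        for i, s in enumerate(sentences)
    ])


sentences = [
    "The company reported record profits this quarter.",
    "Analysts had expected a smaller increase.",
    "The press release also thanked the employees.",
]
labels = [1, 0, 0]  # toy gold markup

clf = LogisticRegression().fit(sentence_features(sentences), labels)
scores = clf.predict_proba(sentence_features(sentences))[:, 1]
print(sentences[int(np.argmax(scores))])  # the highest-scoring sentence is "extracted"
```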

Abstractive Summarization
The bigger text is paraphrased into a smaller one.

It is common that both the bigger (original) and the smaller (summary) texts are human-generated.

Supervised and Unsupervised

Direct Approaches to Abs. Sum.
Common approaches:
● BiRNN
● CNN
● Transformer
● Pointer-Generator
● Reinforcement Learning

Direct Approaches: Pointer-Generator
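The core idea of the pointer-generator network (See et al., 2017) is to mix a generation distribution over the vocabulary with a copy distribution induced by attention over the source tokens. The sketch below shows only that mixing step; tensor names and shapes are assumptions, and the encoder/decoder, coverage, and extended OOV vocabulary are omitted.

```python
# Pointer-generator output distribution (sketch of See et al., 2017):
# mix the vocabulary distribution with a copy distribution built from attention.
import torch


def pointer_generator_dist(vocab_dist, attn_weights, src_ids, p_gen):
    """
    vocab_dist:   (batch, vocab_size) softmax over the generation vocabulary
    attn_weights: (batch, src_len)    attention weights over source positions
    src_ids:      (batch, src_len)    vocabulary ids of the source tokens
    p_gen:        (batch, 1)          probability of generating vs. copying
    """
    copy_dist = torch.zeros_like(vocab_dist)
    # Scatter the attention mass onto the vocabulary ids of the source tokens.
    copy_dist.scatter_add_(1, src_ids, attn_weights)
    return p_gen * vocab_dist + (1.0 - p_gen) * copy_dist
```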

Direct Approaches: Reinforcement Learning
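Reinforcement-learning approaches such as Paulus et al. (2017) optimize a summary-level reward (e.g. ROUGE) with a self-critical policy gradient: a sampled summary is rewarded relative to the greedy-decoded baseline. A minimal sketch of that loss, with placeholder tensors, follows.

```python
# Self-critical policy-gradient loss (sketch, in the spirit of Paulus et al., 2017):
# push the model toward samples that score higher than greedy decoding.
import torch


def self_critical_loss(sample_log_probs, sample_reward, greedy_reward):
    """
    sample_log_probs: (batch, seq_len) log-probs of the sampled summary tokens
    sample_reward:    (batch,)         e.g. ROUGE of the sampled summary
    greedy_reward:    (batch,)         e.g. ROUGE of the greedy summary
    """
    advantage = (sample_reward - greedy_reward).unsqueeze(1)
    return -(advantage * sample_log_probs).sum(dim=1).mean()
```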

Direct Approaches: Multi-Agent Sum.

Direct Approaches: Results

Common Approaches to Abs. Sum.
Indirect approaches:
● 5W1H
● First Sentence (see the sketch below)
● Topic Sentence
● Unsupervised Extraction Summary-based
● etc.
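The First Sentence baseline from the list above is almost trivial: use the opening sentence of the article as its headline. A sketch with naive sentence splitting (an assumption) follows.

```python
# "First Sentence" baseline (sketch): take the opening sentence of the article
# as the headline. The period-plus-space sentence split is a naive assumption.
def first_sentence_headline(article: str) -> str:
    return article.split(". ", 1)[0].strip()


print(first_sentence_headline(
    "The news agency released a new dataset. It contains one million articles."
))
```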

Unsupervised Abstractive Summarization

Datasets
● DUC 2001-2007
○ hundreds of documents each
● CNN / DailyMail
○ 287,226 articles for training, 13,368 for validation, and 11,490 for test
○ 781 tokens on average per article, 56 tokens per summary
● New York Times Annotated
○ 1,444,919 articles
○ 708 tokens per article, 8 tokens per headline
● Rossiya Segodnya News
○ 1,003,869 articles
○ 316 tokens per article, 10 tokens per headline

[Slide figure: the datasets above grouped by target type, summary vs. headline]

Track Datasets
● Rossiya Segodnya News Dataset
○ 1M news documents
○ 1 news agency
● ROMIP News Collection
○ 32k news documents
○ 16k were used to compute the public score, the rest the private one
○ 25 different news agencies

Metrics: METEOR
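As a reminder of how METEOR is computed in practice, here is a minimal example with NLTK. It is only an illustration: the track's exact METEOR configuration is not specified on the slide, and NLTK's METEOR relies on English WordNet for synonym matching.

```python
# Computing METEOR for a single headline with NLTK (illustration only).
import nltk
from nltk.translate.meteor_score import meteor_score

nltk.download("wordnet", quiet=True)  # resource needed by meteor_score

reference = "international economic forum opens in moscow"
hypothesis = "moscow hosts an international economic forum"

# Recent NLTK versions expect pre-tokenized references and hypothesis.
print(round(meteor_score([reference.split()], hypothesis.split()), 3))
```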

Metrics: ROUGE

Track Metric
There are 9 different variants of ROUGE. We take the F-score of each and average them.
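A sketch of this averaging with the `rouge` Python package, which exposes only ROUGE-1/2/L rather than all nine variants, so it illustrates the "mean of F-scores" idea rather than reproducing the official track scorer:

```python
# Averaging ROUGE F-scores (sketch, not the official track scorer).
from rouge import Rouge


def mean_rouge_f(hypothesis: str, reference: str) -> float:
    scores = Rouge().get_scores(hypothesis, reference)[0]
    return sum(v["f"] for v in scores.values()) / len(scores)


print(mean_rouge_f(
    "economic forum opens in moscow",
    "an international economic forum opened in moscow on friday",
))
```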

Platform
● Docker
● 1 GPU
● 16 GB RAM
● 2 vCPU
● private Docker registry

Each solution was run on the private test set of 16k documents.

Competition Statistics

● 15 registered participants
● 6 participants made at least 1 submission
● 258 submissions in total (~100 of them test submissions)
● 3 participants beat the baseline

Results

References

Paulus, R., Xiong, C. and Socher, R., 2017. A deep reinforced model for abstractive summarization. arXiv preprint arXiv:1705.04304.

Wong, K.F., Wu, M. and Li, W., 2008. Extractive summarization using supervised and semi-supervised learning. In Proceedings of the 22nd International Conference on Computational Linguistics, Volume 1, pp. 985-992. Association for Computational Linguistics.

See, A., Liu, P.J. and Manning, C.D., 2017. Get to the point: Summarization with pointer-generator networks. arXiv preprint arXiv:1704.04368.

Chu, E. and Liu, P.J., 2018. Unsupervised neural multi-document abstractive summarization of reviews.

Mihalcea, R. and Tarau, P., 2004. TextRank: Bringing order into text. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing.

Celikyilmaz, A., Bosselut, A., He, X. and Choi, Y., 2018. Deep communicating agents for abstractive summarization. arXiv preprint arXiv:1803.10357.

Grusky, M., Naaman, M. and Artzi, Y., 2018. Newsroom: A dataset of 1.3 million summaries with diverse extractive strategies. arXiv preprint arXiv:1804.11283.

Dang, H.T., 2006. Overview of DUC 2006. National Institute of Standards and Technology (NIST).

Thank you for your attention!

Applied Research @ VK.com
vk.com/deepvk
Valentin Malykh, val.maly.hk
