
Page 1:

USING THE GRAPH2VEC APPROACH FOR NLP TASKS

Oleg Durandin
EPAM Systems, Data Scientist
HSE, Applied Mathematics Department

Nizhny Novgorod, 2019

Page 2:

Agenda

• Embeddings 101 : Word2Vec
• Doc2Vec
• Graph2Vec
• Dependency tree
• DGraph2Vec for NLP tasks

Page 3:

Embeddings

• Techniques for vector representations (embeddings) and representation learning have been attracting increasing attention.

• Initially used in NLP for word representations, they have since seen widespread application.

Page 4:

Embeddings 101 : Word2Vec

• The concept of vector representations has been known for a very long time and is closely intertwined with the notion of so-called distributional semantics.

«A word is characterized by the company it keeps» (Firth, 1957)

• The term word embeddings was first used by Bengio et al. in 2003.
• Embeddings became widespread only in 2013, when T. Mikolov proposed the word2vec approach; all previous approaches had been extremely computationally expensive.

§ T. Mikolov et al. Efficient Estimation of Word Representations in Vector Space (https://arxiv.org/pdf/1301.3781.pdf)
§ T. Mikolov et al. Distributed Representations of Words and Phrases and their Compositionality (https://arxiv.org/abs/1310.4546)

Page 5:

Embeddings 101 : Word2Vec

• The much-discussed word2vec model is extremely simple: we predict the probability of a word from its surroundings (context), or vice versa.

• The word2vec family comprises two approaches (a minimal training sketch follows):
• Skip-Gram: predict the context from a word;
• CBOW (Continuous Bag of Words): predict a word from its context.
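To make this concrete, here is a minimal training sketch with the gensim library (gensim >= 4 API; the toy corpus and all parameter values are our own illustration, not from the talk):

```python
from gensim.models import Word2Vec

# toy corpus: each sentence is a list of tokens
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["a", "boy", "plays", "with", "a", "girl"],
]

# sg=1 selects Skip-Gram, sg=0 selects CBOW
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1, epochs=50)

vector = model.wv["queen"]            # the 100-dimensional embedding of a word
print(model.wv.most_similar("king"))  # nearest neighbours in the semantic space
```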

Page 6:

Embeddings 101 : Word2Vec Properties

• The vectors corresponding to the words of a corpus form a semantic space.
• Similar words lie close to each other in this space.
• We can apply simple algebraic operations to these vectors (addition, subtraction, averaging over a group of words).
• These operations reflect semantic relations between the words.

Page 8:

Embeddings 101 : Word2Vec Properties

Fem = Vec(girl) − Vec(boy)
Vec(Niece) = Vec(Nephew) + Fem

https://algorithmia.com/algorithms/nlp/Word2Vec/docs
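This arithmetic can be tried directly with gensim's analogy query; a sketch assuming the pretrained Google News vectors distributed via gensim-data (any word2vec KeyedVectors would work the same way):

```python
import gensim.downloader as api

# pretrained word2vec vectors (a large one-time download)
wv = api.load("word2vec-google-news-300")

# Fem = Vec(girl) - Vec(boy);  Vec(niece) ~ Vec(nephew) + Fem
print(wv.most_similar(positive=["nephew", "girl"], negative=["boy"], topn=3))
```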

Page 12:

Embeddings 101 : Embedding Properties

• The most important properties:
• The semantic space
• Vector arithmetic

• These emerged on their own, as a side effect.

Vector representations do not merely map a word to a vector of real numbers; they also capture information at different levels (morphology / syntax / semantics / contextual information).

Page 14:

Embeddings 101 : Embedding Properties

What can vector representations reflect?

• The structure of a word, in terms of morphology;
• A word-context representation;
• A hierarchy of words, in the terms of the WordNet ontology.

Page 15:

Embeddings 101 : Go Further

• What if we want to represent larger blocks (for example, paragraphs or entire documents)?
• How can the word2vec concept be generalized?

Page 16:

Embeddings 101 : doc2vec

• Doc2vec is a comparatively efficient method of document representation.

• PV-DM (Distributed Memory of Paragraph Vectors) acts as a memory that remembers what is missing from the current context, or as the topic of the paragraph. It adds another feature vector, unique to the document, trained simultaneously with the word matrix W and the document matrix D.

• PV-DBOW (Distributed Bag of Words version of Paragraph Vectors) instead predicts words sampled from the paragraph using the paragraph vector alone.
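A minimal gensim sketch of both variants (the toy documents and parameter values are ours; dm=1 selects PV-DM, dm=0 selects PV-DBOW):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [
    TaggedDocument(words=["graph", "embeddings", "for", "nlp"], tags=["doc0"]),
    TaggedDocument(words=["syntax", "trees", "as", "graphs"], tags=["doc1"]),
]

# dm=1 -> PV-DM (Distributed Memory); dm=0 -> PV-DBOW
model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=40, dm=1)

vec = model.dv["doc0"]                                  # trained paragraph vector
new = model.infer_vector(["an", "unseen", "document"])  # embed a new document
```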


Page 18:

Embeddings 101 : more complex structures

• G = (V, E)
• G is a graph, V is a set of vertices (nodes), E is a set of edges (links).

• Graph embeddings are a transformation of a graph into a vector (or a set of vectors).

• Embeddings should capture the topology of the graph (relations between vertices and graph-relevant features: subgraphs, vertex neighbourhoods, etc.).

• Graph embeddings can be divided into two groups:
• Vertex embeddings: each vertex is encoded by its own vector;
• Graph embeddings: the whole graph is represented as a single vector.

Page 20:

Graph2Vec

• At the 13th International Workshop on Mining and Learning with Graphs (2017), a neural embedding framework named graph2vec was proposed to learn data-driven distributed representations of arbitrarily sized graphs.

• Graph2vec's embeddings are learnt in an unsupervised manner and are task-agnostic. Hence, they can be used for any downstream task, such as graph classification, clustering, and even seeding supervised representation learning approaches.

A. Narayanan et al. graph2vec: Learning distributed representations of graphs (https://arxiv.org/pdf/1707.05005.pdf). 13th International Workshop on Mining and Learning with Graphs (MLG Workshop 2017).

Page 21:

Graph2Vec

• doc2vec's skip-gram model: given a document d, it samples c words from d, considers them as co-occurring in the same context (i.e., the context of document d), and uses them to learn d's representation.

• graph2vec: given a graph G, it samples c rooted subgraphs around different nodes of G and uses them analogously to doc2vec's context words, thus learning G's representation.

A. Narayanan et al. graph2vec: Learning distributed representations of graphs (https://arxiv.org/pdf/1707.05005.pdf). 13th International Workshop on Mining and Learning with Graphs (MLG Workshop 2017).
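One publicly available implementation of this scheme is the Graph2Vec class in the karateclub library; a minimal sketch (the toy graphs and parameter values are ours; karateclub expects nodes labelled 0..n-1):

```python
import networkx as nx
from karateclub import Graph2Vec

# a toy collection of graphs standing in for parsed sentences
graphs = [nx.path_graph(4), nx.cycle_graph(5), nx.star_graph(4)]

model = Graph2Vec(dimensions=64, wl_iterations=2, min_count=1)
model.fit(graphs)                    # WL subgraph extraction + doc2vec-style training
embeddings = model.get_embedding()   # one 64-dimensional vector per graph
```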

Page 22:

Graph2Vec: Kernel concept

Rooted subgraphs are extracted with a so-called kernel.

• In the paper, the Weisfeiler-Lehman graph kernel (WL kernel) was used (N. Shervashidze et al., «Weisfeiler-Lehman Graph Kernels», 2011).

• The WL kernel is based on the idea of extracting subtree patterns.

A. Narayanan et al. graph2vec: Learning distributed representations of graphs (https://arxiv.org/pdf/1707.05005.pdf). 13th International Workshop on Mining and Learning with Graphs (MLG Workshop 2017).
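To make the WL idea concrete, here is our own illustrative sketch of the relabeling loop at its core: each iteration replaces a node's label with a compressed signature of its label and its neighbours' labels, so that after k iterations a label identifies a rooted subtree of height k.

```python
import networkx as nx

def wl_relabel(graph, labels, iterations=2):
    """Weisfeiler-Lehman relabeling; returns the label dict after each iteration."""
    history = [dict(labels)]
    for _ in range(iterations):
        signatures = {
            node: labels[node] + "|" + ",".join(sorted(labels[n] for n in graph.neighbors(node)))
            for node in graph.nodes()
        }
        # compress long signatures into short fresh labels
        table = {sig: f"L{i}" for i, sig in enumerate(sorted(set(signatures.values())))}
        labels = {node: table[sig] for node, sig in signatures.items()}
        history.append(dict(labels))
    return history

g = nx.path_graph(4)  # 0 - 1 - 2 - 3
pos_tags = {0: "DET", 1: "NOUN", 2: "VERB", 3: "NOUN"}
for step, lab in enumerate(wl_relabel(g, pos_tags)):
    print(step, lab)
```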

Page 24:

Dependency Tree Concept

• A dependency tree is a representation of syntactic structure. A dependency tree for a sentence is a directed acyclic graph with words as nodes and relations as edges. Each word in the sentence either modifies another word or is modified by one.

• Properties of the tree: it is a connected, acyclic, oriented graph in which each node has a label (the word itself or a PoS tag). We also consider the edge labels, i.e. the dependency type between connected words.

• We would like to abstract away from the lexical level and concentrate on morpho-syntax only, so we consider only PoS tags as nodes.
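Such trees come from an off-the-shelf parser; a minimal sketch with spaCy, which the talk uses later (the example sentence and model name are our own illustration):

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed
doc = nlp("The quick brown fox jumps over the lazy dog.")

# each token points to its syntactic head; the labelled arcs form the tree
for token in doc:
    print(f"{token.text:6} {token.pos_:5} --{token.dep_}--> {token.head.text}")
```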

Page 25:

Methods of Dependency Tree Representation

• The standard WL kernel from graph2vec does not work for dependency trees:

Ø It assumes node labels are unique, while a single sentence may contain many NOUNs;

Ø It does not take edge labels into account;

Ø The standard WL procedure targets general-purpose graphs, in which the order of nodes is not important.

Page 26:

Methods of Dependency Tree Representation

• We proposed three novel kernels that take the linguistic properties of dependency trees into account (a sketch of the third idea follows the list):

1. A relaxation of the WL kernel;

2. A contraction kernel;

3. A simple-paths approach.
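The kernels themselves are not spelled out in these slides, so the following is only a hypothetical sketch of what the simple-paths idea might look like over a spaCy parse: collecting root-to-leaf PoS-tag paths of the dependency tree (the function and its output format are our assumptions, not the authors' definition):

```python
def pos_paths(token):
    """Hypothetical sketch: root-to-leaf PoS-tag paths of a dependency subtree."""
    children = list(token.children)
    if not children:
        return [[token.pos_]]
    return [[token.pos_] + path for child in children for path in pos_paths(child)]

# usage with the spaCy `doc` from the earlier sketch:
root = [t for t in doc if t.dep_ == "ROOT"][0]
print(pos_paths(root))  # e.g. [['VERB', 'NOUN', 'DET'], ['VERB', 'ADP', 'NOUN', ...]]
```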

Page 27:

First Experiment: Explore semantic space

[Bar charts: Hopkins statistic of the embedding space for each kernel (WL Kernel with dep, WL Kernel w/o dep, Contracted with Dep, Contracted w/o Dep, Path Extractor); one chart for the EWT corpus, one for the SynTagRus corpus.]

• English language: English Web Treebank (> 16 thousand sentences);

• Russian language: SynTagRus (> 66 thousand sentences);

Universal Dependencies project (https://universaldependencies.org/)

Properties of the semantic space

• We collected all syntactic trees from the corpus and translated them into vector representations (with graph2vec).

• The set of these vectors composes the embedding space.

• Analysis with the Hopkins statistic reveals that different kernels are most suitable for different languages.

https://en.wikipedia.org/wiki/Hopkins_statistic
https://stats.stackexchange.com/questions/332651/validating-cluster-tendency-using-hopkins-statistic
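For reference, a minimal sketch of the Hopkins statistic following the definition in the links above (our own code; values near 1 indicate strong cluster tendency, values near 0.5 an essentially random point cloud):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def hopkins(X, sample_size=None, seed=0):
    """Hopkins statistic of a data matrix X of shape (n_points, n_dims)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = sample_size or max(1, n // 10)
    nn = NearestNeighbors().fit(X)
    # u: nearest-data distances for m uniform points in the data's bounding box
    uniform = rng.uniform(X.min(axis=0), X.max(axis=0), size=(m, d))
    u = nn.kneighbors(uniform, n_neighbors=1)[0].sum()
    # w: distances from m real points to their nearest *other* real point
    sample = X[rng.choice(n, size=m, replace=False)]
    w = nn.kneighbors(sample, n_neighbors=2)[0][:, 1].sum()
    return u / (u + w)

print(hopkins(np.random.rand(200, 16)))  # close to 0.5 for uniform noise
```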

Page 28:

Clustering Experiment

1. We normalize the vectors.

2. Clustering was performed with the standard KMeans++ algorithm (other options, e.g. the HDBSCAN algorithm, gave poor quality).

Ø The number of clusters was selected by the silhouette analysis method.

Ø Cluster centroids were taken, and the 20 nearest embeddings in the original embedding space were selected (by cosine distance). A sketch of this pipeline follows.
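A minimal sketch of these steps with scikit-learn (the stand-in embedding matrix, the cluster-count range, and the random seed are our assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.metrics.pairwise import cosine_distances
from sklearn.preprocessing import normalize

embeddings = np.random.rand(500, 128)      # stand-in for the graph2vec vectors
X = normalize(embeddings)                  # step 1: unit-length vectors

# step 2: choose the number of clusters by silhouette analysis
scores = {}
for k in range(5, 31):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit(X)
    scores[k] = silhouette_score(X, km.labels_)
best_k = max(scores, key=scores.get)

km = KMeans(n_clusters=best_k, init="k-means++", n_init=10, random_state=0).fit(X)

# the 20 embeddings nearest to each centroid, by cosine distance
nearest = np.argsort(cosine_distances(km.cluster_centers_, X), axis=1)[:, :20]
```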

Page 29:

Clustering Experiment: simple path extractor (SynTagRus)

1) Complex sentences with relative clauses;
2) Simple sentences with one indirect object;
3) Elliptical (incomplete) sentences without verbs;
4) Elliptical (incomplete) sentences with verbs;
5) Simple sentences with an infinitive and one or more adverbs;
6) Complex sentences with adverbial clauses and at least one numeral;
7) Incomplete sentences with proper nouns;
8) Complex sentences that start with a verb and have a noun clause as the object of this verb (importantly, the clause includes a verb and no adjectives, unlike Cluster 11);
9) Sentences containing at least one adjective that is a direct dependent of a verb (as in become silent);
10) Sentences with homogeneous objects (direct or indirect), expressed by nouns;
11) Complex sentences that start with a verb and have a noun clause as the object of this verb (the clause includes an adjective and no verbs, cf. Cluster 8);
12) Complex sentences with noun clauses that have an infinitive;
13) Sentences containing at least one adjective that is a direct dependent of a noun (as in electronic microscope);
14) Sentences with homogeneous subjects, expressed by nouns;
15) Sentences with a noun dependent on another noun (as in prichiny padeniya, which is equivalent to the reasons for the fall);
16) Short simple sentences with an auxiliary verb (e.g. byl = was) and an adjective that is its dependent, as well as sentences where the auxiliary verb is omitted (but is assumed);
17) Sentences with homogeneous predicates, expressed by verbs;
18) Sentences with a participle clause;
19) Sentences with adverbial and adverbial-participle clauses;
20) Sentences with multiple pronouns.

Page 31:

Clustering Experiment: simple path extractor (SynTagRus)

The twenty clusters contain sentences of distinct syntactic structures. Sentences with similar syntactic structures have similar syntactic meanings or describe different situations in a similar way.

The next two sentences are completely unrelated semantically, yet they are similar in syntactic structure:

• В ход идут коробки из-под фруктов, тележки из супермаркетов, старые ролики, инвалидные коляски, велосипеды. (Fruit boxes, supermarket carts, old rollers, wheelchairs, bicycles are used.)

• Посыпались жалобы, протесты. (Complaints and protests flowed.)

Page 32:

Clustering Experiment: Relaxed WL Extractor (SynTagRus)

1) Simple sentences that have adverbs, usually as direct dependents of verbs;
2) Simple sentences containing numerals;
3) Sentences with pronouns and nouns;
4) Short incomplete sentences consisting of proper nouns only (as in: Irina Melnikova.);
5) Sentences with nouns functioning as homogeneous subjects or objects, always expressed by nouns;
6) Short incomplete sentences consisting of noun phrases, without proper nouns (as in: Уроки итальянского. = Lessons of Italian.);
7) Sentences with homogeneous attributes expressed by adjectives;
8) Complex sentences with noun clauses as objects;
9) Sentences with particles, including the negative particle ne (= not);
10) Complete sentences that include proper nouns (cf. Cluster 4);
11) (Mostly) complete sentences consisting only of verbs, adverbs and pronouns (but not nouns, unlike Cluster 3);
12) Sentences with noun phrases, but no proper nouns;
13) Sentences with proper nouns and numerals;
14) Sentences with determiners (e.g. etot = this).

• As can be seen, different clusterings capture different structural types of sentences.
• A relatively larger number of clusters results in a more detailed syntax model than a smaller one, as the above clusterings show.

Page 33:

Second Experiment: Authorship Attribution task

Toolbox:
• SpaCy library (https://spacy.io/) as a convenient NLP pipeline (word and sentence tokenization, morpho-syntactic analysis, etc.);
• UDPipe as a syntactic analyzer for Russian;
• A convenient spaCy + UDPipe wrapper: https://github.com/TakeLab/spacy-udpipe;
• PyMorphy2, a morphological analyzer/inflection engine for Russian and Ukrainian.

Data collection: 215 works of Russian literature by 30 authors, spanning the 18th to 21st centuries. We used works by Andreev, Astafiev, Bianchi, Bulgakov, Bulychev, Bunin, Vasilyev, Gogol, Goncharov, Gorky, Dostoyevsky, Zhitkov, Zamyatin, Karamzin, Lermontov, Lukyanenko, Nabokov, Nosov, Platonov, Prishvin, Pushkin, Rasputin, Skrebitskiy, Solzhenitsyn, Sologub, Tolstoy, Turgenev, Chernyshevsky, Chekhov, Sholokhov.

Principles of selecting the material:
• The selected authors have played a significant role in Russian literature.
• The texts are written in modern Russian.
• Each author's works were selected so that they covered only one approximate period of the writer's creative life. This was done to minimize changes in a writer's style over his lifetime.

Page 34:

Second Experiment: Authorship Attribution task

• Authorship attribution as a classification task: we treat an author's text as a sequence of tree embeddings;
• We used an RCNN model as the classifier (https://github.com/roomylee/rcnn-text-classification).

Page 35:

Second Experiment: Authorship Attribution task

Following Johannsen et al., «Cross-lingual syntactic variation over age and gender», we can use a text representation in the form of treelets, i.e. pairs and triples of tokens and their relations.

Bigram treelets
• A dependency between the head and dependent words: VERB → nsubj → NOUN

Trigram treelets (a really huge feature space)
• Two words stemming from one head word: NOUN ← VERB → NOUN
• Words in a chain of sequential subordination: VERB → NOUN → PRON

We simply write these sequences on one line and treat the result as a document.
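A sketch of extracting such treelet strings from a spaCy parse (our own illustrative code; only the subordination-chain trigram is shown, and the exact label format is an assumption):

```python
def treelets(doc):
    """Bigram and chain-trigram treelets over PoS tags for one parsed text."""
    grams = []
    for tok in doc:
        if tok.dep_ == "ROOT":
            continue
        # bigram: head PoS -> dependency relation -> dependent PoS
        grams.append(f"{tok.head.pos_}->{tok.dep_}->{tok.pos_}")
        # chain trigram: grandparent PoS -> head PoS -> dependent PoS
        if tok.head.dep_ != "ROOT":
            grams.append(f"{tok.head.head.pos_}->{tok.head.pos_}->{tok.pos_}")
    return " ".join(grams)  # the whole text as one treelet "document"

# e.g. treelets(nlp("The fox jumps.")) -> "VERB->nsubj->NOUN NOUN->det->DET ..."
```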

Page 36:

Second Experiment: Authorship Attribution task

| Embedding    | Accuracy (100)      | Accuracy (150)      | Accuracy (180)      | Accuracy (200)      | Accuracy (220)      | Accuracy (250)      |
|--------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|
| WL           | 0.42049 +/- 0.00712 | 0.47106 +/- 0.00107 | 0.48553 +/- 0.01962 | 0.50418 +/- 0.00309 | 0.52226 +/- 0.00354 | 0.53199 +/- 0.00465 |
| SP           | 0.37977 +/- 0.00964 | 0.42631 +/- 0.00870 | 0.43129 +/- 0.02344 | 0.44958 +/- 0.00754 | 0.47956 +/- 0.00282 | 0.50437 +/- 0.00009 |
| Contracted   | 0.33655 +/- 0.00443 | 0.37349 +/- 0.01500 | 0.36115 +/- 0.00408 | 0.40025 +/- 0.00981 | 0.40307 +/- 0.01463 | 0.41443 +/- 0.00744 |
| 1-treelet    | 0.37266 +/- 0.00634 | 0.42489 +/- 0.00657 | 0.43635 +/- 0.00009 | 0.44213 +/- 0.01245 | 0.44813 +/- 0.00518 | 0.48503 +/- 0.00140 |
| 2-treelet    | 0.40104 +/- 0.00017 | 0.47008 +/- 0.01110 | 0.44753 +/- 0.00257 | 0.46621 +/- 0.00145 | 0.50027 +/- 0.00354 | 0.52316 +/- 0.00344 |
| 3-treelet    | 0.45321 +/- 0.01866 | 0.50568 +/- 0.00426 | 0.52024 +/- 0.00817 | 0.52889 +/- 0.00127 | 0.53361 +/- 0.01199 | 0.55897 +/- 0.02158 |
| 23-treelet   | 0.44757 +/- 0.01181 | 0.49867 +/- 0.02157 | 0.49352 +/- 0.00826 | 0.50282 +/- 0.00191 | 0.54642 +/- 0.00881 | 0.53841 +/- 0.01051 |
| Morphology   | 0.32639 +/- 0.04062 | 0.38477 +/- 0.00639 | 0.37757 +/- 0.00204 | 0.33621 +/- 0.01154 | 0.38990 +/- 0.00200 | 0.36868 +/- 0.01935 |
| Syntax       | 0.40087 +/- 0.03333 | 0.44558 +/- 0.00382 | 0.41584 +/- 0.02805 | 0.39998 +/- 0.01717 | 0.38854 +/- 0.00209 | 0.49368 +/- 0.00446 |
| Morphosyntax | 0.45365 +/- 0.01233 | 0.50648 +/- 0.00453 | 0.51367 +/- 0.00195 | 0.46066 +/- 0.03170 | 0.45061 +/- 0.00500 | 0.46726 +/- 0.03069 |

Page 37:

Third Experiment: Authorship Attribution task (PAN-12)

https://pan.webis.de/clef12/pan12-web/author-identification.html

| Embedding    | Accuracy (150)      | Accuracy (200)      |
|--------------|---------------------|---------------------|
| WL           | 0.41784 +/- 0.16246 | 0.52339 +/- 0.06881 |
| SP           | 0.58400 +/- 0.09960 | 0.57022 +/- 0.04236 |
| Contracted   | 0.36531 +/- 0.22954 | 0.43763 +/- 0.03170 |
| 1-treelet    | 0.15453 +/- 0.08398 | 0.18147 +/- 0.01558 |
| 2-treelet    | 0.20790 +/- 0.03867 | 0.36187 +/- 0.08612 |
| 3-treelet    | 0.59610 +/- 0.07268 | 0.55185 +/- 0.05683 |
| 23-treelet   | 0.28683 +/- 0.18172 | 0.41990 +/- 0.16644 |
| Morphology   | 0.63223 +/- 0.09619 | 0.63146 +/- 0.09505 |
| Syntax       | 0.57351 +/- 0.06970 | 0.59095 +/- 0.04328 |
| Morphosyntax | 0.53007 +/- 0.04977 | 0.60490 +/- 0.03999 |

Page 38:

Conclusion

• We built a semantic space for dependency trees.
• We constructed vector spaces for the SynTagRus and EWT treebanks and carried out clustering of these spaces.
• From a linguistic viewpoint, we found that different clusterings capture different structural types of sentences, with a relatively larger number of clusters resulting in a more detailed syntax model.
• The proposed method completely isolates the morpho-syntactic layer of the language system, considering it separately from the lexical and semantic layers.
