Uncertainty and Robustness of Graph Embeddings Aleksandar Bojchevski Technical University of Munich, Germany Graph Embedding Day 2018 - Lyon


Uncertainty and Robustness of Graph Embeddings

Aleksandar Bojchevski

Technical University of Munich, Germany

Graph Embedding Day 2018 - Lyon


Neglected aspects of graph embeddings

Capturing uncertainty

Robustness to noise

Robustness to adversarial attacks


Nodes are points in a low-dimensional space


Nodes are distributions


Graph2Gauss - 3 key modeling ideas

1. Uncertainty: embed each node 𝑖 as a Gaussian 𝒩(𝜇𝑖, Σ𝑖)
2. Personalized ranking: over the 𝑘-hop neighborhoods (𝑘 = 1, 𝑘 = 2, …)
3. Inductiveness: a deep encoder 𝑓𝜃(𝑥𝑖) maps the features 𝑥𝑖 to 𝒩(𝜇𝑖, Σ𝑖)

Uncertainty

Embed nodes as (Gaussian) distributions

Sources of uncertainty:

• Conflicting structure and attributes

• Heterogeneous neighborhood

• Noise, outliers, anomalies, …


Personalized ranking

For each node 𝑖: nodes in its 𝑘-hop neighborhood should be closer to 𝑖 than nodes in its (𝑘 + 1)-hop neighborhood

Example: closer in terms of the KL divergence

KL is asymmetric ⇒ handles directed graphs

Personalized ranking

Personalized ranking implies pairwise constraints for node 𝑖:

$D_{KL}(\mathcal{N}_j \,\|\, \mathcal{N}_i) < D_{KL}(\mathcal{N}_{j'} \,\|\, \mathcal{N}_i) \quad \forall j \in N_i^{(k)},\ \forall j' \in N_i^{(k')},\ \forall k < k'$

where $N_i^{(k)}$ is the set of nodes in the 𝑘-hop neighborhood of node 𝑖
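A minimal sketch of the pairwise constraint, assuming diagonal covariance matrices so that the KL divergence has a simple closed form. The helper name `kl_diag_gauss` and the toy embeddings are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def kl_diag_gauss(mu_a, var_a, mu_b, var_b):
    """KL( N(mu_a, diag(var_a)) || N(mu_b, diag(var_b)) ), diagonal covariances assumed."""
    mu_a, var_a, mu_b, var_b = (np.asarray(x, dtype=float) for x in (mu_a, var_a, mu_b, var_b))
    return 0.5 * np.sum(
        var_a / var_b + (mu_b - mu_a) ** 2 / var_b - 1.0 + np.log(var_b / var_a)
    )

# Hypothetical 2-D embeddings: node i, a 1-hop neighbor j and a 2-hop neighbor j2.
mu_i, var_i = np.array([0.0, 0.0]), np.array([1.0, 1.0])
mu_j, var_j = np.array([0.5, 0.0]), np.array([1.0, 1.0])
mu_j2, var_j2 = np.array([3.0, 0.0]), np.array([1.0, 1.0])

e_ij = kl_diag_gauss(mu_j, var_j, mu_i, var_i)      # D_KL(N_j  || N_i)
e_ij2 = kl_diag_gauss(mu_j2, var_j2, mu_i, var_i)   # D_KL(N_j' || N_i)
constraint_holds = e_ij < e_ij2                     # the closer hop has the smaller divergence
```

Swapping the two arguments of `kl_diag_gauss` generally changes the value, which is the asymmetry that lets the model treat the two directions of an edge differently.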


Inductiveness

Generalize to unseen nodes by learning

a mapping from features to embeddings

𝑥𝑖 → deep encoder 𝑓𝜃(𝑥𝑖) → 𝒩(𝜇𝑖, Σ𝑖)

Learning with energy-based loss

$E_{ij} = D_{KL}(\mathcal{N}_j \,\|\, \mathcal{N}_i)$

$\mathcal{L} = \sum_{(i,j,j')} \left( E_{ij}^2 + \exp(-E_{ij'}) \right)$

Closer nodes should have lower energy

Naively: $O(N^3)$ complexity

Node-anchored sampling strategy:
• For each node, sample one other node from every neighborhood
• Fewer than 4.2% of the triplets need to be seen to match performance
• Lower gradient variance
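The loss and the sampling idea can be sketched as follows. This is an illustration under assumptions: the `hops` layout (for each node, a list of its 1-hop, 2-hop, … neighborhoods) and both function names are hypothetical, and the energies are passed in as plain numbers rather than computed from Gaussians.

```python
import numpy as np

def square_exp_loss(e_pos, e_neg):
    """Sum over triplets of E_ij^2 + exp(-E_ij'), with j closer to i than j'."""
    e_pos = np.asarray(e_pos, dtype=float)
    e_neg = np.asarray(e_neg, dtype=float)
    return float(np.sum(e_pos ** 2 + np.exp(-e_neg)))

def sample_triplets(hops, rng):
    """Node-anchored sampling: one sampled node per neighborhood of every node."""
    triplets = []
    for i, levels in enumerate(hops):
        # One pick per non-empty neighborhood, ordered from near to far.
        picks = [int(rng.choice(level)) for level in levels if len(level) > 0]
        for a in range(len(picks)):
            for b in range(a + 1, len(picks)):
                triplets.append((i, picks[a], picks[b]))   # (i, j, j')
    return triplets

# Toy data: path graph 0-1-2-3, hops[i][k] = nodes at distance k+1 from node i.
hops = [[[1], [2], [3]], [[0, 2], [3]], [[1, 3], [0]], [[2], [1], [0]]]
triplets = sample_triplets(hops, np.random.default_rng(0))
```

Each node contributes only one triplet per pair of neighborhoods, so the number of triplets per pass grows with the number of nodes and hop levels rather than cubically in 𝑁.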


Graph2Gauss is parameter/data efficient


Graph2Gauss captures uncertainty

Uncertainty correlates with diversity

Diversity: number of distinct classes

in a node’s k-hop neighborhood


Graph2Gauss captures uncertainty

Uncertainty reveals the intrinsic latent dimensionality of the graph

Detected latent dimensions ≈ number of ground-truth communities

Uncertainty and link prediction

Prune dimensions with high uncertainty

Maintaining link prediction performance


Graph2Gauss is effective for visualization


Why spectral embedding


https://www.semanticscholar.org


What is spectral clustering

Graph clustering:
• Maximize within-cluster edges
• Minimize between-cluster edges

[Figure: similarity graph and its spectral embedding (𝑁 = 9, 𝐷 = 5)]

The minimum cut

Partition 𝑉 into two sets 𝐶1 and 𝐶2 such that the sum of the inter-cluster edge weights, $cut(C_1, C_2) = \sum_{v_1 \in C_1, v_2 \in C_2} w(v_1, v_2)$, is minimized

Drawbacks:
• Tends to cut small vertex sets from the rest of the graph
• Considers only inter-cluster edges, no intra-cluster edges


The normalized cut

Ratio Cut: minimize $\frac{cut(C_1, C_2)}{|C_1|} + \frac{cut(C_2, C_1)}{|C_2|}$

Normalized Cut: minimize $\frac{cut(C_1, C_2)}{vol(C_1)} + \frac{cut(C_1, C_2)}{vol(C_2)}$
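To make the three objectives concrete, here is a small sketch that evaluates them for a two-way partition. The toy graph (two triangles joined by one edge) and the function names are assumptions for illustration; `vol` is taken to be the sum of degrees in a cluster.

```python
import numpy as np

def cut_value(W, mask):
    """Sum of edge weights between the nodes selected by `mask` and the rest."""
    return W[mask][:, ~mask].sum()

def ratio_cut(W, mask):
    c = cut_value(W, mask)
    return c / mask.sum() + c / (~mask).sum()

def normalized_cut(W, mask):
    c = cut_value(W, mask)
    deg = W.sum(axis=1)                      # vol(C) = sum of degrees in C
    return c / deg[mask].sum() + c / deg[~mask].sum()

# Toy graph (an assumption): triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
W = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[u, v] = W[v, u] = 1.0

balanced = np.array([True, True, True, False, False, False])
lopsided = np.array([True, False, False, False, False, False])
```

On this toy graph both normalizations score the balanced split far lower than the split that isolates a single node, which is the behavior the size and volume terms are meant to encourage.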


Multi-way graph partitioning

Generalization to 𝑘 ≥ 2 clusters

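Partition 𝑉 into disjoint clusters 𝐶1, …, 𝐶𝑘 such that
• Cut: $\min_{C_1,\dots,C_k} \sum_{i=1}^{k} cut(C_i, V \setminus C_i)$
• Ratio Cut: $\min_{C_1,\dots,C_k} \sum_{i=1}^{k} \frac{cut(C_i, V \setminus C_i)}{|C_i|}$
• Normalized Cut: $\min_{C_1,\dots,C_k} \sum_{i=1}^{k} \frac{cut(C_i, V \setminus C_i)}{vol(C_i)}$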

Finding the optimal solution is NP-hard

How to compute an approximate solution efficiently?

[Figure: minimum cut for 𝑘 = 3]

Graph Laplacian

Laplacian matrix $L = D - A$
• 𝐴 = (weighted) adjacency matrix, 𝐷 = degree matrix

Observation: for any vector 𝑓 we have

$f^T L f = \frac{1}{2} \sum_{(u,v) \in E} W_{uv} (f_u - f_v)^2$

Normalized Laplacian: $L_{sym} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}$
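A quick numerical check of the two identities, as a sketch on a random weighted graph. The graph, the seed and the variable names are assumptions; the sum runs over ordered pairs (𝑢, 𝑣), so each undirected edge is counted twice, which is where the factor 1/2 comes from.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical weighted, undirected graph: symmetric weights, no self-loops.
W = np.triu(rng.random((5, 5)), k=1)
W = W + W.T

deg = W.sum(axis=1)
L = np.diag(deg) - W                                   # L = D - A
f = rng.standard_normal(5)

quad = f @ L @ f                                       # f^T L f
pairwise = 0.5 * np.sum(W * (f[:, None] - f[None, :]) ** 2)

d_inv_sqrt = 1.0 / np.sqrt(deg)
L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]  # D^{-1/2} L D^{-1/2}
L_sym_alt = np.eye(5) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
```

Because the pairwise form is a weighted sum of squares, `quad` can never be negative, which is the positive semi-definiteness of 𝐿.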


Physical interpretation of the Laplacian (I)

Let 𝑓 be a heat distribution over a graph, with 𝑓𝑖 = the heat at node 𝑣𝑖

The heat transferred between 𝑣𝑖 and 𝑣𝑗 is proportional to (𝑓𝑖 − 𝑓𝑗) if (𝑖, 𝑗) ∈ 𝐸

https://en.wikipedia.org/wiki/Laplacian_matrix#/media/File:Graph_Laplacian_Diffusion_Example.gif

Physical interpretation of the Laplacian (II)

Graph is viewed as an electrical circuit with edges as wires (resistors)

Apply voltage at some nodes and measure the induced voltage at the other nodes

The induced voltages minimize $\sum_{(u,v) \in E} (x_u - x_v)^2$

We can find the voltages by minimizing $x^T L x$

Properties of the Graph Laplacian

𝐿 is symmetric and positive semi-definite

The number of eigenvectors of 𝐿 with eigenvalue 0 corresponds to the number of connected components

Algebraic connectivity of a graph is 𝜆2(𝐿)
• The magnitude reflects how well connected the graph is overall

The spectrum of 𝐿 encodes useful information about the graph
• Unfortunately, there exist co-spectral graphs
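The link between zero eigenvalues and connected components can be checked on a tiny example. The graph below (a triangle plus a separate edge) and the tolerance `1e-9` are assumptions chosen for illustration.

```python
import numpy as np

def laplacian(A):
    return np.diag(A.sum(axis=1)) - A

# Toy graph (an assumption): a triangle {0,1,2} and a separate edge {3,4}.
A = np.zeros((5, 5))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4)]:
    A[u, v] = A[v, u] = 1.0

eigvals = np.linalg.eigvalsh(laplacian(A))       # ascending, non-negative up to round-off
n_components = int(np.sum(eigvals < 1e-9))       # multiplicity of the eigenvalue 0

# Connecting the two parts leaves a single zero eigenvalue, so lambda_2 becomes positive.
A_conn = A.copy()
A_conn[2, 3] = A_conn[3, 2] = 1.0
lambda_2 = np.linalg.eigvalsh(laplacian(A_conn))[1]
```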


Minimum cut and the graph Laplacian

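Define the indicator vector $h_{C_k}(i) = \begin{cases} \frac{1}{\sqrt{|C_k|}} & \text{if } v_i \in C_k \\ 0 & \text{else} \end{cases}$

Let $H = [h_{C_1}; h_{C_2}; \dots; h_{C_k}]$

Observations:

• $H^T H = I$, i.e. the columns of 𝐻 are orthonormal

• $h_{C_i}^T L\, h_{C_i} = \frac{cut(C_i, V \setminus C_i)}{|C_i|}$ and $h_{C_i}^T L\, h_{C_i} = (H^T L H)_{ii}$

• $RatioCut(C_1, \dots, C_k) = \sum_{i=1}^{k} \frac{cut(C_i, V \setminus C_i)}{|C_i|} = \sum_{i=1}^{k} (H^T L H)_{ii} = trace(H^T L H)$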


Minimum cut and the graph Laplacian

Minimizing ratio-cut (normalized cut with 𝐿𝑠𝑦𝑚) is equivalent to

$\min_{C_1,\dots,C_k} trace(H^T L H)$ subject to $H^T H = I$

Constraint relaxation: allow arbitrary values for 𝐻

$\min_{H \in \mathbb{R}^{|V| \times K}} trace(H^T L H)$ subject to $H^T H = I$

Standard trace minimization problem

Optimal 𝐻 = the eigenvectors of 𝐿 belonging to the 𝐾 smallest eigenvalues
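The relaxed problem can be sketched with a dense eigendecomposition. This is a minimal illustration, assuming a small unweighted toy graph and the unnormalized Laplacian; the function name `spectral_embedding` is hypothetical, and larger graphs would call for a sparse eigensolver instead.

```python
import numpy as np

def spectral_embedding(A, K):
    """Eigenvectors of L = D - A for the K smallest eigenvalues (relaxed ratio cut)."""
    L = np.diag(A.sum(axis=1)) - A
    eigvals, eigvecs = np.linalg.eigh(L)         # eigenvalues in ascending order
    return eigvals[:K], eigvecs[:, :K]

# Toy graph (an assumption): triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

vals, H = spectral_embedding(A, K=2)
L = np.diag(A.sum(axis=1)) - A
objective = np.trace(H.T @ L @ H)                # equals the sum of the K smallest eigenvalues

# The sign pattern of the second eigenvector separates the two triangles.
labels = (H[:, 1] > 0).astype(int)
```

In practice the rows of 𝐻 are usually clustered (for example with k-means) to turn the relaxed solution back into a discrete partition; thresholding the second eigenvector is the two-cluster shortcut used here.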


Spectral embedding: random walk view

$L_{rw} = D^{-1} L = I - D^{-1} A = I - P$ is the random walk Laplacian
• 𝜆 is an eigenvalue of $L_{rw}$ with eigenvector 𝑢 if and only if 𝜆 is an eigenvalue of $L_{sym}$ with eigenvector $w = D^{1/2} u$

Let $P(B \mid A) = P(X_1 \in B \mid X_0 \in A)$ be the probability that a random walker currently at a node in 𝐴 transitions to a node in 𝐵, for $A \cap B = \emptyset$ and $A, B \subset V$, with $X_0 \sim \pi$ sampled from the stationary distribution. Then $Ncut(A, \bar{A}) = P(\bar{A} \mid A) + P(A \mid \bar{A})$
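Both statements can be verified numerically. The sketch below assumes a connected, undirected toy graph, for which the stationary distribution is proportional to the node degrees; the helper `transition_prob` is a hypothetical name.

```python
import numpy as np

# Toy graph (an assumption): triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

deg = A.sum(axis=1)
L = np.diag(deg) - A
L_rw = L / deg[:, None]                                   # D^{-1} L = I - P
L_sym = L / np.sqrt(deg)[:, None] / np.sqrt(deg)[None, :]

lam, W_vecs = np.linalg.eigh(L_sym)
U = W_vecs / np.sqrt(deg)[:, None]        # u = D^{-1/2} w, i.e. w = D^{1/2} u
residual = np.abs(L_rw @ U - U * lam).max()               # L_rw u = lambda u, column by column

P = A / deg[:, None]                      # transition matrix
pi = deg / deg.sum()                      # stationary distribution of an undirected graph
mask = np.array([True, True, True, False, False, False])

def transition_prob(src, dst):
    """P(X1 in dst | X0 in src) with X0 drawn from the stationary distribution."""
    return (pi[src][:, None] * P[src][:, dst]).sum() / pi[src].sum()

ncut_rw = transition_prob(mask, ~mask) + transition_prob(~mask, mask)
cut = A[mask][:, ~mask].sum()
ncut_direct = cut / deg[mask].sum() + cut / deg[~mask].sum()
```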


Spectral Embedding

Finding the spectral embedding = solving an optimization task

Input: graph 𝐴 → graph Laplacian $L(A) = D(A) - A$ → trace minimization (Ratio/Normalized Cut) → output: embedding $H^*$

$H^* = \arg\min_{H \in \mathbb{R}^{n \times d}} Trace(H^T \cdot L(A) \cdot H)$ subject to $H^T \cdot D(A) \cdot H = I_d$

$H^*$ = the first 𝑘 eigenvectors of 𝐿(𝐴)

Problem: sensitive to noisy data

Noisy graph ⇒ spurious edges ⇒ distorted embedding ⇒ wrong clustering

[Figure: clean vs. noisy graph with the resulting embedding and clustering]

Robustness via Latent Decomposition

Jointly learn decomposition & embedding

Decomposition steered by the underlying embedding / clustering

$A = A^g + A^c$ (good graph + sparse corruptions)

$A^*, H^* = \arg\min_{H \in \mathbb{R}^{n \times d},\ A^g \in \mathbb{R}_{\ge 0}^{n \times n}} Trace(H^T \cdot L(A^g) \cdot H)$

subject to
• $H^T \cdot D(A^g) \cdot H = I_d$
• $A = A^g + A^c$
• $\|A^c\|_0 \le 2\theta$ (global)
• $\forall i: \|a_i^c\|_0 \le \omega_i$ (local)

Solution: Alternating optimization

Update 𝐻, given $A^g$/$A^c$ → easy
• Trace minimization problem
• The solution for 𝐻 is given by the first 𝑘 generalized eigenvectors of $L(A^g)$

Alternate between the two steps: update $A^g$, update 𝐻

Solution: Alternating optimization

Update $A^g$/$A^c$, given 𝐻 → NP-hard
• Express the eigenvalues of $A^g_{new}$ in closed form
• Finding the $A^g_{new}$ that minimizes the trace is equivalent to maximizing 𝑓

$f\left([a^c_{uv}]_{(u,v) \in E}\right) = \sum_{(u,v) \in E} a^c_{uv} \left( \|h_u - h_v\|_2^2 - \|\sqrt{\lambda} \circ h_u\|_2^2 - \|\sqrt{\lambda} \circ h_v\|_2^2 \right)$

• First term: nodes far away in the embedding space
• Remaining terms: prefers edges close to the origin

subject to the $\|\cdot\|_0$ constraints

Solution: Alternating optimization

Equivalent to the multidimensional knapsack problem
• Greedy approximation
• Best possible approximation ratio of $\frac{1}{N+1}$

Efficient solution in O(#edges)

Conclusion

Spectral embedding is sensitive to noisy data

Robustness via latent decomposition: $\underbrace{A}_{\text{original graph}} = \underbrace{A^g}_{\text{good graph}} + \underbrace{A^c}_{\text{sparse corruptions}}$

Removing corrupted edges ⇒ increased discrimination


Adversarial attacks on graph embeddings

(Spectral) embeddings are not robust to noise, but we can remedy that

Are graph embeddings robust to adversarial attacks?

In domains where graph embeddings are used (e.g. the Web), adversaries are common and false data is easy to inject

Adversarial attacks in the image domain

Image of a tabby cat correctly classified

• Add imperceptible perturbation

• Model classifies the cat as guacamole

[Figure: the trained model classifies the original image as "88% tabby cat" and the perturbed image as "99% guacamole"]

The relational nature of the data might

• Improve robustness: embeddings are computed jointly rather than in isolation
• Cause cascading failures: perturbations in one part of the graph can propagate to the rest

Attack possibilities

General attack

Goal: decrease the overall quality of the embeddings

Actions:
• Add/remove (flip) an edge
• Add/remove a node
• …

Targeted attack

Goal: attack a specific node or a specific downstream task

Examples:
• Misclassify a target node 𝑡
• Increase/decrease the similarity of a set of node pairs 𝒯 ⊂ 𝑉 × 𝑉

Attack model formally

$\hat{A}^* = \arg\max_{\hat{A} \in \{0,1\}^{N \times N}} \mathcal{L}(\hat{A}, Z^*)$, where $Z^* = \arg\min_{Z} \mathcal{L}(\hat{A}, Z)$, subject to $\|\hat{A} - A\|_0 = 2f$

• $\hat{A}$: adjacency matrix of the graph after the attacker modified some entries
• $Z^*$: optimal embedding from the to-be-optimized graph $\hat{A}$
• $f$: the attacker's budget

General attack: maximize $\mathcal{L}(\hat{A}, Z^*)$

Targeted attack: maximize $\mathcal{L}_{atck}(\hat{A}, Z^*)$

Challenges

Discrete and Combinatorial Bi-level optimization problem

Transductive learning ⇒ network poisoning setting

[Figure: evasion (frozen model) vs. poisoning (model/embedding)]

Random-walk based embeddings

RW-based embeddings solve:

$Z^* = \min_{Z} \mathcal{L}(\{r_1, r_2, \dots\}, Z)$ with $r_i = RW_l(A)$

• $Z^* \in \mathbb{R}^{N \times K}$: learned embedding
• $RW_l$: a stochastic procedure that generates RWs of length 𝑙
• $\mathcal{L}$: model-specific loss, e.g. skip-gram with negative sampling (SGNS)

Challenge: RW sampling precludes gradient-based optimization

Example: DeepWalk

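DeepWalk is equivalent* to factorizing

$\tilde{M} = \log(\max(M, 1))$, with $M = \frac{vol(A)}{T \cdot b} S$, $S = \left(\sum_{r=1}^{T} P^r\right) D^{-1}$, $P = D^{-1} A$

with $Z^*$ obtained by the SVD of $\tilde{M} = U \Sigma V^T$ using the top 𝐾 largest singular values/vectors, i.e. $Z^* = U_K \Sigma_K^{1/2}$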
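The factorization view can be written down directly for a small dense graph. This is a sketch under assumptions: the function names, the default values of `T` and `b`, and the toy graph are illustrative, and a dense SVD would not scale to large graphs.

```python
import numpy as np

def deepwalk_matrix(A, T=5, b=1):
    """M~ = log(max(M, 1)) with M = vol(A) / (T b) * S and S = (sum_r P^r) D^{-1}."""
    deg = A.sum(axis=1)
    P = A / deg[:, None]                      # transition matrix D^{-1} A
    S = np.zeros(A.shape)
    P_r = np.eye(len(A))
    for _ in range(T):
        P_r = P_r @ P                         # P^r for r = 1..T
        S += P_r
    S = S / deg[None, :]                      # right-multiply by D^{-1}
    M = deg.sum() / (T * b) * S               # vol(A) = sum of degrees
    return np.log(np.maximum(M, 1.0))

def deepwalk_embedding(A, K, T=5, b=1):
    M_tilde = deepwalk_matrix(A, T, b)
    U, sigma, _ = np.linalg.svd(M_tilde)      # singular values in descending order
    Z = U[:, :K] * np.sqrt(sigma[:K])         # Z* = U_K Sigma_K^{1/2}
    residual_loss = float(np.sum(sigma[K:] ** 2))
    return Z, residual_loss

# Toy graph (an assumption): triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

Z, residual_loss = deepwalk_embedding(A, K=2)
```

`residual_loss` is the squared Frobenius error left by the rank-𝐾 truncation, i.e. the sum of the discarded squared singular values.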

• 𝑃: transition matrix
• 𝑇: window size
• 𝑏: number of negative samples
• $\tilde{M}$: shifted positive pointwise mutual information (PPMI) matrix

Example: DeepWalk

Equivalent to $\min_{\tilde{M}_K} \|\tilde{M} - \tilde{M}_K\|_F^2$

The loss using the optimal embedding is $\mathcal{L}_{DW_1}(A, Z^*) = \sum_{p=K+1}^{|V|} \sigma_p^2$, where $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_{|V|}$ are the singular values of $\tilde{M}(A)$ in decreasing order

Idea: given a perturbation $\Delta A$, find the change in the singular values of $\tilde{M}(A + \Delta A)$

Example: DeepWalk

$\tilde{M} = \log(\max(M, 1))$, $M = \frac{vol(A)}{T \cdot b} S$, $S = \left(\sum_{r=1}^{T} P^r\right) D^{-1}$

Linearization: ignore the log(⋅) and max(⋅, 1)

The scalars 𝑣𝑜𝑙(𝐴), 𝑇, 𝑏 can also be ignored

Rewrite $\mathcal{L}_{DW_1}(A, Z^*) = \sum_{p=K+1}^{|V|} |\lambda_p|^2$

Thus, find the change in the spectrum of 𝑆 after the attacker perturbed the graph by $\Delta A$

Spectrum of S

Compute the generalized spectrum (generalized eigenvalues/vectors) of 𝐴, i.e. compute 𝑈, Λ that solve $A u = \lambda D u$

Rewrite $S = \left(\sum_{r=1}^{T} P^r\right) D^{-1}$ as $S = U \left(\sum_{r=1}^{T} \Lambda^r\right) U^T$

The task is now to find the change in the generalized eigenvalues $\lambda_p$ of the adjacency matrix 𝐴 given a perturbation $\Delta A$
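The rewrite of 𝑆 can be checked numerically. The sketch below obtains the generalized eigenpairs from the symmetric matrix $D^{-1/2} A D^{-1/2}$, which is one standard way to solve $A u = \lambda D u$ with $U^T D U = I$; the function name, the toy graph and the value of `T` are assumptions.

```python
import numpy as np

def generalized_spectrum(A):
    """Solve A u = lambda D u with U^T D U = I via the symmetric D^{-1/2} A D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    lam, W = np.linalg.eigh(d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :])
    return lam, d_inv_sqrt[:, None] * W       # u = D^{-1/2} w

# Toy graph (an assumption): triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

T = 4
deg = A.sum(axis=1)
lam, U = generalized_spectrum(A)

P = A / deg[:, None]
S_direct = sum(np.linalg.matrix_power(P, r) for r in range(1, T + 1)) / deg[None, :]
S_spectral = (U * sum(lam ** r for r in range(1, T + 1))) @ U.T   # U (sum_r Lambda^r) U^T
```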

($\sum_{r=1}^{T} \Lambda^r$ is a simple function of the generalized eigenvalues $\lambda_i$ of the graph)

Eigenvalue perturbation theory

Given 𝑈, Λ that solve $A u = \lambda D u$ and a small perturbation $\Delta A$, $\Delta D$

Find 𝑈′, Λ′ that solve $(A + \Delta A) u' = \lambda' (D + \Delta D) u'$

First-order approximation: $\lambda'_p = \lambda_p + u_p^T (\Delta A - \lambda_p \Delta D) u_p$

For small $\Delta A$ and $\Delta D$, higher-order terms become negligible

For a single edge flip

$\Delta A$ is a matrix with only 2 non-zero elements for a single edge flip (𝑖, 𝑗), namely $\Delta A_{ij} = \Delta A_{ji} = 1 - 2A_{ij} := \Delta w_{ij}$

Similarly, $\Delta D$ has only two non-zero elements on the diagonal

Then we can approximate the generalized eigenvalues of $A + \Delta A$ in closed form, computable in O(1) time:

$\lambda'_p = \lambda_p + \Delta w_{ij} \left( 2 u_{pi} \cdot u_{pj} - \lambda_p (u_{pi}^2 + u_{pj}^2) \right)$
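A sketch of the closed-form update for one flip, compared against the general first-order expression $u_p^T(\Delta A - \lambda_p \Delta D)u_p$ and against a full recomputation. The toy graph (a ring with one chord), the chosen flip and the function names are assumptions; the first-order estimate is only expected to be accurate for well-separated eigenvalues and small perturbations.

```python
import numpy as np

def generalized_spectrum(A):
    """Solve A u = lambda D u with U^T D U = I via the symmetric D^{-1/2} A D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    lam, W = np.linalg.eigh(d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :])
    return lam, d_inv_sqrt[:, None] * W

def flipped_eigenvalues(A, lam, U, i, j):
    """First-order estimate of all generalized eigenvalues after flipping edge (i, j)."""
    dw = 1.0 - 2.0 * A[i, j]                  # +1 adds the edge, -1 removes it
    return lam + dw * (2.0 * U[i] * U[j] - lam * (U[i] ** 2 + U[j] ** 2))

# Toy graph (an assumption): a ring of 12 nodes with one chord.
n = 12
A = np.zeros((n, n))
for u in range(n):
    A[u, (u + 1) % n] = A[(u + 1) % n, u] = 1.0
A[0, 5] = A[5, 0] = 1.0

lam, U = generalized_spectrum(A)
i, j = 2, 8
approx = flipped_eigenvalues(A, lam, U, i, j)

# General first-order formula with explicit perturbation matrices.
dA = np.zeros_like(A)
dA[i, j] = dA[j, i] = 1.0 - 2.0 * A[i, j]
dD = np.diag(dA.sum(axis=1))
general = lam + np.sum(U * (dA @ U), axis=0) - lam * np.sum(U * (dD @ U), axis=0)

# Exact eigenvalues of the perturbed graph, for comparison only.
exact = generalized_spectrum(A + dA)[0]
max_abs_error = np.abs(np.sort(approx) - exact).max()
```

Each eigenvalue update touches only two entries of its eigenvector, which is why the cost per eigenvalue does not depend on the graph size.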


Connecting it all together

1. DeepWalk is equivalent to an SVD of $\tilde{M} = \log\left(\max\left(\frac{vol(A)}{T \cdot b} S, 1\right)\right)$

2. The loss can be computed from the singular values / the spectrum of S

3. The spectrum of 𝑆 can be easily computed from the generalized spectrum of A

4. For any given edge flip (𝑖, 𝑗) we can compute in O(1) the spectrum of 𝐴 + Δ𝐴


Overall algorithm

However, $\hat{A}^* = \arg\max_{\hat{A} \in \{0,1\}^{N \times N}} \mathcal{L}_{closed-form}$ subject to $\|\hat{A} - A\|_0 = 2f$ is still hard to optimize: there are $\binom{N^2}{f}$ ways to choose the flips

Greedy solution:

1. For each edge (𝑖, 𝑗) calculate its impact on the loss if flipped

2. Pick the top f edges
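The two greedy steps can be sketched as below. This is an illustration under stated assumptions, not the exact procedure from the talk: the linearized loss is read here as the sum of the squared values of $\sum_{r=1}^{T} \lambda_p^r$ outside the 𝐾 largest magnitudes, every node pair is treated as a candidate flip, and the function names, `K`, `T` and the toy graph are hypothetical.

```python
import numpy as np

def generalized_spectrum(A):
    """Solve A u = lambda D u with U^T D U = I via the symmetric D^{-1/2} A D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    lam, W = np.linalg.eigh(d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :])
    return lam, d_inv_sqrt[:, None] * W

def linearized_loss(lam, K, T):
    """Assumed loss: squared spectrum of S outside the K largest magnitudes."""
    g = sum(lam ** r for r in range(1, T + 1))
    mags = np.sort(np.abs(g))[::-1]           # decreasing magnitude
    return float(np.sum(mags[K:] ** 2))

def greedy_flips(A, f, K=2, T=3):
    lam, U = generalized_spectrum(A)
    base = linearized_loss(lam, K, T)
    n = len(A)
    scored = []
    for i in range(n):
        for j in range(i + 1, n):
            dw = 1.0 - 2.0 * A[i, j]
            # Step 1: closed-form first-order eigenvalues after flipping (i, j).
            lam_new = lam + dw * (2.0 * U[i] * U[j] - lam * (U[i] ** 2 + U[j] ** 2))
            scored.append((linearized_loss(lam_new, K, T) - base, i, j))
    scored.sort(reverse=True)                 # Step 2: largest estimated increase first
    return [(i, j) for _, i, j in scored[:f]]

# Toy graph (an assumption): a ring of 10 nodes with one chord.
n = 10
A = np.zeros((n, n))
for u in range(n):
    A[u, (u + 1) % n] = A[(u + 1) % n, u] = 1.0
A[0, 4] = A[4, 0] = 1.0

flips = greedy_flips(A, f=3)
```

Scoring each candidate independently ignores interactions between flips, which is the price paid for avoiding the combinatorial search.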


General attack (node classification proxy)


Targeted attack

To target node 𝑣 we need the change in its embedding $Z_v^*$

That is, we need the change in the eigenvectors

Apply eigenvalue perturbation again to approximate the top 𝐾 eigenvectors

For a given edge flip (𝑖, 𝑗) we get:

$u'_p = u_p - \Delta w_{ij} (A - \lambda_p D)^{+} \left( -\Delta\lambda_p\, u_p \circ d + E_i(u_{pj} - \lambda_p u_{pi}) + E_j(u_{pi} - \lambda_p u_{pj}) \right)$

Targeted attack: Link prediction


Targeted attack: Node classification

[Figure panels: before attack, degree attack, random attack, our attack]

Analysis of adversarial edges


Transferability

Budget | DW (SVD) | DW (SGNS) | n2v | Spect. Embd. | Label Prop. | GCN
𝑓 = 250 (0.8%) | -3.59 | -2.37 | -2.04 | -2.11 | -5.78 | -3.34
𝑓 = 500 (1.6%) | -4.62 | -3.97 | -3.48 | -4.57 | -8.95 | -2.33
𝑓 = 250 (1.7%) | -7.59 | -5.73 | -6.45 | -3.58 | -4.99 | -2.21
𝑓 = 500 (3.4%) | -9.68 | -11.47 | -10.24 | -4.57 | -6.27 | -8.61

Conclusion

Node embeddings are vulnerable to adversarial attacks

Poisoning has a negative effect on the embedding quality and the downstream tasks

Attacks are transferable – they generalize to many models


Important aspects of graph embeddings

Capturing uncertainty

Robustness to noise

Robustness to adversarial attacks
