Statistical Analysis of Network Data with R: Network Cohesion & Graph Partitioning

Kim Seonghyeon

April 14, 2017

Kim Seonghyeon Statistical Analysis of Network Data with R April 14, 2017 1 / 27

Network Cohesion


Subgraphs & Censuses

clique
clique: a complete subgraph.
maximal clique: a clique that is not a subset of a larger clique.
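The two definitions can be made concrete with a small sketch. The slides work in R; Python is used here only for illustration, and the toy graph (a triangle a, b, c with a pendant vertex d) and the function names are hypothetical. The census is brute force and the maximal-clique search is Bron-Kerbosch without pivoting, so this is only suitable for tiny graphs.

```python
from itertools import combinations

def is_clique(adj, nodes):
    """True if every pair of the given vertices is adjacent."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))

def clique_census(adj):
    """Count cliques by size (brute force over vertex subsets)."""
    counts = {}
    vertices = sorted(adj)
    for size in range(1, len(vertices) + 1):
        n = sum(1 for nodes in combinations(vertices, size) if is_clique(adj, nodes))
        if n == 0:
            break  # no clique of this size means none of any larger size
        counts[size] = n
    return counts

def maximal_cliques(adj):
    """Bron-Kerbosch (no pivoting): cliques not contained in a larger clique."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found

# Hypothetical toy graph: triangle a-b-c plus pendant vertex d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
census = clique_census(adj)
maximal = maximal_cliques(adj)
```

In this toy graph every vertex and every edge is a clique, but only the triangle and the pendant edge are maximal.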

Figure 1: karate (network plot; vertices labeled H, 2 to 33, and A)

Table 1: number of cliques by size

size    1   2   3   4   5
count  34  78  45  11   2

Table 2: number of maximal cliques by size

size    2   3   4   5
count  11  21   2   2

core & coreness
k-core: a weakened notion of clique. A k-core of G is a subgraph in which all vertex degrees are at least k and which is not contained in any other subgraph obeying the same condition (i.e., it is maximal in this property).
coreness: $\mathrm{coreness}(v) = \max\{k \mid H \text{ is a } k\text{-core},\ v \in V_H\}$
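Coreness can be computed by repeatedly peeling off a vertex of minimum remaining degree. The following is a minimal sketch of that standard peeling idea, not the code used for the slides; the toy graph and the function name are hypothetical.

```python
def coreness(adj):
    """Return {vertex: coreness} by peeling minimum-degree vertices."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Pick a vertex of minimum degree in the remaining subgraph.
        v = min(remaining, key=lambda u: degree[u])
        # The running maximum of the peeled degrees is the core number.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core

# Hypothetical toy graph: triangle a-b-c plus pendant vertex d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
core = coreness(adj)
```

Here the triangle forms a 2-core, while the pendant vertex d belongs only to the 1-core.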

Figure 2: visualization with coreness

Censuses (directed graph)
mutual: $C_{mut} = \{\{u, v\} \subset V_G \mid (u, v), (v, u) \in E_G\}$
asymmetric: $C_{asym} = \{\{u, v\} \subset V_G \mid (u, v) \in E_G\} \setminus C_{mut}$
null: $C_{null} = \{\{u, v\} \mid \{u, v\} \subset V_G\} \setminus (C_{mut} \cup C_{asym})$

## aidsblog

## $v
## [1] 146
##
## $e
## [1] 183
##
## $mut
## [1] 3
##
## $asym
## [1] 177
##
## $null
## [1] 10405
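A dyad census simply classifies every unordered vertex pair. The sketch below counts the three classes for a hypothetical four-vertex directed graph (the function name and the toy edges are assumptions), and then checks that the aidsblog counts reported above are internally consistent: the three classes must add up to all vertex pairs, and each mutual dyad contributes two edges while each asymmetric dyad contributes one.

```python
from itertools import combinations

def dyad_census(vertices, edges):
    """Count mutual, asymmetric, and null dyads of a directed graph."""
    edge_set = set(edges)
    mut = asym = null = 0
    for u, v in combinations(vertices, 2):
        forward = (u, v) in edge_set
        backward = (v, u) in edge_set
        if forward and backward:
            mut += 1
        elif forward or backward:
            asym += 1
        else:
            null += 1
    return {"mut": mut, "asym": asym, "null": null}

# Hypothetical toy graph: a<->b is mutual, b->c and d->c are asymmetric.
census = dyad_census("abcd", [("a", "b"), ("b", "a"), ("b", "c"), ("d", "c")])

# Consistency check on the aidsblog numbers reported in the slides.
v, e, mut, asym, null = 146, 183, 3, 177, 10405
total_dyads = v * (v - 1) // 2
dyads_add_up = (mut + asym + null == total_dyads)
edges_add_up = (2 * mut + asym == e)
```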


Density and Related Notions of Relative Frequency

Density
density:

$$\mathrm{den}(H) = \frac{|E_H|}{|V_H|(|V_H| - 1)/2}$$

*In the case that G is a directed graph, the denominator is replaced by $|V_H|(|V_H| - 1)$.
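As a quick numerical illustration, the formula can be evaluated directly from vertex and edge counts. The function name is hypothetical; the inputs reuse counts that appear elsewhere in these slides (34 vertices and 78 edges for the karate network, 146 vertices and 183 directed edges for aidsblog).

```python
def density(n_vertices, n_edges, directed=False):
    """Fraction of possible edges that are present (no self-loops)."""
    possible = n_vertices * (n_vertices - 1)
    if not directed:
        possible //= 2  # unordered pairs for an undirected graph
    return n_edges / possible

karate_density = density(34, 78)                      # undirected
aidsblog_density = density(146, 183, directed=True)   # directed
complete_density = density(4, 6)                      # K4 has all 6 possible edges
```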

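clustering coefficient
global clustering coefficient:

$$\mathrm{cl}_T(G) = \frac{3\tau_\Delta(G)}{\tau_3(G)}$$

local clustering coefficient:

$$\mathrm{cl}(v) = \frac{\tau_\Delta(v)}{\tau_3(v)}$$

Here $\tau_\Delta(G)$ is the number of triangles and $\tau_3(G)$ the number of connected triples in G; $\tau_\Delta(v)$ is the number of triangles containing v, and $\tau_3(v)$ the number of connected triples whose two edges are both incident to v.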
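Both coefficients reduce to counting, at each vertex, the pairs of neighbors and how many of those pairs are themselves adjacent. The sketch below assumes an undirected simple graph stored as an adjacency dictionary; the function names and the toy graph are hypothetical.

```python
from itertools import combinations

def local_counts(adj, v):
    """Return (triangles through v, connected triples centered at v)."""
    d = len(adj[v])
    triples = d * (d - 1) // 2
    triangles = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return triangles, triples

def local_clustering(adj, v):
    triangles, triples = local_counts(adj, v)
    return triangles / triples if triples else float("nan")

def global_clustering(adj):
    """3 * (number of triangles) / (number of connected triples)."""
    corner_count = triple_count = 0
    for v in adj:
        t, s = local_counts(adj, v)
        corner_count += t   # each triangle is counted once per corner, i.e. 3 times
        triple_count += s
    return corner_count / triple_count if triple_count else float("nan")

# Hypothetical toy graph: triangle a-b-c plus pendant vertex d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
cl_global = global_clustering(adj)
cl_a = local_clustering(adj, "a")
cl_c = local_clustering(adj, "c")
```

A vertex of degree below two has no connected triples, so its local coefficient is undefined; the sketch returns NaN in that case.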

Figure 3: transitivity (example graph on vertices a, b, c, d)

reciprocity (directed graph)
type 1:

$$\mathrm{rec}_1(G) = \frac{|C_{mut}|}{|C_{mut} \cup C_{asym}|}$$

type 2:

$$\mathrm{rec}_2(G) = \frac{2|C_{mut}|}{|E_G|}$$
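The two definitions differ only in what is counted in the denominator: connected dyads for type 1, directed edges for type 2. A minimal sketch, assuming a directed graph without self-loops given as a list of (source, target) pairs; the function name and the three-vertex example are hypothetical.

```python
def reciprocity(edges):
    """Return (rec1, rec2) for a directed edge list without self-loops."""
    edge_set = set(edges)
    connected_dyads = {frozenset(e) for e in edge_set}
    mutual_dyads = {frozenset((u, v)) for u, v in edge_set if (v, u) in edge_set}
    rec1 = len(mutual_dyads) / len(connected_dyads)   # mutual / (mutual + asymmetric)
    rec2 = 2 * len(mutual_dyads) / len(edge_set)      # reciprocated edges / all edges
    return rec1, rec2

# Hypothetical example: a<->b is mutual, b->c is asymmetric.
rec1, rec2 = reciprocity([("a", "b"), ("b", "a"), ("b", "c")])

# The same formulas applied to the aidsblog dyad census counts from the slides.
aids_rec1 = 3 / (3 + 177)
aids_rec2 = 2 * 3 / 183
```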

Figure 4: reciprocity (example graph on vertices a, b, c)

Connectivity, Cuts, and Flows

Connectivity
A graph G is said to be connected if every vertex is reachable from every other.
A connected component of a graph is a maximally connected subgraph.
diameter: the length of the longest shortest path (geodesic) between any pair of vertices.
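Components and diameter both follow from breadth-first search. The sketch below uses a hypothetical undirected graph with two components; the diameter is taken within one component as the largest shortest-path distance between any two of its vertices. Function names are assumptions for illustration.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path (hop) distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def connected_components(adj):
    """List of vertex sets, one per connected component."""
    seen, comps = set(), []
    for v in adj:
        if v not in seen:
            comp = set(bfs_distances(adj, v))
            seen |= comp
            comps.append(comp)
    return comps

def diameter(adj, component):
    """Largest shortest-path distance between two vertices of a component."""
    return max(max(bfs_distances(adj, v).values()) for v in component)

# Hypothetical graph: a path a-b-c-d and a separate edge e-f.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"},
       "e": {"f"}, "f": {"e"}}
comps = connected_components(adj)
largest = max(comps, key=len)
diam = diameter(adj, largest)
```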

k-vertex-connected
A graph G is called k-vertex-connected if
(i) the number of vertices $N_v > k$, and
(ii) the removal of any subset of vertices $X \subset V$ of cardinality $|X| < k$ leaves a subgraph that is connected.
connectivity: $\mathrm{connectivity}(G) = \max\{k \mid G \text{ is } k\text{-vertex-connected}\}$
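The definition can be checked literally by trying every vertex subset of size less than k. This brute-force sketch is exponential and only meant to make conditions (i) and (ii) concrete on tiny graphs; practical software uses flow-based algorithms instead. Function names and example graphs are hypothetical.

```python
from itertools import combinations

def is_connected(adj, removed=frozenset()):
    """True if the graph stays connected after deleting the removed vertices."""
    keep = [v for v in adj if v not in removed]
    if not keep:
        return True
    seen, stack = {keep[0]}, [keep[0]]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(keep)

def is_k_vertex_connected(adj, k):
    if len(adj) <= k:                      # condition (i): N_v > k
        return False
    return all(is_connected(adj, frozenset(x))   # condition (ii): every |X| < k
               for size in range(k)
               for x in combinations(adj, size))

def vertex_connectivity(adj):
    """Largest k for which the graph is k-vertex-connected."""
    k = 0
    while is_k_vertex_connected(adj, k + 1):
        k += 1
    return k

# Hypothetical examples: a 4-cycle, the complete graph K4, and a triangle with a pendant.
cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
pendant = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
```

Removing two opposite vertices disconnects the 4-cycle, and removing c isolates d in the last graph, which is why their connectivities are 2 and 1 respectively.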


Graph Partitioning


Graph Partition

partition: $\mathcal{C} = \{C_1, \dots, C_K\}$, a partition of the vertex set $V_G$.
$E(C_k, C_{k'})$: the set of edges connecting vertices in $C_k$ to vertices in $C_{k'}$.

We seek a partition $\mathcal{C}$ in which the sets $E(C_k, C_{k'})$, $k \neq k'$, are relatively small compared to the sets $E(C_k, C_k)$.


Hierarchical Clustering

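modularity

$$e_{ij} = \frac{|E(C_i, C_j)|}{2|E|}, \qquad a_i = \sum_{j=1}^{K} e_{ij}$$

$$\mathrm{mod}(\mathcal{C}) = \sum_{i=1}^{K} \left(e_{ii} - a_i^2\right)$$

mod(C) = 0 if $e_{ij} = a_i a_j$.
mod(C) is large if $\sum_{i=1}^{K} e_{ii} = 1$.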
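To see the formula at work on an actual graph, the sketch below builds the matrix of $e_{ij}$ from an edge list and a community assignment. It assumes the usual convention that each undirected edge contributes $1/(2|E|)$ to both $e_{ij}$ and $e_{ji}$ (so an edge inside a community adds $1/|E|$ to $e_{ii}$ and the whole matrix sums to one); the slides do not spell this out. The function names and the example graph, two triangles joined by a single edge, are hypothetical.

```python
def fraction_matrix(edges, membership, k):
    """e[i][j]: share of edge ends running between communities i and j."""
    m = len(edges)
    e = [[0.0] * k for _ in range(k)]
    for u, v in edges:
        i, j = membership[u], membership[v]
        e[i][j] += 1 / (2 * m)   # assumed convention: count each edge in both directions
        e[j][i] += 1 / (2 * m)
    return e

def modularity(e):
    """Sum over communities of e_ii - a_i^2, with a_i the i-th row sum."""
    a = [sum(row) for row in e]
    return sum(e[i][i] - a[i] ** 2 for i in range(len(e)))

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_triangles = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
everything_together = {v: 0 for v in range(6)}

mod_split = modularity(fraction_matrix(edges, two_triangles, 2))
mod_single = modularity(fraction_matrix(edges, everything_together, 1))
```

Splitting into the two triangles gives a modularity of 5/14, while putting all vertices in one community gives exactly zero.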

## fraction1

##        1    2    3  sum
## 1   0.04 0.04 0.12  0.2
## 2   0.04 0.04 0.12  0.2
## 3   0.12 0.12 0.36  0.6
## sum 0.20 0.20 0.60  1.0

## fraction2

##       1   2   3 sum
## 1   0.2 0.0 0.0 0.2
## 2   0.0 0.2 0.0 0.2
## 3   0.0 0.0 0.6 0.6
## sum 0.2 0.2 0.6 1.0
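The two tables illustrate the extreme cases of the modularity formula. Plugging them into $\sum_i (e_{ii} - a_i^2)$ gives the following; the function name is hypothetical, and the matrices are copied from the fraction1 and fraction2 output above without the margin sums.

```python
def modularity(e):
    """Sum over communities of e_ii - a_i^2, with a_i the i-th row sum."""
    a = [sum(row) for row in e]
    return sum(e[i][i] - a[i] ** 2 for i in range(len(e)))

# fraction1: every entry equals a_i * a_j, i.e. no community structure.
fraction1 = [[0.04, 0.04, 0.12],
             [0.04, 0.04, 0.12],
             [0.12, 0.12, 0.36]]

# fraction2: all edges fall inside communities, so the diagonal sums to 1.
fraction2 = [[0.2, 0.0, 0.0],
             [0.0, 0.2, 0.0],
             [0.0, 0.0, 0.6]]

mod1 = modularity(fraction1)
mod2 = modularity(fraction2)
```

For fraction1 the modularity is zero up to floating-point error, and for fraction2 it is 1 - (0.04 + 0.04 + 0.36) = 0.56.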

Hierarchical methods
agglomerative: begin with the partition $\{\{v_1\}, \dots, \{v_{N_v}\}\}$ and successively merge its elements.
divisive: begin with the partition $\{V\}$ and successively split its elements.
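An agglomerative method needs a rule for choosing which two sets to merge. The slides do not state the rule, so the sketch below assumes one common choice: at each step merge the pair whose union yields the highest modularity, and report the best partition seen along the way. It recomputes modularity from scratch for every candidate, which is fine for a toy graph but far slower than dedicated implementations. All names and the example graph are hypothetical.

```python
from itertools import combinations

def modularity(edges, communities):
    """Sum over communities of (internal edge share) - (degree share)^2."""
    m = len(edges)
    label = {v: i for i, c in enumerate(communities) for v in c}
    within = [0] * len(communities)
    degree = [0] * len(communities)
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            within[label[u]] += 1
    return sum(w / m - (d / (2 * m)) ** 2 for w, d in zip(within, degree))

def greedy_agglomerate(vertices, edges):
    """Start from singletons, greedily merge, return (best score, best partition)."""
    communities = [frozenset([v]) for v in vertices]
    best = (modularity(edges, communities), list(communities))
    while len(communities) > 1:
        candidates = []
        for a, b in combinations(communities, 2):
            merged = [c for c in communities if c not in (a, b)] + [a | b]
            candidates.append((modularity(edges, merged), merged))
        score, communities = max(candidates, key=lambda t: t[0])
        if score > best[0]:
            best = (score, list(communities))
    return best

# Hypothetical graph: triangles {1,2,3} and {4,5,6} joined by the edge 3-4.
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
best_score, best_partition = greedy_agglomerate(range(1, 7), edges)
```

On this graph the merge sequence passes through the two-triangle partition, which has the highest modularity of any level in the hierarchy.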

Figure 5: Agglomerative Clustering (karate network plot with vertices H, 2 to 33, and A; second panel with labels Mr Hi, Actor 2 to Actor 33, and John A)

Spectral Partitioning

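graph Laplacian
graph Laplacian: $L = D - A$, where $A$ is the adjacency matrix and $D = \mathrm{diag}[(d_v)]$ is the diagonal matrix of vertex degrees.
$\lambda_1 \le \dots \le \lambda_{N_v}$ are the eigenvalues of $L$.
A graph G consists of K connected components if and only if $\lambda_1(L) = \dots = \lambda_K(L) = 0$ and $0 < \lambda_{K+1}(L)$.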

Figure 6: Agglomerative Clustering (graph on vertices A to L)

## [1] 0 0 0 1 2 2 2 2 3 3 4 5

##      A    B    C    D    E   F   G   H   I    J    K    L
## 1 0.00 0.00 0.00 0.00 0.00 0.5 0.5 0.5 0.5 0.00 0.00 0.00
## 2 0.00 0.00 0.00 0.00 0.00 0.0 0.0 0.0 0.0 0.58 0.58 0.58
## 3 0.45 0.45 0.45 0.45 0.45 0.0 0.0 0.0 0.0 0.00 0.00 0.00
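The three zero eigenvalues and the block pattern of the eigenvectors above are what one expects from three components of sizes 5, 4, and 3: the normalized indicator vector of each component lies in the null space of L, with entries $1/\sqrt{5} \approx 0.45$, $1/\sqrt{4} = 0.5$, and $1/\sqrt{3} \approx 0.58$. The sketch below checks this without an eigen-solver. The edges are hypothetical, since the actual edges of the plotted graph cannot be recovered from the slides; only the component sizes are taken from the output.

```python
import math

def laplacian(vertices, edges):
    """L = D - A as a dense list of lists."""
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    lap = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = index[u], index[v]
        lap[i][i] += 1
        lap[j][j] += 1
        lap[i][j] -= 1
        lap[j][i] -= 1
    return lap

def connected_components(vertices, edges):
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for s in vertices:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        comps.append(comp)
    return comps

vertices = list("ABCDEFGHIJKL")
# Hypothetical edges giving components {A..E}, {F..I}, {J..L}.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "C"),
         ("F", "G"), ("G", "H"), ("H", "I"), ("I", "F"),
         ("J", "K"), ("K", "L"), ("L", "J")]
lap = laplacian(vertices, edges)

null_vectors, residuals = [], []
for comp in connected_components(vertices, edges):
    # Unit-length indicator vector of the component.
    x = [1 / math.sqrt(len(comp)) if v in comp else 0.0 for v in vertices]
    lx = [sum(a * b for a, b in zip(row, x)) for row in lap]
    residuals.append(max(abs(t) for t in lx))      # should be (numerically) zero
    null_vectors.append([round(t, 2) for t in x])
```

Any other basis of the same null space is equally valid, so an eigen-solver may return the three vectors in a different order or as linear combinations of them.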

spectral bisection
If $\lambda_2(L)$ is close to zero, we might expect that there is a good candidate for bisection.
Partition the vertices by separating them according to the sign of their entries in the corresponding eigenvector $x_2$:
$S = \{v \in V : x_2(v) \ge 0\}, \quad \bar{S} = \{v \in V : x_2(v) < 0\}$
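The bisection rule can be tried on a small graph without a linear-algebra library. The sketch below approximates $x_2$ by power iteration on $cI - L$, projecting out the constant vector at every step; $c$ is set to twice the maximum degree, which bounds the largest Laplacian eigenvalue. This is a swapped-in technique for illustration, not the method behind the slides, and it assumes a connected graph with $\lambda_2 < \lambda_3$. The function name and the example graph are hypothetical.

```python
import math

def fiedler_vector(n, edges, iterations=500):
    """Approximate (lambda_2, x_2) of the Laplacian for vertices 0..n-1."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    shift = 2 * max(degree)   # upper bound on the largest eigenvalue of L

    def laplacian_times(x):
        y = [degree[i] * x[i] for i in range(n)]
        for u, v in edges:
            y[u] -= x[v]
            y[v] -= x[u]
        return y

    x = [float(i) for i in range(n)]   # deterministic, non-constant start
    for _ in range(iterations):
        lx = laplacian_times(x)
        x = [shift * xi - li for xi, li in zip(x, lx)]     # multiply by (cI - L)
        mean = sum(x) / n
        x = [xi - mean for xi in x]                        # remove constant component
        norm = math.sqrt(sum(xi * xi for xi in x))
        x = [xi / norm for xi in x]
    lam2 = sum(a * b for a, b in zip(x, laplacian_times(x)))   # Rayleigh quotient
    return lam2, x

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
lam2, x2 = fiedler_vector(6, edges)

S = frozenset(v for v in range(6) if x2[v] >= 0)
S_bar = frozenset(v for v in range(6) if x2[v] < 0)
```

The overall sign of an eigenvector is arbitrary, so which triangle ends up as $S$ and which as $\bar{S}$ carries no meaning; only the split itself does.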

Figure 7: Agglomerative Clustering (two panels: eigenvalues of the graph Laplacian against index; Fiedler vector entry against actor number)
