
Statistical Analysis of Network Data with R
Network Cohesion & Graph Partitioning

Kim Seonghyeon

April 14, 2017

Kim Seonghyeon Statistical Analysis of Network Data with R April 14, 2017 1 / 27

Network Cohesion


subgraph & censuses

clique
clique: a complete subgraph
maximal clique: a clique that is not a subset of a larger clique
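The two definitions can be checked by brute force on a small graph. The following is a plain-Python sketch, not the R code behind the slides; the toy graph and the helper names `all_cliques` and `maximal_cliques` are assumptions made for illustration, and the enumeration is exponential, so it only suits tiny graphs.

```python
from itertools import combinations

def all_cliques(adj):
    """Return every non-empty clique of an undirected graph as a frozenset."""
    vertices = sorted(adj)
    found = []
    for size in range(1, len(vertices) + 1):
        level = [frozenset(c) for c in combinations(vertices, size)
                 if all(v in adj[u] for u, v in combinations(c, 2))]
        if not level:
            break  # no clique of this size, hence none larger
        found.extend(level)
    return found

def maximal_cliques(cliques):
    """Keep the cliques that are not a proper subset of another clique."""
    return [c for c in cliques if not any(c < other for other in cliques)]

# Hypothetical toy graph: triangle 0-1-2 plus a pendant edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
cliques = all_cliques(adj)
maximal = maximal_cliques(cliques)
```

Here every vertex and every edge counts as a clique (sizes 1 and 2), but only the triangle {0, 1, 2} and the edge {2, 3} are maximal.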


Figure 1: karate
[Network plot; vertices labeled H, 2 to 33, and A.]

Table 1: number of cliques by size

size    1   2   3   4   5
count  34  78  45  11   2

Table 2: number of maximal cliques by size

size    2   3   4   5
count  11  21   2   2


core & coreness
k-core: a weakened notion of a clique. A k-core of G is a subgraph in which every vertex has degree at least k and that is maximal with this property (no other subgraph obeying the same condition contains it).
coreness: $\mathrm{coreness}(v) = \max\{k \mid H \text{ is a } k\text{-core},\ v \in V_H\}$
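Coreness can be computed by repeatedly peeling off a vertex of minimum remaining degree. The sketch below is plain Python under that assumption (a standard peeling approach, not necessarily what the slides' R session used); the toy graph and the function name `coreness` are illustrative.

```python
def coreness(adj):
    """Coreness of each vertex, found by peeling minimum-degree vertices."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=lambda u: degree[u])
        k = max(k, degree[v])  # the core level never decreases while peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1  # v is gone, so its neighbours lose one edge
    return core

# Hypothetical toy graph: triangle 0-1-2 plus a pendant edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
core = coreness(adj)
```

The triangle forms a 2-core, so its vertices get coreness 2, while the pendant vertex 3 only belongs to the 1-core.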


Figure 2: visualization with coreness

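Censuses (directed graph)
mutual: $C_{mut} = \{\{u, v\} \subset V_G \mid (u, v), (v, u) \in E_G\}$
asymmetric: $C_{asym} = \{\{u, v\} \subset V_G \mid (u, v) \in E_G\} \setminus C_{mut}$
null: $C_{null} = \{\{u, v\} \mid \{u, v\} \subset V_G\} \setminus (C_{mut} \cup C_{asym})$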


## aidsblog

## $v
## [1] 146
##
## $e
## [1] 183
##
## $mut
## [1] 3
##
## $asym
## [1] 177
##
## $null
## [1] 10405
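A dyad census simply classifies every unordered pair of vertices. The following plain-Python sketch shows the bookkeeping on a made-up four-vertex digraph; the function name `dyad_census` and the toy arcs are assumptions for illustration, and self-loops are assumed absent.

```python
from itertools import combinations

def dyad_census(vertices, arcs):
    """Count mutual, asymmetric and null dyads of a directed graph."""
    arcs = set(arcs)
    mut = asym = null = 0
    for u, v in combinations(sorted(vertices), 2):
        forward, backward = (u, v) in arcs, (v, u) in arcs
        if forward and backward:
            mut += 1
        elif forward or backward:
            asym += 1
        else:
            null += 1
    return {"mut": mut, "asym": asym, "null": null}

# Hypothetical toy digraph.
vertices = ["a", "b", "c", "d"]
arcs = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d")]
census = dyad_census(vertices, arcs)
```

The three counts always sum to the number of vertex pairs. The aidsblog output is consistent with this: 3 + 177 + 10405 = 10585 = 146 · 145 / 2 dyads, and 2 · 3 + 177 = 183 arcs.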


Density and Related Notions of Relative Frequency

Density
density:

\[ \mathrm{den}(H) = \frac{|E_H|}{|V_H|(|V_H| - 1)/2} \]

*In the case that G is a directed graph, the denominator is replaced by $|V_H|(|V_H| - 1)$.
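As a quick arithmetic check of the formula, here is a small Python sketch. The function name `density` is an assumption; the counts are read off the slides (Table 1 lists 34 cliques of size 1 and 78 of size 2 for karate, and the aidsblog output reports 146 vertices and 183 arcs).

```python
def density(n_vertices, n_edges, directed=False):
    """Edge count divided by the number of possible edges."""
    possible = n_vertices * (n_vertices - 1)
    if not directed:
        possible /= 2  # unordered pairs for an undirected graph
    return n_edges / possible

karate_density = density(34, 78)                      # 78 / 561
aidsblog_density = density(146, 183, directed=True)   # 183 / 21170
```

Both networks are sparse: roughly 0.139 for karate and roughly 0.0086 for aidsblog.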



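clustering coefficient
global clustering coefficient:

\[ \mathrm{cl}_T(G) = \frac{3\tau_\Delta(G)}{\tau_3(G)} \]

where $\tau_\Delta(G)$ is the number of triangles in G and $\tau_3(G)$ is the number of connected triples (two edges sharing a vertex).

local clustering coefficient:

\[ \mathrm{cl}(v) = \frac{\tau_\Delta(v)}{\tau_3(v)} \]

where $\tau_\Delta(v)$ is the number of triangles containing v and $\tau_3(v)$ is the number of connected triples whose two edges are both incident to v.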
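Both coefficients reduce to counting triangles and connected triples around each vertex. The sketch below is plain Python on a made-up graph; the function names are assumptions, and a vertex with fewer than two neighbours is given `nan` by choice since its ratio is 0/0.

```python
from itertools import combinations

def triangle_and_triple_counts(adj):
    """Per vertex: (triangles through v, connected triples centred at v)."""
    counts = {}
    for v, nbrs in adj.items():
        triples = len(nbrs) * (len(nbrs) - 1) // 2
        triangles = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
        counts[v] = (triangles, triples)
    return counts

def local_clustering(adj):
    return {v: (t / s if s else float("nan"))
            for v, (t, s) in triangle_and_triple_counts(adj).items()}

def global_clustering(adj):
    counts = list(triangle_and_triple_counts(adj).values())
    closed = sum(t for t, _ in counts)   # each triangle is seen from 3 vertices
    triples = sum(s for _, s in counts)
    return closed / triples

# Hypothetical toy graph: triangle 0-1-2 plus a pendant edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
cl_global = global_clustering(adj)
cl_local = local_clustering(adj)
```

Summing the per-vertex triangle counts already yields three times the number of triangles, which is where the factor 3 in the global formula comes from.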



Figure 3: transitivity
[Example graph on vertices a, b, c, d.]


reciprocity (directed graph)
type 1:

\[ \mathrm{rec}_1(G) = \frac{|C_{mut}|}{|C_{mut} \cup C_{asym}|} \]

type 2:

\[ \mathrm{rec}_2(G) = \frac{2|C_{mut}|}{|E_G|} \]
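The two reciprocity measures differ only in what they normalise by: connected dyads versus arcs. A minimal Python sketch follows; the function name `reciprocity` and the toy arcs are assumptions, and self-loops are assumed absent.

```python
def reciprocity(arcs):
    """Return (rec1, rec2) for a directed graph given as a list of arcs."""
    arcs = set(arcs)
    mutual = {frozenset((u, v)) for u, v in arcs if (v, u) in arcs}
    connected = {frozenset((u, v)) for u, v in arcs}  # C_mut union C_asym
    rec1 = len(mutual) / len(connected)
    rec2 = 2 * len(mutual) / len(arcs)
    return rec1, rec2

# Hypothetical toy digraph: one mutual dyad (a, b) and two asymmetric ones.
arcs = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d")]
rec1, rec2 = reciprocity(arcs)
```

With the aidsblog census shown earlier (mut = 3, asym = 177, 183 arcs), the same formulas give 3/180 ≈ 0.017 and 6/183 ≈ 0.033.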



Figure 4: reciprocity
[Example directed graph on vertices a, b, c.]

Connectivity, Cuts, and Flows

Connectivity
A graph G is said to be connected if every vertex is reachable from every other.
A connected component of a graph is a maximally connected subgraph.
diameter: the length of the longest shortest path (geodesic) between any two vertices.
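Components and diameter both follow from breadth-first search. This is a plain-Python sketch on made-up graphs; the helper names are assumptions, and the diameter is taken as the longest shortest-path distance within a connected graph.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def connected_components(adj):
    seen, components = set(), []
    for v in adj:
        if v not in seen:
            comp = set(bfs_distances(adj, v))
            seen |= comp
            components.append(comp)
    return components

def diameter(adj):
    """Longest shortest-path length; assumes the graph is connected."""
    return max(max(bfs_distances(adj, v).values()) for v in adj)

# Hypothetical toy graphs: a path 0-1-2-3, then the same path plus an edge 4-5.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
two_parts = {**path, 4: {5}, 5: {4}}
components = connected_components(two_parts)
path_diameter = diameter(path)
```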


k-vertex-connected
A graph G is called k-vertex-connected if
(i) the number of vertices $N_v > k$, and
(ii) the removal of any subset of vertices $X \subset V$ of cardinality $|X| < k$ leaves a subgraph that is connected.
connectivity: $\mathrm{connectivity}(G) = \max\{k \mid G \text{ is } k\text{-vertex-connected}\}$
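The definition can be applied literally on small graphs by trying every vertex subset of size less than k. The Python sketch below does exactly that; it is a brute-force illustration under that assumption (exponential cost, helper names hypothetical), not an efficient flow-based algorithm.

```python
from itertools import combinations

def is_connected(adj, removed=frozenset()):
    """Check connectivity of the graph after deleting the removed vertices."""
    nodes = [v for v in adj if v not in removed]
    if not nodes:
        return True  # convention chosen for this sketch
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nodes)

def is_k_vertex_connected(adj, k):
    if len(adj) <= k:                      # condition (i): N_v > k
        return False
    return all(is_connected(adj, frozenset(X))   # condition (ii): every |X| < k
               for size in range(k)
               for X in combinations(adj, size))

def vertex_connectivity(adj):
    return max((k for k in range(len(adj)) if is_k_vertex_connected(adj, k)),
               default=0)

# Hypothetical toy graphs.
cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
path3 = {0: {1}, 1: {0, 2}, 2: {1}}
```

For instance, a 4-cycle survives the removal of any single vertex but not of two opposite ones, so its connectivity is 2.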


Graph Partitioning


Graph Partition

partition: $C = \{C_1, \ldots, C_K\}$, a partition of the vertex set $V_G$
$E(C_k, C_{k'})$: the set of edges connecting vertices in $C_k$ to vertices in $C_{k'}$
We seek a partition C for which $E(C_k, C_{k'})$ is relatively small in size compared to the set $E(C_k, C_k)$.


Hierarchical Clustering

modularity

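\[ e_{ij} = \frac{|E(C_i, C_j)|}{2|E|}, \qquad a_i = \sum_{j=1}^{K} e_{ij} \]

\[ \mathrm{mod}(C) = \sum_{i=1}^{K} \left(e_{ii} - a_i^2\right) \]

$\mathrm{mod}(C) = 0$ if $e_{ij} = a_i a_j$
$\mathrm{mod}(C)$ is large if $\sum_{i=1}^{K} e_{ii} = 1$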



## fraction1

##        1    2    3 sum
## 1   0.04 0.04 0.12 0.2
## 2   0.04 0.04 0.12 0.2
## 3   0.12 0.12 0.36 0.6
## sum 0.20 0.20 0.60 1.0

## fraction2

##       1   2   3 sum
## 1   0.2 0.0 0.0 0.2
## 2   0.0 0.2 0.0 0.2
## 3   0.0 0.0 0.6 0.6
## sum 0.2 0.2 0.6 1.0
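Plugging the two fraction tables into the modularity formula shows the two extremes. The Python sketch below does that, and also builds the fraction matrix from an edge list. The function names and the toy graph are assumptions; so is the counting convention, in which every edge contributes 1/(2|E|) once per endpoint ordering so that all entries sum to 1.

```python
def modularity_from_fractions(e):
    """mod(C) = sum_i (e_ii - a_i^2), with a_i the i-th row sum of e."""
    a = [sum(row) for row in e]
    return sum(e[i][i] - a[i] ** 2 for i in range(len(e)))

def fraction_matrix(edges, membership, n_groups):
    """Assumed convention: each edge adds 1/(2|E|) for both endpoint orderings."""
    e = [[0.0] * n_groups for _ in range(n_groups)]
    w = 1 / (2 * len(edges))
    for u, v in edges:
        e[membership[u]][membership[v]] += w
        e[membership[v]][membership[u]] += w
    return e

fraction1 = [[0.04, 0.04, 0.12], [0.04, 0.04, 0.12], [0.12, 0.12, 0.36]]
fraction2 = [[0.2, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.6]]
mod1 = modularity_from_fractions(fraction1)   # e_ij = a_i * a_j everywhere
mod2 = modularity_from_fractions(fraction2)   # all edges fall within groups

# Hypothetical toy graph: two triangles joined by the edge 2-3, split in two.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
membership = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
mod_toy = modularity_from_fractions(fraction_matrix(edges, membership, 2))
```

The first table gives a modularity of 0 (up to rounding) and the second gives 0.56; the toy split gives 5/14.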



Hierarchical methods
agglomerative: begin with the partition $\{\{v_1\}, \ldots, \{v_{N_v}\}\}$ and successively merge groups
divisive: begin with the partition $\{V\}$ and successively split groups



Figure 5: Agglomerative Clustering
[Two panels: a network plot with vertices labeled H, 2 to 33, and A; and a plot whose labels read Mr Hi, Actor 2 to Actor 33, and John A.]

Spectral Partitioning

graph Laplacian
graph Laplacian: $L = D - A$, where A is the adjacency matrix and $D = \mathrm{diag}[(d_v)]$
$\lambda_1 \le \cdots \le \lambda_{N_v}$ are the eigenvalues of L.
A graph G consists of K connected components if and only if $\lambda_1(L) = \cdots = \lambda_K(L) = 0$ and $0 < \lambda_{K+1}$.



Figure 6: Agglomerative Clustering
[Example graph on vertices A to L.]


## [1] 0 0 0 1 2 2 2 2 3 3 4 5

##      A    B    C    D    E   F   G   H   I    J    K    L
## 1 0.00 0.00 0.00 0.00 0.00 0.5 0.5 0.5 0.5 0.00 0.00 0.00
## 2 0.00 0.00 0.00 0.00 0.00 0.0 0.0 0.0 0.0 0.58 0.58 0.58
## 3 0.45 0.45 0.45 0.45 0.45 0.0 0.0 0.0 0.0 0.00 0.00 0.00
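The link between zero eigenvalues and components can be seen directly: the indicator vector of each connected component is mapped to zero by L. The plain-Python sketch below checks this on a made-up five-vertex graph; the helper names and the graph are assumptions, and a simple graph without self-loops or repeated edges is assumed.

```python
def laplacian(vertices, edges):
    """L = D - A for a simple undirected graph."""
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = index[u], index[v]
        L[i][j] -= 1
        L[j][i] -= 1
        L[i][i] += 1
        L[j][j] += 1
    return L

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

# Hypothetical toy graph with two components: triangle A-B-C and edge D-E.
vertices = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E")]
L = laplacian(vertices, edges)
indicator_abc = [1, 1, 1, 0, 0]
indicator_de = [0, 0, 0, 1, 1]
zero_abc = matvec(L, indicator_abc)
zero_de = matvec(L, indicator_de)
```

This matches the eigenvector output above: each of the three rows is constant on one group of vertices and zero elsewhere, with entries 1/√4 = 0.5, 1/√3 ≈ 0.58 and 1/√5 ≈ 0.45 for groups of four, three and five vertices.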



spectral bisection
If $\lambda_2(L)$ is close to zero, we might expect that there is a good candidate for bisection.
Partition the vertices by separating them according to the sign of their entries in the corresponding eigenvector $x_2$ (the Fiedler vector):
$S = \{v \in V : x_2(v) \ge 0\}$, $\bar{S} = \{v \in V : x_2(v) < 0\}$
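A dependency-free way to try the bisection rule is to approximate $x_2$ by power iteration. The sketch below is plain Python and swaps in that technique as an assumption; it is not the eigendecomposition routine used for the slides. It assumes a connected graph with $\lambda_2 < \lambda_3$, and the toy graph and function names are illustrative.

```python
def laplacian(n, edges):
    """L = D - A for a simple undirected graph on vertices 0..n-1."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    return L

def fiedler_vector(L, iterations=500):
    """Approximate the eigenvector of the second-smallest eigenvalue of L.

    Power iteration on (c*I - L), kept orthogonal to the constant vector.
    """
    n = len(L)
    c = 2 * max(L[i][i] for i in range(n))  # eigenvalues of L lie in [0, 2*max degree]
    x = [float(i) for i in range(n)]        # arbitrary non-constant start
    for _ in range(iterations):
        mean = sum(x) / n
        x = [xi - mean for xi in x]         # project out the lambda_1 = 0 direction
        y = [c * x[i] - sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(yi * yi for yi in y) ** 0.5
        x = [yi / norm for yi in y]
    return x

# Hypothetical toy graph: two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
x2 = fiedler_vector(laplacian(6, edges))
S = {v for v in range(6) if x2[v] >= 0}
S_bar = {v for v in range(6) if x2[v] < 0}
```

On this graph the sign pattern separates the two triangles, cutting only the bridging edge. Which side ends up as S depends on the overall sign of the eigenvector, which is arbitrary.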



Figure 7: Agglomerative Clustering
[Two panels: Eigenvalues of Graph Laplacian against Index; Fiedler Vector Entry against Actor Number.]