Statistical Analysis of Network Data with R
Network Cohesion & Graph Partitioning
Kim Seonghyeon
April 14, 2017
Kim Seonghyeon Statistical Analysis of Network Data with R April 14, 2017 1 / 27
Network Cohesion
Subgraphs & Censuses
Cliques
clique: a complete subgraph
maximal clique: a clique that is not a subset of a larger clique
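The two definitions can be illustrated with a brute-force sketch in base R. The 4-vertex graph and the helper names (`is_clique`, `all_cliques`, `is_maximal`) are hypothetical and chosen only for illustration; enumerating every vertex subset is feasible only for very small graphs.

```r
# Hypothetical 4-vertex graph: a triangle on 1, 2, 3 plus the edge 3-4.
A <- matrix(0, 4, 4)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4))
A[edges] <- 1
A[edges[, 2:1]] <- 1   # make the adjacency matrix symmetric

# A vertex set is a clique if every pair of its vertices is adjacent.
is_clique <- function(A, vs) {
  sub <- A[vs, vs, drop = FALSE]
  all(sub[upper.tri(sub)] == 1)
}

# Enumerate all cliques by checking every non-empty vertex subset.
all_cliques <- function(A) {
  n <- nrow(A)
  out <- list()
  for (k in seq_len(n)) {
    for (vs in combn(n, k, simplify = FALSE)) {
      if (is_clique(A, vs)) out[[length(out) + 1]] <- vs
    }
  }
  out
}

# A clique is maximal if no strictly larger clique contains it.
is_maximal <- function(cl, cliques) {
  !any(sapply(cliques, function(other) {
    length(other) > length(cl) && all(cl %in% other)
  }))
}

cliques <- all_cliques(A)
maximal <- Filter(function(cl) is_maximal(cl, cliques), cliques)

clique_sizes  <- table(sapply(cliques, length))   # cliques by size
maximal_sizes <- table(sapply(maximal, length))   # maximal cliques by size
print(clique_sizes)
print(maximal_sizes)
```

In this toy graph the only maximal cliques are the triangle {1, 2, 3} and the edge {3, 4}; every single vertex and every other edge is a clique but not a maximal one.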
Figure 1: karate [network plot; vertices labeled H, 2-33, A]
Table 1: number of cliques, by clique size
size     1   2   3   4   5
count   34  78  45  11   2

Table 2: number of maximal cliques, by clique size
size     2   3   4   5
count   11  21   2   2
Core & coreness
k-core: a weakened notion of clique.
A subgraph of $G$ for which all vertex degrees are at least $k$.
No other subgraph obeying the same condition contains it (i.e., it is maximal in this property).
coreness: $\mathrm{coreness}(v) = \max\{k \mid H \text{ is a } k\text{-core},\ v \in V_H\}$
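One way to compute coreness is to repeatedly peel off a vertex of minimum remaining degree. The sketch below is a base R illustration of that idea on a hypothetical 4-vertex graph; the function name `coreness_peel` is made up here, and nothing in the slides says this is how their figures were produced.

```r
# Hypothetical graph: triangle on 1, 2, 3 plus a pendant vertex 4 attached to 3.
A <- matrix(0, 4, 4)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4))
A[edges] <- 1
A[edges[, 2:1]] <- 1

coreness_peel <- function(A) {
  n <- nrow(A)
  deg <- rowSums(A)        # current degree among remaining vertices
  core <- numeric(n)
  alive <- rep(TRUE, n)
  k <- 0
  while (any(alive)) {
    # Remove a remaining vertex of minimum current degree.
    v <- which(alive)[which.min(deg[alive])]
    # The running maximum of these minimum degrees is the coreness of v.
    k <- max(k, deg[v])
    core[v] <- k
    alive[v] <- FALSE
    # Removing v lowers the degree of its remaining neighbours.
    nbrs <- which(A[v, ] == 1 & alive)
    deg[nbrs] <- deg[nbrs] - 1
  }
  core
}

core <- coreness_peel(A)
print(core)
```

The triangle vertices lie in a 2-core, while the pendant vertex lies only in the 1-core.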
Figure 2: visualization with coreness
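Censuses (directed graph)
mutual: $C_{mut} = \{\{u, v\} \subset V_G \mid (u, v), (v, u) \in E_G\}$
asymmetric: $C_{asym} = \{\{u, v\} \subset V_G \mid (u, v) \in E_G\} \setminus C_{mut}$
null: $C_{null} = \{\{u, v\} \mid \{u, v\} \subset V_G\} \setminus (C_{mut} \cup C_{asym})$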
## aidsblog
## $v
## [1] 146
##
## $e
## [1] 183
##
## $mut
## [1] 3
##
## $asym
## [1] 177
##
## $null
## [1] 10405
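The three dyad classes can be counted directly from an adjacency matrix. The following base R sketch uses a hypothetical 4-vertex directed graph and a made-up helper `dyad_counts`; it then checks that the aidsblog figures above are internally consistent.

```r
# Hypothetical directed graph: 1 <-> 2 (mutual), 1 -> 3, 3 -> 4.
A <- matrix(0, 4, 4)
A[rbind(c(1, 2), c(2, 1), c(1, 3), c(3, 4))] <- 1

dyad_counts <- function(A) {
  n <- nrow(A)
  both   <- (A == 1) & (t(A) == 1)   # arcs in both directions
  either <- (A == 1) | (t(A) == 1)   # an arc in at least one direction
  mut  <- sum(both[upper.tri(both)])
  asym <- sum(either[upper.tri(either)]) - mut
  c(mut = mut, asym = asym, null = choose(n, 2) - mut - asym)
}

census <- dyad_counts(A)
print(census)

# Consistency of the aidsblog output: every dyad falls in exactly one
# class, and every arc lies in a mutual dyad (two arcs) or an
# asymmetric dyad (one arc).
v <- 146; e <- 183; mut <- 3; asym <- 177; null <- 10405
stopifnot(mut + asym + null == choose(v, 2))
stopifnot(2 * mut + asym == e)
```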
Density and Related Notions of Relative Frequency
Density
density:
\[ \mathrm{den}(H) = \frac{|E_H|}{|V_H|(|V_H| - 1)/2} \]
* If $G$ is a directed graph, the denominator is replaced by $|V_H|(|V_H| - 1)$.
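As a small worked example, the formula can be wrapped in a helper (the name `density_of` and its arguments are hypothetical) and applied to the aidsblog counts reported earlier in these slides, 146 vertices and 183 directed edges.

```r
# Density: observed edges divided by the number of possible edges.
density_of <- function(n_edges, n_vertices, directed = FALSE) {
  possible <- n_vertices * (n_vertices - 1)
  if (!directed) possible <- possible / 2   # unordered pairs only
  n_edges / possible
}

# An undirected triangle uses all 3 of its 3 possible edges.
den_triangle <- density_of(3, 3)

# aidsblog as a directed graph: 183 / (146 * 145).
den_aids <- density_of(183, 146, directed = TRUE)

print(den_triangle)
print(den_aids)
```

The directed density comes out below one percent, so the blog network is very sparse.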
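Clustering coefficient
global clustering coefficient:
\[ \mathrm{cl}_T(G) = \frac{3\tau_\Delta(G)}{\tau_3(G)} \]
where $\tau_\Delta(G)$ is the number of triangles in $G$ and $\tau_3(G)$ is the number of connected triples.
local clustering coefficient:
\[ \mathrm{cl}(v) = \frac{\tau_\Delta(v)}{\tau_3(v)} \]
where $\tau_\Delta(v)$ is the number of triangles containing $v$ and $\tau_3(v)$ is the number of connected triples centered at $v$.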
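Both coefficients can be computed from an adjacency matrix, since the diagonal of $A^3$ counts closed walks of length 3. The base R sketch below assumes an undirected simple graph and uses a hypothetical 4-vertex example; the variable names are illustrative only.

```r
# Hypothetical graph: triangle on 1, 2, 3 plus a pendant vertex 4 attached to 3.
A <- matrix(0, 4, 4)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4))
A[edges] <- 1
A[edges[, 2:1]] <- 1

deg <- rowSums(A)
A3 <- A %*% A %*% A

# Each triangle through v yields two closed 3-walks starting at v.
tri_v <- diag(A3) / 2
# Connected triples centered at v: pairs of neighbours of v.
triples_v <- choose(deg, 2)

n_triangles <- sum(tri_v) / 3   # each triangle is seen from its 3 vertices
cl_global <- 3 * n_triangles / sum(triples_v)
cl_local  <- tri_v / triples_v  # NaN for vertices with fewer than 2 neighbours

print(cl_global)
print(cl_local)
```

Vertex 4 has a single neighbour, so its local coefficient is undefined (0/0) and R reports `NaN` for it.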
Figure 3: transitivity [graph on vertices a, b, c, d]
Reciprocity (directed graph)
type 1:
\[ \mathrm{rec}_1(G) = \frac{|C_{mut}|}{|C_{mut} \cup C_{asym}|} \]
type 2:
\[ \mathrm{rec}_2(G) = \frac{2|C_{mut}|}{|E_G|} \]
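Plugging the aidsblog census reported earlier in these slides (3 mutual dyads, 177 asymmetric dyads, 183 edges) into the two formulas gives a short worked example in base R. The variable names are illustrative.

```r
# Census counts taken from the aidsblog output in these slides.
mut <- 3; asym <- 177; e <- 183

# Type 1: share of connected dyads that are mutual.
# C_mut and C_asym are disjoint, so the size of their union is mut + asym.
rec1 <- mut / (mut + asym)

# Type 2: share of edges that are reciprocated
# (each mutual dyad contributes two edges).
rec2 <- 2 * mut / e

print(c(rec1 = rec1, rec2 = rec2))
```

Both measures are small here: only 3 of the 180 connected dyads, and 6 of the 183 edges, are reciprocated.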
Figure 4: reciprocity [graph on vertices a, b, c]
Connectivity, Cuts, and Flows
Connectivity
A graph $G$ is said to be connected if every vertex is reachable from every other.
A connected component of a graph is a maximally connected subgraph.
diameter: the length of the longest geodesic (shortest path) between any pair of vertices.
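These notions can all be read off a matrix of shortest-path distances. The base R sketch below runs a breadth-first search from every vertex of a hypothetical 5-vertex graph; the helper `bfs_dist` and the example are assumptions made for illustration.

```r
# Hypothetical graph with two components: path 1-2-3 and edge 4-5.
A <- matrix(0, 5, 5)
edges <- rbind(c(1, 2), c(2, 3), c(4, 5))
A[edges] <- 1
A[edges[, 2:1]] <- 1

# Breadth-first search: shortest-path distance from s to every vertex
# (Inf for vertices that cannot be reached).
bfs_dist <- function(A, s) {
  dist <- rep(Inf, nrow(A))
  dist[s] <- 0
  frontier <- s
  d <- 0
  while (length(frontier) > 0) {
    d <- d + 1
    reached <- colSums(A[frontier, , drop = FALSE]) > 0
    frontier <- which(reached & is.infinite(dist))
    dist[frontier] <- d
  }
  dist
}

# Column s of D holds the distances from vertex s.
D <- sapply(seq_len(nrow(A)), function(s) bfs_dist(A, s))

is_conn <- all(is.finite(D))   # connected iff all pairs are reachable
# Label each vertex by the smallest vertex index in its component.
comp <- apply(is.finite(D), 2, function(col) which(col)[1])
# Longest geodesic among reachable pairs.
longest_geodesic <- max(D[is.finite(D)])

print(is_conn)
print(comp)
print(longest_geodesic)
```

For a connected graph `longest_geodesic` is the diameter; for this disconnected example it is the largest diameter among the components.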
k-vertex-connected
A graph $G$ is called k-vertex-connected if
(i) the number of vertices $N_v > k$, and
(ii) the removal of any subset of vertices $X \subset V$ of cardinality $|X| < k$ leaves a subgraph that is connected.
connectivity: $\mathrm{connectivity}(G) = \max\{k \mid G \text{ is } k\text{-vertex-connected}\}$
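The definition translates almost literally into a brute-force check: try every vertex subset of size less than $k$ and test whether the rest stays connected. The base R sketch below does exactly that on two hypothetical 4-vertex graphs. All function names are made up here, and the approach is exponential, so it is meant only to make the definition concrete.

```r
# Connectivity test by breadth-first search from the first vertex.
is_connected_graph <- function(A) {
  seen <- rep(FALSE, nrow(A))
  seen[1] <- TRUE
  frontier <- 1
  while (length(frontier) > 0) {
    reached <- colSums(A[frontier, , drop = FALSE]) > 0
    frontier <- which(reached & !seen)
    seen[frontier] <- TRUE
  }
  all(seen)
}

# Conditions (i) and (ii) of the definition, checked exhaustively.
is_k_vertex_connected <- function(A, k) {
  n <- nrow(A)
  if (n <= k) return(FALSE)                     # condition (i)
  if (!is_connected_graph(A)) return(FALSE)     # X = empty set
  if (k >= 2) {
    for (s in 1:(k - 1)) {
      for (X in combn(n, s, simplify = FALSE)) {
        if (!is_connected_graph(A[-X, -X, drop = FALSE])) return(FALSE)
      }
    }
  }
  TRUE
}

# Largest k for which the graph is k-vertex-connected (0 if disconnected).
vertex_connectivity_bf <- function(A) {
  ks <- Filter(function(k) is_k_vertex_connected(A, k), seq_len(nrow(A) - 1))
  if (length(ks) == 0) 0 else max(ks)
}

# Hypothetical examples: the 4-cycle and the complete graph on 4 vertices.
C4 <- matrix(0, 4, 4)
cyc <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 1))
C4[cyc] <- 1
C4[cyc[, 2:1]] <- 1
K4 <- matrix(1, 4, 4) - diag(4)

conn_C4 <- vertex_connectivity_bf(C4)
conn_K4 <- vertex_connectivity_bf(K4)
print(c(C4 = conn_C4, K4 = conn_K4))
```

Removing two opposite vertices disconnects the 4-cycle, so its connectivity is 2, whereas the complete graph is limited only by condition (i).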
Graph Partitioning
Graph Partition
partition: $\mathcal{C} = \{C_1, \ldots, C_K\}$, a partition of the vertex set $V_G$
$E(C_k, C_{k'})$: the set of edges connecting vertices in $C_k$ to vertices in $C_{k'}$
We seek a partition $\mathcal{C}$ in which $E(C_k, C_{k'})$ is relatively small in size compared to the set $E(C_k, C_k)$.
Hierarchical Clustering
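Modularity
\[ e_{ij} = \frac{|E(C_i, C_j)|}{2|E|}, \qquad a_i = \sum_{j=1}^{K} e_{ij} \]
\[ \mathrm{mod}(\mathcal{C}) = \sum_{i=1}^{K} \left( e_{ii} - a_i^2 \right) \]
$\mathrm{mod}(\mathcal{C}) = 0$ if $e_{ij} = a_i a_j$
$\mathrm{mod}(\mathcal{C})$ is large if $\sum_{i=1}^{K} e_{ii} = 1$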
## fraction1
##        1    2    3 sum
## 1   0.04 0.04 0.12 0.2
## 2   0.04 0.04 0.12 0.2
## 3   0.12 0.12 0.36 0.6
## sum 0.20 0.20 0.60 1.0

## fraction2
##       1   2   3 sum
## 1   0.2 0.0 0.0 0.2
## 2   0.0 0.2 0.0 0.2
## 3   0.0 0.0 0.6 0.6
## sum 0.2 0.2 0.6 1.0
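Applying the modularity formula to the two fraction tables above makes the contrast concrete. The base R sketch below re-enters the tables by hand (without the `sum` margins); the helper name `modularity_from_fractions` is hypothetical.

```r
# Modularity from a K x K matrix of edge fractions e_ij.
modularity_from_fractions <- function(e) {
  a <- rowSums(e)          # a_i = sum over j of e_ij
  sum(diag(e) - a^2)
}

# fraction1: e_ij = a_i * a_j, i.e. edges placed independently of the classes.
fraction1 <- matrix(c(0.04, 0.04, 0.12,
                      0.04, 0.04, 0.12,
                      0.12, 0.12, 0.36), nrow = 3, byrow = TRUE)

# fraction2: all edges fall within classes, so the e_ii sum to 1.
fraction2 <- diag(c(0.2, 0.2, 0.6))

mod1 <- modularity_from_fractions(fraction1)
mod2 <- modularity_from_fractions(fraction2)
print(round(c(fraction1 = mod1, fraction2 = mod2), 2))
```

With $a = (0.2, 0.2, 0.6)$ the penalty term $\sum_i a_i^2$ equals 0.44 in both cases, so `fraction1` gives 0.44 - 0.44 = 0 and `fraction2` gives 1 - 0.44 = 0.56.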
Hierarchical methods
agglomerative: begin with the partition $\{\{v_1\}, \ldots, \{v_{N_v}\}\}$
divisive: begin with the partition $\{V\}$
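A minimal agglomerative sketch in base R: start from singletons and greedily merge the pair of classes that most increases modularity, stopping when no merge helps. This is one simple greedy scheme, not necessarily the algorithm behind the figures in these slides. The function names and the example graph are hypothetical, and within-class edges are counted from both endpoints (`sum(A[idx, idx])`), which is one common convention.

```r
# Modularity of a membership vector, computed from the adjacency matrix.
modularity_of <- function(A, membership) {
  total <- sum(A)   # equals 2|E| for an undirected graph
  sum(sapply(unique(membership), function(cl) {
    idx <- membership == cl
    e_ii <- sum(A[idx, idx]) / total
    a_i  <- sum(A[idx, ]) / total
    e_ii - a_i^2
  }))
}

# Greedy agglomeration: merge the best pair until modularity stops rising.
agglomerate <- function(A) {
  membership <- seq_len(nrow(A))   # start from singleton classes
  best <- modularity_of(A, membership)
  repeat {
    classes <- unique(membership)
    if (length(classes) < 2) break
    pairs <- combn(classes, 2, simplify = FALSE)
    scores <- sapply(pairs, function(p) {
      m <- membership
      m[m == p[2]] <- p[1]
      modularity_of(A, m)
    })
    if (max(scores) <= best) break
    p <- pairs[[which.max(scores)]]
    membership[membership == p[2]] <- p[1]
    best <- max(scores)
  }
  list(membership = membership, modularity = best)
}

# Hypothetical graph: two triangles (1,2,3) and (4,5,6) joined by the edge 3-4.
A <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
A[edges] <- 1
A[edges[, 2:1]] <- 1

fit <- agglomerate(A)
print(fit$membership)
print(fit$modularity)
```

On this graph the merges stop at the two triangles, because joining them across the single bridge edge would lower modularity.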
Figure 5: Agglomerative Clustering [network plot with vertices labeled H, 2-33, A; dendrogram with leaves labeled Mr Hi, Actor 2, ..., Actor 33, John A]
Spectral Partitioning
Graph Laplacian
graph Laplacian: $L = D - A$, where $A$ is the adjacency matrix and $D = \mathrm{diag}[(d_v)]$.
$\lambda_1 \le \cdots \le \lambda_{N_v}$ are the eigenvalues of $L$.
A graph $G$ consists of $K$ connected components if and only if $\lambda_1(L) = \cdots = \lambda_K(L) = 0$ and $0 < \lambda_{K+1}(L)$.
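The link between zero eigenvalues and components can be checked numerically. The base R sketch below builds a hypothetical 12-vertex graph made of three disjoint paths of sizes 5, 4 and 3 (not the graph shown in these slides) and counts the eigenvalues of $L$ that are zero up to a numerical tolerance; `path_adj` and the tolerance are assumptions.

```r
# Adjacency matrix of a path on n vertices (hypothetical building block).
path_adj <- function(n) {
  A <- matrix(0, n, n)
  if (n > 1) for (i in 1:(n - 1)) A[i, i + 1] <- A[i + 1, i] <- 1
  A
}

# Block-diagonal adjacency matrix: three components of sizes 5, 4, 3.
sizes <- c(5, 4, 3)
A <- matrix(0, sum(sizes), sum(sizes))
offset <- 0
for (s in sizes) {
  idx <- offset + seq_len(s)
  A[idx, idx] <- path_adj(s)
  offset <- offset + s
}

# Graph Laplacian L = D - A, with D the diagonal matrix of degrees.
L <- diag(rowSums(A)) - A

# eigen() returns eigenvalues in decreasing order; sort them ascending.
lambda <- sort(eigen(L, symmetric = TRUE)$values)

# Number of (numerically) zero eigenvalues = number of components.
n_zero <- sum(abs(lambda) < 1e-8)
print(round(lambda, 2))
print(n_zero)
```

The eigenvectors for eigenvalue 0 span the indicator vectors of the components, although a numerical routine may return any orthonormal basis of that space.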
Figure 6: Agglomerative Clustering [graph on vertices A through L]
## [1] 0 0 0 1 2 2 2 2 3 3 4 5

##      A    B    C    D    E   F   G   H   I    J    K    L
## 1 0.00 0.00 0.00 0.00 0.00 0.5 0.5 0.5 0.5 0.00 0.00 0.00
## 2 0.00 0.00 0.00 0.00 0.00 0.0 0.0 0.0 0.0 0.58 0.58 0.58
## 3 0.45 0.45 0.45 0.45 0.45 0.0 0.0 0.0 0.0 0.00 0.00 0.00
Spectral bisection
If $\lambda_2(L)$ is close to zero, we might expect that there is a good candidate for bisection.
Partition the vertices by separating them according to the sign of their entries in the corresponding eigenvector $x_2$:
$S = \{v \in V : x_2(v) \ge 0\}$, $\bar{S} = \{v \in V : x_2(v) < 0\}$
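A minimal base R sketch of this sign rule, on a hypothetical graph of two triangles joined by a single edge. Because `eigen()` returns eigenvalues in decreasing order, $\lambda_2$ and $x_2$ are taken from the second-to-last position. The sign of an eigenvector is arbitrary, so which triangle ends up in $S$ may differ between runs or platforms.

```r
# Hypothetical graph: triangles (1,2,3) and (4,5,6) joined by the edge 3-4.
A <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
A[edges] <- 1
A[edges[, 2:1]] <- 1

L <- diag(rowSums(A)) - A
eig <- eigen(L, symmetric = TRUE)   # eigenvalues in decreasing order
n <- nrow(L)

lambda2 <- eig$values[n - 1]        # second-smallest eigenvalue
x2 <- eig$vectors[, n - 1]          # corresponding (Fiedler) eigenvector

# Bisect the vertices by the sign of their entries in x2.
S     <- which(x2 >= 0)
S_bar <- which(x2 < 0)

print(lambda2)
print(S)
print(S_bar)
```

The two sides of the cut are the two triangles, separated by the single bridge edge.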
Figure 7: Agglomerative Clustering [two panels: Eigenvalues of Graph Laplacian against Index; Fiedler Vector Entry against Actor Number]