Statistical Analysis of Network Data with R
Network Cohesion & Graph Partitioning

Kim Seonghyeon

April 14, 2017

Kim Seonghyeon Statistical Analysis of Network Data with R April 14, 2017 1 / 27

Network Cohesion

subgraph & censuses

clique
clique: a complete subgraph
maximal clique: a clique that is not a subset of a larger clique

Figure 1: karate

Table 1: number of cliques by size

size    1   2   3   4   5
count  34  78  45  11   2

Table 2: number of maximal cliques by size

size    2   3   4   5
count  11  21   2   2
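A clique census like the one in Tables 1 and 2 can be sketched by brute force. The following is a minimal plain-Python illustration, not the R code behind the slides. The four-vertex graph and the helper names `is_clique`, `cliques_by_size` and `maximal_cliques` are assumptions made for this example, and the enumeration is exponential, so it only suits tiny graphs.

```python
from itertools import combinations

def is_clique(adj, nodes):
    """True if every pair of the given nodes is adjacent."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))

def cliques_by_size(adj):
    """Map clique size k to the list of all cliques with k vertices."""
    out = {}
    for k in range(1, len(adj) + 1):
        found = [set(c) for c in combinations(sorted(adj), k) if is_clique(adj, c)]
        if not found:  # no k-clique implies no larger clique
            break
        out[k] = found
    return out

def maximal_cliques(adj):
    """Cliques that cannot be extended by any outside vertex."""
    result = []
    for group in cliques_by_size(adj).values():
        for c in group:
            # c is not maximal if some outside vertex is adjacent to all of c
            if not any(c <= adj[v] for v in adj if v not in c):
                result.append(c)
    return result

# Hypothetical toy graph: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print({k: len(v) for k, v in cliques_by_size(toy).items()})
print(maximal_cliques(toy))
```

In this census, the cliques of size 1 are the vertices and the cliques of size 2 are the edges, which is why Table 1 starts with 34 and 78 for the karate network.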

core & coreness
k-core: a weakened notion of clique
A k-core is a subgraph of G in which all vertex degrees are at least k.
No other subgraph obeying the same condition contains it (i.e., it is maximal in this property).
coreness: $\mathrm{coreness}(v) = \max\{k \mid H \text{ is a } k\text{-core},\ v \in V_H\}$
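Coreness can be computed by repeatedly peeling off a vertex of minimum remaining degree. The sketch below is a simple quadratic-time version in plain Python, offered as an illustration rather than the method used for the slides. The function name `coreness` and the six-vertex graph are assumptions.

```python
def coreness(adj):
    """Return {vertex: coreness} for an undirected graph given as {v: set(neighbors)}."""
    deg = {v: len(nb) for v, nb in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel a vertex of minimum remaining degree.
        v = min(remaining, key=lambda u: deg[u])
        # The running maximum of the peeled degrees is the coreness.
        k = max(k, deg[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                deg[u] -= 1
    return core

# Hypothetical graph: a 4-clique a,b,c,d with a tail d-e-f.
g = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c", "e"},
    "e": {"d", "f"},
    "f": {"e"},
}
print(coreness(g))
```

Here the 4-clique forms the 3-core, while the tail vertices e and f only belong to the 1-core.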

Figure 2: visualization with coreness

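Censuses (directed graph)
mutual: $C_{mut} = \{\{u, v\} \subset V_G \mid (u, v), (v, u) \in E_G\}$
asymmetric: $C_{asym} = \{\{u, v\} \subset V_G \mid (u, v) \in E_G\} \setminus C_{mut}$
null: $C_{null} = \{\{u, v\} \mid \{u, v\} \subset V_G\} \setminus (C_{mut} \cup C_{asym})$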

## aidsblog

## $v
## [1] 146
##
## $e
## [1] 183
##
## $mut
## [1] 3
##
## $asym
## [1] 177
##
## $null
## [1] 10405
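The dyad census definitions translate directly into a count over vertex pairs. The following plain-Python sketch is an illustration only (the slide output comes from R). The function name `dyad_census` and the four-vertex edge list are assumptions; the last lines check the arithmetic of the aidsblog counts quoted above.

```python
def dyad_census(n_vertices, edges):
    """Count mutual, asymmetric and null dyads of a directed graph.

    edges is an iterable of (u, v) pairs; self-loops and duplicates are ignored.
    """
    edge_set = {(u, v) for u, v in edges if u != v}
    # A mutual dyad has both directions; count each unordered pair once.
    mut = sum(1 for (u, v) in edge_set if u < v and (v, u) in edge_set)
    # An asymmetric dyad has exactly one direction.
    asym = sum(1 for (u, v) in edge_set if (v, u) not in edge_set)
    total_pairs = n_vertices * (n_vertices - 1) // 2
    return {"mut": mut, "asym": asym, "null": total_pairs - mut - asym}

# Hypothetical directed graph on vertices 0..3.
edges = [(0, 1), (1, 0), (1, 2), (2, 3), (3, 2), (0, 3)]
print(dyad_census(4, edges))

# Consistency check on the aidsblog counts: every vertex pair falls in one class,
# and each mutual dyad contributes two edges.
assert 3 + 177 + 10405 == 146 * 145 // 2
assert 2 * 3 + 177 == 183
```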

Density and Related Notions of Relative Frequency

Density
Density of a subgraph H:

$$\mathrm{den}(H) = \frac{|E_H|}{|V_H|(|V_H| - 1)/2}$$

*In the case that G is a directed graph, the denominator is replaced by $|V_H|(|V_H| - 1)$.
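As a quick numerical illustration of the density formula, the sketch below evaluates it from vertex and edge counts. It is plain Python rather than the R used for the slides, and the function name `density` is an assumption. The inputs are the counts reported earlier: 34 vertices and 78 edges for the karate network (cliques of size 1 and 2 in Table 1), and 146 vertices and 183 directed edges for aidsblog.

```python
def density(n_vertices, n_edges, directed=False):
    """Edge count divided by the number of possible edges."""
    if n_vertices < 2:
        raise ValueError("density needs at least two vertices")
    possible = n_vertices * (n_vertices - 1)
    if not directed:
        possible /= 2  # unordered pairs for an undirected graph
    return n_edges / possible

print(density(34, 78))                   # karate, undirected
print(density(146, 183, directed=True))  # aidsblog, directed
```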

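clustering coefficient
global clustering coefficient:

$$\mathrm{cl}_T(G) = \frac{3\tau_\Delta(G)}{\tau_3(G)}$$

where $\tau_\Delta(G)$ is the number of triangles in G and $\tau_3(G)$ is the number of connected triples.

local clustering coefficient:

$$\mathrm{cl}(v) = \frac{\tau_\Delta(v)}{\tau_3(v)}$$

where $\tau_\Delta(v)$ is the number of triangles containing v and $\tau_3(v)$ is the number of connected triples centered on v.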
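Both coefficients can be computed from neighbor sets. The sketch below is a plain-Python illustration under the usual reading of the symbols: a triple centered on v is a pair of neighbors of v, and a triangle containing v is such a pair that is itself adjacent. The function names and the toy graph are assumptions, and returning `None` when v has fewer than two neighbors is a choice made for this example.

```python
from itertools import combinations

def triangle_and_triple_counts(adj, v):
    """Return (triangles containing v, connected triples centered on v)."""
    d = len(adj[v])
    triangles = sum(1 for u, w in combinations(adj[v], 2) if w in adj[u])
    return triangles, d * (d - 1) // 2

def local_clustering(adj, v):
    triangles, triples = triangle_and_triple_counts(adj, v)
    return triangles / triples if triples else None  # undefined for degree < 2

def global_clustering(adj):
    counts = [triangle_and_triple_counts(adj, v) for v in adj]
    # Summing triangles over vertices counts each triangle three times,
    # which is exactly the numerator 3 * tau_Delta(G).
    numerator = sum(t for t, _ in counts)
    denominator = sum(p for _, p in counts)
    return numerator / denominator if denominator else None

# Hypothetical toy graph: triangle 0-1-2 with a pendant vertex 3 attached to 2.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(global_clustering(g))
print({v: local_clustering(g, v) for v in g})
```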

Figure 3: transitivity

reciprocity (directed graph)
type 1:

$$\mathrm{rec}_1(G) = \frac{|C_{mut}|}{|C_{mut} \cup C_{asym}|}$$

type 2:

$$\mathrm{rec}_2(G) = \frac{2|C_{mut}|}{|E_G|}$$
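Both reciprocity measures follow from the dyad census counts. The sketch below, in plain Python with assumed function names, plugs in the aidsblog values reported earlier (3 mutual dyads, 177 asymmetric dyads, 183 edges). It uses the fact that the mutual and asymmetric sets are disjoint, so the size of their union is the sum of their sizes.

```python
def reciprocity_dyads(mut, asym):
    """Type 1: fraction of connected dyads that are mutual."""
    return mut / (mut + asym)

def reciprocity_edges(mut, n_edges):
    """Type 2: fraction of directed edges that are reciprocated."""
    return 2 * mut / n_edges

# aidsblog counts from the census output: mut = 3, asym = 177, e = 183.
print(reciprocity_dyads(3, 177))
print(reciprocity_edges(3, 183))
```

The edge-based value is roughly twice the dyad-based one here because each mutual dyad contributes two edges to the numerator.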

Figure 4: reciprocity

Connectivity, Cuts, and Flows

Connectivity
A graph G is said to be connected if every vertex is reachable from every other.
A connected component of a graph is a maximally connected subgraph.
diameter: the length of the longest shortest path (geodesic) between any two vertices.
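Connected components and the diameter can both be obtained from breadth-first search. The sketch below is a plain-Python illustration with assumed function names and a hypothetical six-vertex graph. It takes the diameter to be the largest shortest-path distance and only computes it for connected graphs.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distances (in edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def connected_components(adj):
    seen, components = set(), []
    for v in adj:
        if v not in seen:
            comp = set(bfs_distances(adj, v))
            seen |= comp
            components.append(comp)
    return components

def diameter(adj):
    if len(connected_components(adj)) != 1:
        raise ValueError("diameter is only computed here for connected graphs")
    # Largest eccentricity over all vertices.
    return max(max(bfs_distances(adj, v).values()) for v in adj)

# Hypothetical graph: a path 0-1-2-3 plus a separate edge 4-5.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
two_parts = dict(path)
two_parts.update({4: {5}, 5: {4}})
print(connected_components(two_parts))
print(diameter(path))
```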

k-vertex-connected
A graph G is called k-vertex-connected if
(i) the number of vertices $N_v > k$, and
(ii) the removal of any subset of vertices $X \subset V$ of cardinality $|X| < k$ leaves a subgraph that is connected.
connectivity: $\mathrm{connectivity}(G) = \max\{k \mid G \text{ is } k\text{-vertex-connected}\}$
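The definition can be checked literally by trying every vertex subset, smallest first. The sketch below is a brute-force plain-Python illustration with assumed function names and hypothetical small graphs. It is exponential in the number of vertices, so it is only meant to make the definition concrete.

```python
from itertools import combinations

def is_connected(adj, removed=frozenset()):
    """True if the graph stays connected after deleting the removed vertices."""
    nodes = [v for v in adj if v not in removed]
    if len(nodes) <= 1:
        return True
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nodes)

def vertex_connectivity(adj):
    """Largest k such that the graph is k-vertex-connected."""
    n = len(adj)
    if not is_connected(adj):
        return 0
    # The smallest disconnecting set has size equal to the connectivity.
    for size in range(1, n - 1):
        for removed in combinations(adj, size):
            if not is_connected(adj, set(removed)):
                return size
    # No disconnecting set exists (complete graph): condition (i) caps k at n - 1.
    return n - 1

cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
bowtie = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
print(vertex_connectivity(cycle5), vertex_connectivity(bowtie), vertex_connectivity(k4))
```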

Graph Partitioning

Graph Partition

partition: $\mathcal{C} = \{C_1, \dots, C_K\}$, a partition of the vertex set $V_G$
$E(C_k, C_{k'})$: the set of edges connecting vertices in $C_k$ to vertices in $C_{k'}$

We seek a partition $\mathcal{C}$ in which $E(C_k, C_{k'})$, for $k \neq k'$, is relatively small in size compared to $E(C_k, C_k)$.

Hierarchical Clustering

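modularity

$$e_{ij} = \frac{|E(C_i, C_j)|}{2|E|}, \qquad a_i = \sum_{j=1}^{K} e_{ij}$$

$$\mathrm{mod}(\mathcal{C}) = \sum_{i=1}^{K} \left(e_{ii} - a_i^2\right)$$

$\mathrm{mod}(\mathcal{C}) = 0$ if $e_{ij} = a_i a_j$
$\mathrm{mod}(\mathcal{C})$ is large if $\sum_{i=1}^{K} e_{ii} = 1$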

## fraction1

##        1    2    3 sum
## 1   0.04 0.04 0.12 0.2
## 2   0.04 0.04 0.12 0.2
## 3   0.12 0.12 0.36 0.6
## sum 0.20 0.20 0.60 1.0

## fraction2

##       1   2   3 sum
## 1   0.2 0.0 0.0 0.2
## 2   0.0 0.2 0.0 0.2
## 3   0.0 0.0 0.6 0.6
## sum 0.2 0.2 0.6 1.0
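The two fraction tables illustrate the extremes of modularity. The sketch below, in plain Python with an assumed function name `modularity`, evaluates the sum of $e_{ii} - a_i^2$ over communities, taking $a_i$ to be the row sum shown in the "sum" column. The matrices are copied from the fraction1 and fraction2 output above.

```python
def modularity(e):
    """Modularity from a K x K matrix of edge fractions e[i][j]."""
    a = [sum(row) for row in e]  # a_i: row sums
    return sum(e[i][i] - a[i] ** 2 for i in range(len(e)))

fraction1 = [
    [0.04, 0.04, 0.12],
    [0.04, 0.04, 0.12],
    [0.12, 0.12, 0.36],
]
fraction2 = [
    [0.2, 0.0, 0.0],
    [0.0, 0.2, 0.0],
    [0.0, 0.0, 0.6],
]

print(round(modularity(fraction1), 10))  # e_ij = a_i * a_j everywhere
print(round(modularity(fraction2), 10))  # all edges fall within communities
```

In fraction1 every entry equals $a_i a_j$, so the modularity is 0 up to rounding. In fraction2 the diagonal sums to 1 and the result is 1 - (0.04 + 0.04 + 0.36) = 0.56.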

Hierarchical methods
agglomerative: begin with the partition $\{\{v_1\}, \dots, \{v_{N_v}\}\}$
divisive: begin with the partition $\{V\}$

Figure 5: Agglomerative Clustering (vertex labels in the figure: Actor 2 through Actor 33, John A)

Spectral Partitioning

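graph Laplacian
graph Laplacian: $L = D - A$, where $A$ is the adjacency matrix and $D = \mathrm{diag}[(d_v)]$ is the diagonal matrix of vertex degrees.
$\lambda_1 \le \dots \le \lambda_{N_v}$ are the eigenvalues of $L$.
A graph G consists of K connected components if and only if $\lambda_1(L) = \dots = \lambda_K(L) = 0$ and $0 < \lambda_{K+1}(L)$.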

Figure 6: Agglomerative Clustering

## [1] 0 0 0 1 2 2 2 2 3 3 4 5

##      A    B    C    D    E   F   G   H   I    J    K    L
## 1 0.00 0.00 0.00 0.00 0.00 0.5 0.5 0.5 0.5 0.00 0.00 0.00
## 2 0.00 0.00 0.00 0.00 0.00 0.0 0.0 0.0 0.0 0.58 0.58 0.58
## 3 0.45 0.45 0.45 0.45 0.45 0.0 0.0 0.0 0.0 0.00 0.00 0.00
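The eigenvalue output above has three zeros, matching three connected components of sizes 5, 4 and 3. The link between zero eigenvalues and components can be checked without an eigensolver: since L is symmetric, the multiplicity of the eigenvalue 0 equals the nullity of L, that is, the number of vertices minus the rank. The plain-Python sketch below does this with exact rational arithmetic. The function names are assumptions, and the 12-vertex graph (three paths with 5, 4 and 3 vertices) is hypothetical, chosen only to have the same component sizes, not the graph behind the slide.

```python
from fractions import Fraction

def laplacian(n, edges):
    """Graph Laplacian L = D - A of a simple undirected graph on vertices 0..n-1."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    return L

def matrix_rank(matrix):
    """Rank by Gaussian elimination over the rationals (exact, no rounding)."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def zero_eigenvalue_count(n, edges):
    """Multiplicity of eigenvalue 0 of L, i.e. the number of connected components."""
    return n - matrix_rank(laplacian(n, edges))

# Hypothetical graph: paths on {0..4}, {5..8} and {9..11}.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (5, 6), (6, 7), (7, 8), (9, 10), (10, 11)]
print(zero_eigenvalue_count(12, edges))
```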

spectral bisection
If $\lambda_2(L)$ is close to zero, we might expect that there is a good candidate for bisection.
Partition the vertices by separating them according to the sign of their entries in the corresponding eigenvector $x_2$:
$S = \{v \in V : x_2(v) \ge 0\}$, $\bar{S} = \{v \in V : x_2(v) < 0\}$
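The sign rule can be tried on a small example. The sketch below is a plain-Python illustration, not the author's code. To stay dependency-free it approximates $x_2$ by power iteration on $cI - L$ with the constant vector projected out, where $c$ is twice the maximum degree (an upper bound on the largest eigenvalue of L); a library eigensolver would normally be used instead. The function name `fiedler_vector`, the iteration count and the six-vertex graph (two triangles joined by one edge) are assumptions.

```python
import math

def fiedler_vector(n, edges, iters=500):
    """Approximate the eigenvector of L for the second-smallest eigenvalue."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    c = 2 * max(len(nb) for nb in adj.values())  # c >= largest eigenvalue of L
    x = [float(i) for i in range(n)]             # fixed, deterministic start
    for _ in range(iters):
        # Project out the constant vector (eigenvector for eigenvalue 0).
        mean = sum(x) / n
        x = [xi - mean for xi in x]
        # y = (c I - L) x, using (L x)_v = deg(v) x_v - sum of x over neighbors.
        y = [c * x[v] - (len(adj[v]) * x[v] - sum(x[u] for u in adj[v]))
             for v in range(n)]
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    return x

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
x2 = fiedler_vector(6, edges)
S = {v for v in range(6) if x2[v] >= 0}
S_bar = {v for v in range(6) if x2[v] < 0}
print(S, S_bar)
```

The overall sign of an eigenvector is arbitrary, so which triangle ends up in S and which in its complement carries no meaning; only the split itself does.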

Figure 7: Agglomerative Clustering (x-axis: Actor Number)
