Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Statistical Analysis of Network Data
James Boylesupervised by George Bolt
September 4, 2020
James Boyle supervised by George Bolt Statistical Analysis of Network Data
Network Basics
Network G = (V ,E )
Vertices V = {1, . . . ,NV }Edges {i , j} joining vertices
Figure: A graph with NV = 10vertices
James Boyle supervised by George Bolt Statistical Analysis of Network Data
Network Basics
Network G = (V ,E )
Vertices V = {1, . . . ,NV }
Edges {i , j} joining vertices
Figure: A graph with NV = 10vertices
James Boyle supervised by George Bolt Statistical Analysis of Network Data
Network Basics
Network G = (V ,E )
Vertices V = {1, . . . ,NV }Edges {i , j} joining vertices
Figure: A graph with NV = 10vertices
James Boyle supervised by George Bolt Statistical Analysis of Network Data
Network Basics
Network G = (V ,E )
Adjacency matrix A ∈ RNV×NV
aij =
{1 if {i , j} ∈ E ,0 otherwise
0 0 1 1 0 0 0 0 0 00 0 0 0 1 1 1 0 1 01 0 0 0 1 0 0 1 0 01 0 0 0 1 1 1 0 0 10 1 1 1 0 0 0 1 0 00 1 0 1 0 0 1 1 1 00 1 0 1 0 1 0 0 1 10 0 1 0 1 1 0 0 0 00 1 0 0 0 1 1 0 0 00 0 0 1 0 0 1 0 0 0
James Boyle supervised by George Bolt Statistical Analysis of Network Data
Network Characteristics
Vertex centrality, measures of how “important” a vertex is:
degree - number of edges incident to a vertex
closeness centrality - ccl(v) =1∑
u∈V d(v ,u)
betweenness centrality - proportion of shortest paths betweenpairs of vertices passing through v
James Boyle supervised by George Bolt Statistical Analysis of Network Data
Network Characteristics
Vertex centrality, measures of how “important” a vertex is:
degree - number of edges incident to a vertex
closeness centrality - ccl(v) =1∑
u∈V d(v ,u)
betweenness centrality - proportion of shortest paths betweenpairs of vertices passing through v
James Boyle supervised by George Bolt Statistical Analysis of Network Data
Network Characteristics
Vertex centrality, measures of how “important” a vertex is:
degree - number of edges incident to a vertex
closeness centrality - ccl(v) =1∑
u∈V d(v ,u)
betweenness centrality - proportion of shortest paths betweenpairs of vertices passing through v
James Boyle supervised by George Bolt Statistical Analysis of Network Data
Network Characteristics
Vertex centrality, measures of how “important” a vertex is:
degree - number of edges incident to a vertex
closeness centrality - ccl(v) =1∑
u∈V d(v ,u)
betweenness centrality - proportion of shortest paths betweenpairs of vertices passing through v
James Boyle supervised by George Bolt Statistical Analysis of Network Data
Network Characteristics
Vertex centrality, measures of how “important” a vertex is:
degree - number of edges incident to a vertex
closeness centrality - ccl(v) =1∑
u∈V d(v ,u)
betweenness centrality - proportion of shortest paths betweenpairs of vertices passing through v
James Boyle supervised by George Bolt Statistical Analysis of Network Data
.
Random Graphs
Random Graphs
Stochastic Block Model
Idea: Split the vertices into groups, and consider all vertices in agiven group stochastically equivalent.
Parameters: NV ,K ∈ N, B ∈ RK×K
Split the NV vertices into K classes (a priori or at random).Class memberships c = (c1, . . . , cNV )
P(edge between vertices i and j) = bcicj
James Boyle supervised by George Bolt Statistical Analysis of Network Data
Stochastic Block Model
Idea: Split the vertices into groups, and consider all vertices in agiven group stochastically equivalent.
Parameters: NV ,K ∈ N, B ∈ RK×K
Split the NV vertices into K classes (a priori or at random).Class memberships c = (c1, . . . , cNV )
P(edge between vertices i and j) = bcicj
James Boyle supervised by George Bolt Statistical Analysis of Network Data
Stochastic Block Model
Idea: Split the vertices into groups, and consider all vertices in agiven group stochastically equivalent.
Parameters: NV ,K ∈ N, B ∈ RK×K
Split the NV vertices into K classes (a priori or at random).Class memberships c = (c1, . . . , cNV )
P(edge between vertices i and j) = bcicj
James Boyle supervised by George Bolt Statistical Analysis of Network Data
Stochastic Block Model
Idea: Split the vertices into groups, and consider all vertices in agiven group stochastically equivalent.
Parameters: NV ,K ∈ N, B ∈ RK×K
Split the NV vertices into K classes (a priori or at random).Class memberships c = (c1, . . . , cNV )
P(edge between vertices i and j) = bcicj
James Boyle supervised by George Bolt Statistical Analysis of Network Data
Stochastic Block Model - Example
NV = 35
K = 3
B =
0.9 0.1 0.10.1 0.3 0.050.1 0.05 0.3
James Boyle supervised by George Bolt Statistical Analysis of Network Data
Measures of Vertex Centrality
Figure: Betweenness Centrality
James Boyle supervised by George Bolt Statistical Analysis of Network Data
Measures of Vertex Centrality
Figure: Closeness Centrality
James Boyle supervised by George Bolt Statistical Analysis of Network Data
Multiple Network Models
Single network models are in general not suitable for modellingmultiple network observations, e.g. brain scans.
Figure: Brain Networks[4]
James Boyle supervised by George Bolt Statistical Analysis of Network Data
Multiple Network Models
Aim: Model multiple noisy realisations of a single “true” network,i.e. observations of the form
True Network + Noise
For a binary network, noise can only manifest itself in the form offalse positive and false negative observations
James Boyle supervised by George Bolt Statistical Analysis of Network Data
Multiple Network Models
Aim: Model multiple noisy realisations of a single “true” network,i.e. observations of the form
True Network + Noise
For a binary network, noise can only manifest itself in the form offalse positive and false negative observations
James Boyle supervised by George Bolt Statistical Analysis of Network Data
The Measurement Error Model
Model Assumptions[3]1:
True network A ∼ StochasticBlockModel(NV ,K ,B)
Observation noise A(1), . . . ,A(n) respects the block structure
Concretely, letting P,Q ∈ RK×K be the matrices of false positiveand false negative rates respectively, we suppose that
A(m)ij ∼
{Bernoulli(Pcicj ) if Aij = 0
Bernoulli(1− Qcicj ) if Aij = 1
1C. M. Le, K. Levin, E. Levina, et al. Estimating a network from multiplenoisy realizations.Electronic Journal of Statistics, 12(2):4697–4740, 2018
James Boyle supervised by George Bolt Statistical Analysis of Network Data
The Measurement Error Model
Model Assumptions[3]1:
True network A ∼ StochasticBlockModel(NV ,K ,B)Observation noise A(1), . . . ,A(n) respects the block structure
Concretely, letting P,Q ∈ RK×K be the matrices of false positiveand false negative rates respectively, we suppose that
A(m)ij ∼
{Bernoulli(Pcicj ) if Aij = 0
Bernoulli(1− Qcicj ) if Aij = 1
1C. M. Le, K. Levin, E. Levina, et al. Estimating a network from multiplenoisy realizations.Electronic Journal of Statistics, 12(2):4697–4740, 2018
James Boyle supervised by George Bolt Statistical Analysis of Network Data
The Measurement Error Model
Model Assumptions[3]1:
True network A ∼ StochasticBlockModel(NV ,K ,B)Observation noise A(1), . . . ,A(n) respects the block structure
Concretely, letting P,Q ∈ RK×K be the matrices of false positiveand false negative rates respectively, we suppose that
A(m)ij ∼
{Bernoulli(Pcicj ) if Aij = 0
Bernoulli(1− Qcicj ) if Aij = 1
1C. M. Le, K. Levin, E. Levina, et al. Estimating a network from multiplenoisy realizations.Electronic Journal of Statistics, 12(2):4697–4740, 2018
James Boyle supervised by George Bolt Statistical Analysis of Network Data
The Measurement Error Model
James Boyle supervised by George Bolt Statistical Analysis of Network Data
The Measurement Error Model
James Boyle supervised by George Bolt Statistical Analysis of Network Data
The Measurement Error Model
James Boyle supervised by George Bolt Statistical Analysis of Network Data
Metric Based Models
Idea[4]2: Assign probabilities to networks based on their distancefrom a central, “true”, network.
e.g. For a true network G true , the Spherical Network Modelassigns
P(G ;G true , γ) ∝ exp(−γd(G ,G true))
2Lunagomez S., Olhed, S. C., and Wolfe P. J. (2020). Modeling networkpopulations via graph distances.Journal of the American Statistical Association(just-accepted):1–59
James Boyle supervised by George Bolt Statistical Analysis of Network Data
Metric Based Models
Idea[4]2: Assign probabilities to networks based on their distancefrom a central, “true”, network.
e.g. For a true network G true , the Spherical Network Modelassigns
P(G ;G true , γ) ∝ exp(−γd(G ,G true))
2Lunagomez S., Olhed, S. C., and Wolfe P. J. (2020). Modeling networkpopulations via graph distances.Journal of the American Statistical Association(just-accepted):1–59
James Boyle supervised by George Bolt Statistical Analysis of Network Data
On Things not Covered
Network Path data - Each data point is a path through a networke.g. vertex ↔ webpage
edge ↔ navigation by user between webpages
Inference - Very complicated, so approximate methods such asMCMCMLE or EMA must be used
Single Network Models
Dynamic networks
James Boyle supervised by George Bolt Statistical Analysis of Network Data
On Things not Covered
Network Path data - Each data point is a path through a networke.g. vertex ↔ webpage
edge ↔ navigation by user between webpages
Inference - Very complicated, so approximate methods such asMCMCMLE or EMA must be used
Single Network Models
Dynamic networks
James Boyle supervised by George Bolt Statistical Analysis of Network Data
On Things not Covered
Network Path data - Each data point is a path through a networke.g. vertex ↔ webpage
edge ↔ navigation by user between webpages
Inference - Very complicated, so approximate methods such asMCMCMLE or EMA must be used
Single Network Models
Dynamic networks
James Boyle supervised by George Bolt Statistical Analysis of Network Data
On Things not Covered
Network Path data - Each data point is a path through a networke.g. vertex ↔ webpage
edge ↔ navigation by user between webpages
Inference - Very complicated, so approximate methods such asMCMCMLE or EMA must be used
Single Network Models
Dynamic networks
James Boyle supervised by George Bolt Statistical Analysis of Network Data
Bibliography
B. Kim, K. H. Lee, L. Xue, and X. Niu.
A review of dynamic network models with latent variables.
Statistics surveys, 12:105, 2018.
E. D. Kolaczyk and G. Csárdi.
Statistical analysis of network data with R, volume 65.
Springer, 2014.
C. M. Le, K. Levin, E. Levina, et al.
Estimating a network from multiple noisy realizations.
Electronic Journal of Statistics, 12(2):4697–4740, 2018.
S. Lunagómez, S. C. Olhede, and P. J. Wolfe.
Modeling network populations via graph distances.
Journal of the American Statistical Association, (just-accepted):1–59,2020.
M. Salter-Townshend, A. White, I. Gollini, and T. Murphy.
Review of statistical network analysis: Models, algorithms and softwaresupplementary material.
2012.James Boyle supervised by George Bolt Statistical Analysis of Network Data