Networks & Digital Security
• Interdisciplinary
• Combination formal & ‘soft’ interpretation
• Security in the sense of a detective
04/18/23 Network Analysis 3
Overview
1. Primer on graph theory
2. Centrality
– Who is important?
3. Clustering
– Who belongs together?
4. Detecting & predicting changes
– LIGA project
Central theme: global vs. local approaches
Graph primer - basics
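• V = vertices, N = |V|
• A = arcs, M = |A|
• $G = (V, A)$
• $(i,j) \in A \subseteq V^2$
• $A(x,y) \Leftrightarrow (x,y) \in A$ (x points to y)
• $A(x,Y) = \{\, y \mid A(x,y) \wedge y \in Y \,\}$
• $\mathrm{arcs}(X,Y) = \{\, (x,y) \mid x \in X \wedge y \in Y \wedge (x,y) \in A \,\}$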
Graph primer - concepts
• Neighborhood: $A_{in}(x) = A(V,x)$, $A_{out}(x) = A(x,V)$, $A(x) = A_{in}(x) \cup A_{out}(x)$
• Degree: $d_{in}(x) = |A_{in}(x)|$, $d_{out}(x) = |A_{out}(x)|$, $d(x) = |A(x)|$
• Path: $path(x,y) \Leftrightarrow A(x,y) \vee \exists z : A(x,z) \wedge path(z,y)$
• Similar concepts for undirected graphs G=(V,E)
Graph primer – graph types
[Figure: three example graphs, numbered 1, 2 and 3]
Models for these graphs by:
1. Erdős-Rényi (1959)
2. Tsvetovat-Carley (2005)
3. Barabási-Albert (1999)
Graph primer – degree distributions
• Erdős-Rényi: fix the number of vertices N; each possible edge occurs independently with probability p
• Barabási-Albert: start with a small set of vertices and add new ones. Each new vertex is connected to existing vertices with a probability proportional to their degree (preferential attachment)
Degree distributions: what is the chance a node has degree k?
$P(k) = \dfrac{\#\{x \in V \mid d(x) = k\}}{N}$
• Poisson: $P(k) = \binom{N}{k} p^k (1-p)^{N-k} \approx \dfrac{z^k e^{-z}}{k!}$, with z the mean degree
• Power-law (scale-free): $P(k) = c \cdot k^{-\gamma}$
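The two generative models and the empirical degree distribution P(k) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation behind the slides: the function names, the parameter `m` (links added per new vertex) and the choice to seed the Barabási-Albert process with a small complete graph are all assumptions.

```python
import random
from collections import Counter

def erdos_renyi(n, p, rng):
    """Undirected Bernoulli graph: each possible edge occurs with probability p."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def barabasi_albert(n, m, rng):
    """Preferential attachment: each new vertex links to m distinct existing
    vertices, picked with probability proportional to their current degree."""
    adj = {v: set() for v in range(n)}
    stubs = []  # every vertex appears once per incident edge
    # assumption: seed with a complete graph on m + 1 vertices
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
            stubs += [u, v]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))  # degree-proportional choice
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            stubs += [new, t]
    return adj

def degree_distribution(adj):
    """P(k) = #{x in V | d(x) = k} / N."""
    n = len(adj)
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / n for k, c in sorted(counts.items())}

if __name__ == "__main__":
    rng = random.Random(0)
    print(degree_distribution(erdos_renyi(200, 0.05, rng)))
    print(degree_distribution(barabasi_albert(200, 2, rng)))
```

Plotting both distributions on log-log axes is the usual way to see the difference: the Bernoulli graph concentrates around its mean degree, while the preferential-attachment graph has a long tail of high-degree hubs.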
Graph primer – small world effect
• Famous experiment by Milgram (1967)
• Everyone in the world is connected to everyone else in roughly six steps (‘six degrees of separation’)
• Social graphs exhibit the ‘small world effect’: the diameter of a social graph scales logarithmically with N
Centrality
• Importance, control of flow
• Ranking from most important (most control) to least important (least control)
Node centrality measures 2/4
– Closeness: $c_C(v) = \dfrac{1}{\sum_{u \in V \setminus v} d(u,v)}$, with d(u,v) the distance between u and v
• ETA of flow to v
• $c_C$ inverted for visualization
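As a small illustration of the closeness definition, the sketch below computes it with a breadth-first search. It assumes an undirected, connected graph with unit-length edges stored as a dictionary of neighbor lists; the function name and the data layout are illustrative choices.

```python
from collections import deque

def closeness(adj, v):
    """c_C(v) = 1 / sum of d(u, v) over all u != v.
    Assumes an undirected, connected graph with at least two vertices."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:          # first visit gives the hop distance
                dist[y] = dist[x] + 1
                queue.append(y)
    return 1.0 / sum(d for u, d in dist.items() if u != v)

# Path graph 0 - 1 - 2 - 3 - 4: the middle vertex is closest to everyone.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
scores = {v: closeness(path, v) for v in path}
```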
Node centrality measures 3/4
– Eigenvector: $c_E(v) = \dfrac{1}{\lambda} \sum_{u \in A(v)} c_E(u)$
• Influence or risk
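Eigenvector centrality is defined recursively, so it is usually obtained by iteration. The sketch below uses power iteration on an undirected, connected graph. Iterating with A + I instead of A is an assumption made here to avoid oscillation on bipartite graphs; it leaves the eigenvectors unchanged. Scores are scaled so that the largest equals 1.

```python
def eigenvector_centrality(adj, iterations=1000, tol=1e-12):
    """Power iteration for c_E(v) = (1/lambda) * sum of c_E(u) over neighbors u.
    adj maps each vertex to its neighbors A(v)."""
    c = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # one step of (A + I) c: own score plus the scores of all neighbors
        nxt = {v: c[v] + sum(c[u] for u in adj[v]) for v in adj}
        top = max(nxt.values())
        nxt = {v: x / top for v, x in nxt.items()}   # rescale, largest = 1
        converged = max(abs(nxt[v] - c[v]) for v in adj) < tol
        c = nxt
        if converged:
            break
    return c

# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
scores = eigenvector_centrality(graph)
```

In this example vertex 2 scores highest and the pendant vertex 3 lowest, because a vertex inherits importance from the importance of its neighbors.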
Node centrality measures 4/4
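– Betweenness: $c_B(v) = \sum_{s \neq v \in V} \sum_{t \neq v,s \in V} \dfrac{\sigma_{st}(v)}{\sigma_{st}}$, with $\sigma_{st} = |SP_{st}|$ the number of shortest paths from s to t and $\sigma_{st}(v)$ the number of those that pass through v
• Volume of flow/traffic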
Obtaining cB
• Fastest current algorithm by Brandes in O(nm)
• Solves all shortest paths in one pass
– For each vertex, consider all d=1 nearest neighbors, then d=2 and so on
– For each shortest path, store which vertices are on it
– Derive cB
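The steps above can be sketched as follows for an unweighted graph. This is a compact version in the spirit of Brandes' algorithm, not a reference implementation: instead of storing every shortest path explicitly it keeps, per source, the number of shortest paths and the predecessors on them, and then accumulates the contributions backwards. The sum runs over ordered pairs (s, t), so on an undirected graph every unordered pair is counted twice.

```python
from collections import deque

def brandes_betweenness(adj):
    """Betweenness c_B for every vertex of an unweighted graph.
    adj maps each vertex to an iterable of its (out-)neighbors."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # vertices by increasing distance from s
        preds = {v: [] for v in adj}     # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}      # number of shortest s-v paths
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                     # BFS: distance 1 first, then 2, ...
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}    # dependency of s on each vertex
        for w in reversed(order):        # farthest vertices first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return cb

# Small undirected example: vertex 4 is linked to all others.
graph = {0: [4], 1: [2, 4], 2: [1, 3, 4], 3: [2, 4], 4: [0, 1, 2, 3]}
scores = brandes_betweenness(graph)
```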
Local approach
• No known algorithm calculates cB(v) for a single v faster than cB for all v!
• We only want to rank nodes of interest, not all
• Local approach
– Find cB for some specific nodes
– If we can estimate cB, we can rank relevant nodes
Ego betweenness
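• Ego-net: $ego(v) = \{v\} \cup A(v)$ and corresponding edges
• Calculate cB considering only ego(v)
• Let A be the adjacency matrix:
$$A = \begin{pmatrix} 0&0&0&0&1 \\ 0&0&1&0&1 \\ 0&1&0&1&1 \\ 0&0&1&0&1 \\ 1&1&1&1&0 \end{pmatrix}$$
$$A^2 = \begin{pmatrix} 1&1&1&1&0 \\ 1&2&1&2&1 \\ 1&1&3&1&2 \\ 1&2&1&2&1 \\ 0&1&2&1&4 \end{pmatrix}$$
$$A^2[1-A] = \begin{pmatrix} *&*&*&*&* \\ 1&*&*&*&* \\ 1&*&*&*&* \\ 1&2&*&*&* \\ *&*&*&*&* \end{pmatrix}$$
$c_{EB}(v) = \frac{1}{1} + \frac{1}{1} + \frac{1}{1} + \frac{1}{2} = 3.5$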
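The matrix recipe can be checked with a short script. The sketch below restricts the adjacency matrix to the ego-net, looks at every non-adjacent pair, and adds the reciprocal of the number of two-step paths between them (the corresponding entry of A²), since exactly one of those paths runs through the ego. The function name and the list-of-lists matrix layout are assumptions; vertices are numbered from 0, so the ego of the example is vertex 4.

```python
def ego_betweenness(A, ego):
    """Ego betweenness from a symmetric 0/1 adjacency matrix A (list of lists)."""
    # ego(v) = {v} plus its neighbors, with the edges among them
    members = [ego] + [j for j, a in enumerate(A[ego]) if a]
    sub = [[A[i][j] for j in members] for i in members]
    n = len(sub)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if sub[i][j] == 0:  # non-adjacent pair, i.e. an entry of A^2[1 - A]
                two_paths = sum(sub[i][k] * sub[k][j] for k in range(n))
                total += 1.0 / two_paths
    return total

A = [
    [0, 0, 0, 0, 1],
    [0, 0, 1, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1],
    [1, 1, 1, 1, 0],
]
print(ego_betweenness(A, 4))  # 3.5
```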
No direct link between cB and cEB
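[Figure: two example graphs built from an ego v, red circles and green triangles]
• First example: red circles + ego form an (n+1)-node star; green triangles form a p-node complete graph $K_p$
• Second example: red circles + ego form a (p+1)-node star; green triangles + ego form an n-node complete graph $K_n$
• Values shown with the examples: $c_{EB}(v)$ scales with $n$, $c_B(v)$ scales with $np$, $c_{EB}(v) = \frac{1}{2} n(n-1)$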
Correlation cB and cEB
• Very strong positive correlation!
SF graph
N=25    0.951   0.041   [0.833-1.000]
N=50    0.942   0.038   [0.790-0.992]
N=100   0.94    0.033   [0.699-0.991]
N=200   0.941   0.026   [0.851-0.986]

Bernoulli graph
        p=0.1           p=0.2           p=0.3           p=0.4           p=0.5           p=0.6
N=25    0.907           0.867           0.792           0.736           0.776           0.752
        0.048           0.068           0.109           0.132           0.099           0.089
        [0.787-0.981]   [0.655-0.972]   [0.430-0.954]   [0.283-0.923]   [0.439-0.941]   [0.369-0.907]
N=50    0.918           0.798           0.766           0.79            0.772           0.74
        0.037           0.064           0.071           0.056           0.058           0.071
        [0.768-0.983]   [0.490-0.932]   [0.499-0.909]   [0.622-0.900]   [0.599-0.899]   [0.503-0.872]
N=100   0.895           0.758           0.812           0.812           0.786           0.733
        0.031           0.047           0.035           0.036           0.038           0.051
        [0.778-0.961]   [0.623-0.854]   [0.687-0.893]   [0.623-0.888]   [0.679-0.873]   [0.561-0.830]
N=200   0.827           0.806           0.844           0.828           0.792           0.739
        0.031           0.025           0.021           0.024           0.026           0.036
        [0.745-0.898]   [0.714-0.861]   [0.791-0.900]   [0.750-0.881]   [0.727-0.852]   [0.601-0.812]
Types of clustering
• What is a cluster?
• Supervised vs. unsupervised
• Partitional vs. hierarchical
Clustering quality – modularity
Cluster adjacency matrix
      C1     C2     C3     C4
C1    18     5      2      4
C2    5      15     2      0
C3    2      2      19     1
C4    4      0      1      20

Cluster adjacency matrix E (entries divided by the total of 100)
      C1     C2     C3     C4
C1    0.18   0.05   0.02   0.04
C2    0.05   0.15   0.02   0.00
C3    0.02   0.02   0.19   0.01
C4    0.04   0.00   0.01   0.20

$a_i = \sum_j e_{ij}$
$a_1 = 0.18 + 0.05 + 0.02 + 0.04 = 0.29$
$a_2 = 0.05 + 0.15 + 0.02 + 0.00 = 0.22$
$a_3 = 0.02 + 0.02 + 0.19 + 0.01 = 0.24$
$a_4 = 0.04 + 0.00 + 0.01 + 0.20 = 0.25$

$Q = \sum_i (e_{ii} - a_i^2)$
$Q = (0.18 - 0.29^2) + (0.15 - 0.22^2) + (0.19 - 0.24^2) + (0.20 - 0.25^2) \approx 0.47$
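The modularity computation is easy to reproduce. The sketch below starts from the cluster adjacency counts of the example, normalizes them into E, and evaluates the row sums and the sum of e_ii minus a_i squared. The function name and return layout are illustrative assumptions.

```python
def modularity(counts):
    """Return (E, a, Q) for a symmetric cluster adjacency matrix of counts."""
    total = sum(sum(row) for row in counts)
    e = [[x / total for x in row] for row in counts]      # normalized matrix E
    a = [sum(row) for row in e]                           # a_i = sum_j e_ij
    q = sum(e[i][i] - a[i] ** 2 for i in range(len(e)))   # Q = sum_i (e_ii - a_i^2)
    return e, a, q

counts = [
    [18, 5, 2, 4],
    [5, 15, 2, 0],
    [2, 2, 19, 1],
    [4, 0, 1, 20],
]
E, a, Q = modularity(counts)
print([round(x, 2) for x in a], round(Q, 4))
```

A value of Q near 0 means the clustering is no better than a random assignment; values approaching 1 indicate that most edges fall inside clusters.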
Newman & Girvan clustering algorithm
• Edges that are the most ‘between’ connect large parts of the graph
1. Calculate edge betweenness Aij in n x n matrix A
2. Remove edge with highest score
3. Recalculate edge betweenness for affected edges
4. Goto 2 until no edges remain
• O(m²n), may be smaller on graphs with strong clustering
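A minimal sketch of this divisive procedure for an undirected, unweighted graph without self-loops is given below. For brevity it recomputes the betweenness of all edges after each removal rather than only the affected ones, and it records the components each time the graph splits. All function names are illustrative assumptions.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path betweenness of every edge (summed over ordered pairs)."""
    eb = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += share
                delta[v] += share
    return eb

def components(adj):
    """Connected components of a graph stored as vertex -> set of neighbors."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman(adj):
    """Remove the most 'between' edge until none remain; return each split."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    splits = [components(adj)]
    while any(adj.values()):
        eb = edge_betweenness(adj)
        u, v = max(eb, key=eb.get)                    # edge with highest score
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
        if len(comps) > len(splits[-1]):
            splits.append(comps)
    return splits

# Two triangles joined by the bridge 2-3: the bridge is removed first.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
hierarchy = girvan_newman(graph)
```

The list of splits is the top-down dendrogram; choosing the level with the highest modularity Q is one common way to pick a final clustering.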
Greedy clustering algorithm
• Maximize Q to find clustering
• Greedy approach:
• Creates a bottom-up dendrogram
• Cut corresponding to maximum Q is the best clustering found
• Still a costly process, O(n²)
C := V;
repeat
    (i,j) := argmax{∆Q | Ci, Cj ∈ C};
    C := C - Cj;
    Ci := Ci + Cj;
until |C| = 1
$\Delta Q = 2(e_{ij} - a_i a_j)$
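The greedy loop can be turned into a small runnable sketch. The version below keeps a dense matrix of e_ij values, so it is far slower than the cost quoted on the slide, and is meant only to show the merge step and the ΔQ formula at work. Vertices are assumed to be numbered 0 to n-1; the function name and the tie-breaking (largest index pair wins) are arbitrary choices.

```python
def greedy_modularity(n, edges):
    """Agglomerative clustering: repeatedly merge the pair with maximal
    dQ = 2 (e_ij - a_i a_j); return the best Q seen and its clustering."""
    m = len(edges)
    e = [[0.0] * n for _ in range(n)]
    for u, v in edges:                 # each edge contributes half to each side
        e[u][v] += 1.0 / (2 * m)
        e[v][u] += 1.0 / (2 * m)
    a = [sum(row) for row in e]
    clusters = {i: {i} for i in range(n)}          # C := V, one cluster per vertex
    q = sum(e[i][i] - a[i] ** 2 for i in range(n))
    best_q, best = q, [set(c) for c in clusters.values()]
    while len(clusters) > 1:
        dq, i, j = max((2 * (e[i][j] - a[i] * a[j]), i, j)
                       for i in clusters for j in clusters if i < j)
        clusters[i] |= clusters.pop(j)             # Ci := Ci + Cj, C := C - Cj
        for k in range(n):                         # fold row j into row i
            e[i][k] += e[j][k]
            e[j][k] = 0.0
        for k in range(n):                         # fold column j into column i
            e[k][i] += e[k][j]
            e[k][j] = 0.0
        a[i] += a[j]
        a[j] = 0.0
        q += dq
        if q > best_q:
            best_q, best = q, [set(c) for c in clusters.values()]
    return best_q, best

# Two triangles joined by one bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
best_q, best = greedy_modularity(6, edges)
```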
Practical applications of social clusters
• Find people related to someone
• Find out if people belong to the same cluster
• This does not require a partitioning of the entire network!
Local modularity
[Figure: cluster C with its boundary B, surrounded by the universe U]
• C: cluster, the collection of nodes v ∈ V with known link structure
• U: universe, all nodes outside C to which nodes from C point: U(C) = {u ∈ V-C | A(C,u) ≠ ∅}
• B: boundary, all nodes in C with at least one neighbor outside C: B(C) = {b ∈ C | A(b,U) ≠ ∅}
$R(C) = \dfrac{|\mathrm{arcs}(B(C),C)|}{|\mathrm{arcs}(B(C),C)| + |\mathrm{arcs}(B(C),U)|} = \dfrac{|\mathrm{arcs}(B(C),C)|}{|\mathrm{arcs}(B(C),V)|}$
Local cluster algorithm
C := Ø; v := v0;
repeat
    C := C + v;
    v := argmax{R(C+u) | u ∈ U(C)}
until |C| = k or R ≥ d
∆R(C,u) = R(C+u) – R(C)
[Figure: adding node v4 to C. Legend: arcs removed from arcs(B(C),V); arcs newly added to arcs(B(C),V); arcs removed from arcs(B(C),C); arcs newly added to arcs(B(C),C)]
∆R(C,v4) = 1/3 – 1/4 = 1/12
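A runnable sketch of the local cluster algorithm follows. It takes the graph as a dictionary from each node to the nodes it points to, computes R(C) as the fraction of arcs leaving the boundary B(C) that stay inside C, and grows C from a seed node. Returning R = 1 when the boundary is empty, stopping when no candidate is left, and breaking ties by the smallest node label are assumptions added to keep the sketch well defined.

```python
def local_modularity(adj, C):
    """R(C) = |arcs(B(C), C)| / |arcs(B(C), V)|."""
    boundary = {b for b in C if any(y not in C for y in adj[b])}
    inside = sum(1 for b in boundary for y in adj[b] if y in C)
    total = sum(len(adj[b]) for b in boundary)
    return inside / total if total else 1.0   # assumption: empty boundary gives 1

def local_cluster(adj, v0, k, d=1.0):
    """Grow a cluster from seed v0 until |C| = k or R(C) >= d."""
    C = set()
    v = v0
    while True:
        C.add(v)                                              # C := C + v
        U = {u for c in C for u in adj[c] if u not in C}      # universe U(C)
        r = local_modularity(adj, C)
        if len(C) >= k or r >= d or not U:
            return C, r
        # v := argmax R(C + u) over u in U(C)
        v = max(sorted(U), key=lambda u: local_modularity(adj, C | {u}))

# Two triangles joined by the bridge 2-3, stored with arcs in both directions.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
cluster, r = local_cluster(graph, 0, k=3)
```

Only the arcs of nodes already in C are inspected, which is why the approach works when the rest of the network is unknown or too large to partition.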
Local cluster quality vs. global clusters
• For each node v in each global cluster i
– Find the local cluster with the same size
– Average
$sim(L_v, G_i) = \dfrac{|L_v \cap G_i|}{|L_v \cup G_i|}$
Preliminary results on real graphs
Network (size)                       Compiled by        Sim(Lv,Gi)   STD
Karate club (34)                     Zachary            0.75         0.24
Dolphin social relations (62)        Lusseau            0.62         0.28
Les Misérables coappearance (75)     Knuth              0.58         0.29
American College Football (113)      Girvan & Newman    0.58         0.36
C. Elegans neural network (295)      Watts & Strogatz
• Experiment too small for real conclusions, but
– edge vertices ruin the fun,
– edge betweenness?
• Usefulness of local approach depends on the seed node
Web graph
• ‘Social’ network of blogs and news sites
• Most graph models are static, but the Web is highly dynamic
• Stored copy is infeasible, continuous crawling intractable
• Change in relevance -> change in link structure
Node roles
• Frequently recurring subgraphs: motifs
• Nodes share a role iff there is a permutation of nodes and edges that preserves motif structure
• On the Web:
– Fully connected triad (1 role)
– Uplinked mutual dyad (2 roles)
– Feedback with two mutual dyads (2 roles)
Dynamic graphs
• Changes in relevance cause changes in link structure
• Changes in specific roles imply changes in other node roles
– A fanbase links to itself and to its authorities
– Learning relevant links through affiliated sites
– etc.
• Relevance decays (half-life λ)
LIGA research questions
• How to model (Web) node relevance?
• How does acquired or lost relevance change linkage?
• How can we predict consequential changes?
• How can such prediction models be approximated by local incremental algorithms?
• Among others ...
Putting it together
• Networks can be analyzed using an array of tools
• Network analysis is useful in various disciplines:
– Information Retrieval
– Security
• But also in:
– Sociology
– (Statistical) physics
– Bioinformatics
– AI
Most cited literature
• Centrality:
– Borgatti S. P.: Centrality and Network Flow. Social Networks 27 (2005) 55-71
– Brandes U.: A Faster Algorithm for Betweenness Centrality. Journal of Mathematical Sociology 25(2) (2001) 163-177
– Freeman L. C.: A Set of Measures of Centrality Based on Betweenness. Sociometry 40 (1977) 35-41
• Clustering:
– Clauset A.: Finding local community structure in networks. Physical Review E 72 (2005) 026132
– Girvan M., Newman M. E. J.: Community structure in social and biological networks. PNAS 99(12) (2002) 7821-7826
– Newman M. E. J.: Fast algorithm for detecting community structure in networks. Physical Review E 69 (2004) 066133