Network Analysis
Max Hinne [email protected]
Social Networks, 6/1/2015



04/18/23 Network Analysis 2

Networks & Digital Security

• Interdisciplinary

• Combination formal & ‘soft’ interpretation

• Security in the sense of a detective


Overview

1. Primer on graph theory

2. Centrality

– Who is important?

3. Clustering

– Who belongs together?

4. Detecting & predicting changes

– LIGA project

Central theme: global vs. local approaches


GRAPH PRIMER


Graph primer - basics

• V = vertices, N = |V|

• A = arcs, M = |A|

$G = (V, A)$

$(i, j) \in A \subseteq V^2$

$A(x, y) \iff (x, y) \in A$ (x points to y)

$A(x, Y) = \{ y \in Y \mid A(x, y) \}$

$\mathrm{arcs}(X, Y) = \{ (x, y) \mid x \in X \land y \in Y \land (x, y) \in A \}$

Graph primer - concepts

• Neighborhood: $A_{in}(x) = A(V, x)$, $A_{out}(x) = A(x, V)$, $A(x) = A_{in}(x) \cup A_{out}(x)$

• Degree: $d_{in}(x) = |A_{in}(x)|$, $d_{out}(x) = |A_{out}(x)|$, $d(x) = |A(x)|$

• Path: $\mathrm{path}(x, y) \iff A(x, y) \lor \exists z\, [A(x, z) \land \mathrm{path}(z, y)]$

• Similar concepts for undirected graphs G=(V,E)
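The primer definitions can be tried out on a toy graph. The sketch below is only an illustration: the function names (`build`, `neighborhood`, `has_path`) and the four-arc example graph are assumptions, not part of the slides. It stores in- and out-neighborhoods as sets and checks reachability with an iterative search that mirrors the recursive path definition.

```python
from collections import defaultdict

def build(arcs):
    """Return (a_in, a_out): in- and out-neighborhoods for a list of arcs (x, y)."""
    a_in, a_out = defaultdict(set), defaultdict(set)
    for x, y in arcs:
        a_out[x].add(y)   # x points to y
        a_in[y].add(x)
    return a_in, a_out

def neighborhood(a_in, a_out, x):
    """A(x): union of in- and out-neighborhood."""
    return a_in[x] | a_out[x]

def has_path(a_out, x, y):
    """path(x, y): A(x, y) holds, or some z has A(x, z) and path(z, y)."""
    stack, seen = list(a_out.get(x, ())), set()
    while stack:
        z = stack.pop()
        if z == y:
            return True
        if z not in seen:
            seen.add(z)
            stack.extend(a_out.get(z, ()))
    return False

# Hypothetical example graph: a directed triangle 1 -> 2 -> 3 -> 1 plus 4 -> 1.
arcs = [(1, 2), (2, 3), (3, 1), (4, 1)]
a_in, a_out = build(arcs)
d_in = len(a_in[1])                    # in-degree of vertex 1
d = len(neighborhood(a_in, a_out, 1))  # degree of vertex 1
```

Vertex 4 can reach every vertex of the triangle, but nothing points back to it, which is the kind of asymmetry the directed definitions are meant to capture.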

Graph primer – graph types

[Figure: three example graph types, labeled 1, 2 and 3]

Models for these graphs by:

1. Erdős-Rényi (1959)

2. Tsvetovat-Carley (2005)

3. Barabási-Albert (1999)

Graph primer – degree distributions

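Degree distributions: what is the chance a node has degree k?

$P(k) = \frac{\#\{x \in V \mid d(x) = k\}}{N}$

• Erdős-Rényi: number of vertices N, each edge occurs with probability p. The degree distribution is binomial and approaches a Poisson distribution with mean degree z:

$P(k) = \binom{N}{k} p^k (1-p)^{N-k} \approx \frac{z^k e^{-z}}{k!}$

• Barabási-Albert: start with a small set of vertices and add new ones. Each new vertex is connected to others with a probability based on their degree. The degree distribution follows a power law (scale-free), with a constant c and an exponent written here as γ:

$P(k) = c \cdot k^{-\gamma}$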
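To see the Poisson shape appear, one can sample an Erdős-Rényi graph and compare the empirical P(k) with the Poisson formula. This is a minimal sketch under stated assumptions: the graph is undirected, the sizes (`n = 1000`, `p = 0.006`) and the fixed seed are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import math
import random
from collections import Counter

def er_degrees(n, p, rng):
    """Sample an undirected Erdős-Rényi graph and return the degree of every vertex."""
    deg = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:   # each possible edge occurs with probability p
                deg[i] += 1
                deg[j] += 1
    return deg

def empirical_pk(deg):
    """P(k): fraction of vertices with degree exactly k."""
    n = len(deg)
    return {k: c / n for k, c in Counter(deg).items()}

def poisson_pk(k, z):
    """Poisson approximation z^k e^{-z} / k! with mean degree z."""
    return z ** k * math.exp(-z) / math.factorial(k)

rng = random.Random(0)            # fixed seed so the run is repeatable
n, p = 1000, 0.006
deg = er_degrees(n, p, rng)
pk = empirical_pk(deg)
z = p * (n - 1)                   # expected degree of a vertex
mean_degree = sum(deg) / n

for k in sorted(pk):
    print(k, round(pk[k], 3), round(poisson_pk(k, z), 3))
```

The two printed columns should be close for every k, and the agreement improves as N grows with z held fixed.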

Graph primer – small world effect

• Famous experiment by Milgram (1967)

• Claim: everyone in the world is connected to everyone else in about six steps ("six degrees of separation")

• Social graphs exhibit the ‘small world effect’: the diameter of a social graph scales logarithmically with N

CENTRALITY


Centrality

• Importance, control of flow

• Ranking from most important (most control) to least important (least control)

Node centrality measures 1/4

– Degree: $c_D(v) = |E(v)|$

• Immediate effect

Node centrality measures 2/4

– Closeness: $c_C(v) = \frac{1}{\sum_{u \in V \setminus \{v\}} d(v, u)}$

• ETA of flow to v

cC inverted for visualization
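Closeness only needs the shortest-path distances from v, which a breadth-first search provides on an unweighted graph. The sketch below assumes an undirected, connected graph stored as a dictionary of neighbor sets; the function names and the three-node path example are illustrative assumptions.

```python
from collections import deque

def bfs_distances(adj, v):
    """Shortest-path distance d(v, u) for every u reachable from v."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def closeness(adj, v):
    """c_C(v) = 1 / sum of d(v, u) over all u != v (assumes a connected graph)."""
    dist = bfs_distances(adj, v)
    total = sum(d for u, d in dist.items() if u != v)
    return 1.0 / total if total > 0 else 0.0

# Hypothetical example: the path a - b - c.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
scores = {v: closeness(adj, v) for v in adj}
```

The middle node b reaches both others in one step and therefore gets the highest closeness.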

Node centrality measures 3/4

– Eigenvector: $c_E(v) = \frac{1}{\lambda} \sum_{u \in A(v)} c_E(u)$

• Influence or risk
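The eigenvector definition is self-referential, so it is usually solved iteratively. The sketch below uses plain power iteration, which is a choice made here and not something the slides prescribe. It assumes an undirected, connected, non-bipartite graph with at least one edge (otherwise the iteration may oscillate or divide by zero); the example graph, a triangle with one pendant node, is hypothetical.

```python
def eigenvector_centrality(adj, iters=200):
    """Power iteration for c(v) = (1/lambda) * sum of c(u) over neighbors u of v.

    Returns (scores, lam), with scores scaled so the largest score is 1.
    """
    c = {v: 1.0 for v in adj}
    lam = 1.0
    for _ in range(iters):
        nxt = {v: sum(c[u] for u in adj[v]) for v in adj}
        lam = max(nxt.values())              # estimate of the leading eigenvalue
        c = {v: x / lam for v, x in nxt.items()}
    return c, lam

# Hypothetical example: triangle a, b, c with a pendant node d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
scores, lam = eigenvector_centrality(adj)
```

Node a scores highest because it is connected to every other node, while d scores lowest: its only neighbor is important, but it has just one.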

Node centrality measures 4/4

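– Betweenness: $c_B(v) = \sum_{s \neq v \in V} \sum_{t \neq v, s \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}$, with $\sigma_{st} = |SP_{st}|$ the number of shortest paths from s to t and $\sigma_{st}(v)$ the number of those that pass through v

• Volume of flow/traffic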

Obtaining cB

• Fastest known exact algorithm: Brandes (2001), O(nm) on unweighted graphs

• Solves all shortest paths in one pass

– For each vertex, consider all d=1 nearest neighbors, then d=2 and so on

– For each shortest path, store which vertices are on it

– Derive cB
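A compact version of this scheme for unweighted, undirected graphs is sketched below. It follows the published outline of Brandes' algorithm (one breadth-first search per source, then a backward accumulation of dependencies), but the code itself, the dictionary-of-sets graph format and the example are assumptions made for illustration. Like the double sum in the definition, it counts ordered pairs (s, t), so each unordered pair contributes twice.

```python
from collections import deque

def brandes_betweenness(adj):
    """Betweenness c_B(v) for all v in an unweighted graph, counting ordered (s, t) pairs."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # vertices in order of non-decreasing distance
        preds = {v: [] for v in adj}     # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:        # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}    # dependency of s on each vertex
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return cb

# Hypothetical example: the path a - b - c.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
cb = brandes_betweenness(adj)
```

Halve the scores if each unordered pair should be counted once.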


Local approach

• No known algorithm computes cB(v) for a single v faster than computing cB for all v!

• We only want to rank nodes of interest, not all

• Local approach

– Find cB for some specific nodes

– If we can estimate cB, we can rank relevant nodes


Ego betweenness

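• Ego-net: $\mathrm{ego}(v) = \{v\} \cup A(v)$ and corresponding edges

• Calculate cB considering only ego(v)

• Let A be the adjacency matrix of the ego-net (rows and columns ordered with the ego first):

$A = \begin{pmatrix} 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \end{pmatrix}$

$A^2 = \begin{pmatrix} 4 & 1 & 2 & 1 & 0 \\ 1 & 2 & 1 & 2 & 1 \\ 2 & 1 & 3 & 1 & 1 \\ 1 & 2 & 1 & 2 & 1 \\ 0 & 1 & 1 & 1 & 1 \end{pmatrix}$

$A^2[1 - A] = \begin{pmatrix} * & * & * & * & * \\ * & * & * & 2 & 1 \\ * & * & * & * & 1 \\ * & * & * & * & 1 \\ * & * & * & * & * \end{pmatrix}$

Only the entries above the diagonal for non-adjacent pairs are kept. Each such entry counts the two-step paths between the pair, and the ego lies on exactly one of them, so the pair contributes the reciprocal of the entry:

$c_{EB}(v) = \frac{1}{2} + \frac{1}{1} + \frac{1}{1} + \frac{1}{1} = 3.5$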
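The same calculation can be done without building matrices: for every pair of non-adjacent ego-net members, count their common neighbors inside the ego-net (the corresponding entry of A²) and add the reciprocal. The sketch below is an illustration only; the function name and the graph encoding are assumptions, and the extra node 6 is a hypothetical addition showing that vertices outside the ego-net do not affect the result.

```python
def ego_betweenness(adj, v):
    """Betweenness of v computed only inside its ego-net ego(v) = {v} + neighbors of v."""
    nodes = [v] + list(adj[v])
    in_ego = set(nodes)
    total = 0.0
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if b in adj[a]:
                continue                 # adjacent pairs need no intermediary
            # number of two-step paths a - x - b inside the ego-net: (A^2)[a][b]
            two_paths = sum(1 for x in adj[a] if x in in_ego and b in adj[x])
            if two_paths:
                total += 1.0 / two_paths # the ego is on exactly one of them
    return total

# Ego-net from the slide (ego = 1), plus a hypothetical node 6 outside the ego-net.
adj = {
    1: {2, 3, 4, 5},
    2: {1, 3},
    3: {1, 2, 4},
    4: {1, 3},
    5: {1, 6},
    6: {5},
}
c_eb = ego_betweenness(adj, 1)
```

The result only depends on the ego's neighbors and the links among them, which is what makes the measure cheap to compute locally.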

No direct link between cB and cEB

[Figure: two example graphs around an ego node v]

– Example 1: red circles + ego form an (n+1)-node star; green triangles form a p-node complete graph K_p

– Example 2: red circles + ego form a (p+1)-node star; green triangles + ego form an n-node complete graph K_n

Formulas shown with the figures: $c_{EB}(v) = n$, $c_B(v) = np$, $c_{EB}(v) = \frac{1}{2} n(n-1)$

Correlation cB and cEB

• Very strong positive correlation!

SF graph

N=25    0.951   0.041   [0.833-1.000]
N=50    0.942   0.038   [0.790-0.992]
N=100   0.94    0.033   [0.699-0.991]
N=200   0.941   0.026   [0.851-0.986]

Bernoulli graph

        p=0.1          p=0.2          p=0.3          p=0.4          p=0.5          p=0.6
N=25    0.907          0.867          0.792          0.736          0.776          0.752
        0.048          0.068          0.109          0.132          0.099          0.089
        [0.787-0.981]  [0.655-0.972]  [0.430-0.954]  [0.283-0.923]  [0.439-0.941]  [0.369-0.907]
N=50    0.918          0.798          0.766          0.79           0.772          0.74
        0.037          0.064          0.071          0.056          0.058          0.071
        [0.768-0.983]  [0.490-0.932]  [0.499-0.909]  [0.622-0.900]  [0.599-0.899]  [0.503-0.872]
N=100   0.895          0.758          0.812          0.812          0.786          0.733
        0.031          0.047          0.035          0.036          0.038          0.051
        [0.778-0.961]  [0.623-0.854]  [0.687-0.893]  [0.623-0.888]  [0.679-0.873]  [0.561-0.830]
N=200   0.827          0.806          0.844          0.828          0.792          0.739
        0.031          0.025          0.021          0.024          0.026          0.036
        [0.745-0.898]  [0.714-0.861]  [0.791-0.900]  [0.750-0.881]  [0.727-0.852]  [0.601-0.812]

GRAPH CLUSTERING


Types of clustering

• What is a cluster?

• Supervised vs. unsupervised

• Partitional vs. hierarchical


Clustering quality – modularity

Cluster adjacency matrix

      C1   C2   C3   C4
C1    18    5    2    4
C2     5   15    2    0
C3     2    2   19    1
C4     4    0    1   20

Cluster adjacency matrix E (entries divided by the total, 100)

      C1     C2     C3     C4
C1    0.18   0.05   0.02   0.04
C2    0.05   0.15   0.02   0.00
C3    0.02   0.02   0.19   0.01
C4    0.04   0.00   0.01   0.20

$a_i = \sum_j e_{ij}$

$a_1 = 0.18 + 0.05 + 0.02 + 0.04 = 0.29$

$a_2 = 0.05 + 0.15 + 0.02 + 0.00 = 0.22$

$a_3 = 0.02 + 0.02 + 0.19 + 0.01 = 0.24$

$a_4 = 0.04 + 0.00 + 0.01 + 0.20 = 0.25$

$Q = \sum_i (e_{ii} - a_i^2)$

$Q = (0.18 - 0.29^2) + (0.15 - 0.22^2) + (0.19 - 0.24^2) + (0.20 - 0.25^2) \approx 0.47$
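The modularity calculation is easy to check numerically. The sketch below takes the 4 × 4 matrix of counts from this slide, normalizes it to E, and evaluates the row sums a_i and Q; the function name is an illustrative assumption.

```python
def modularity_from_counts(counts):
    """Return (E, a, Q) for a symmetric cluster adjacency matrix of counts."""
    total = sum(sum(row) for row in counts)
    e = [[c / total for c in row] for row in counts]   # fractions e_ij
    a = [sum(row) for row in e]                        # a_i = sum_j e_ij
    q = sum(e[i][i] - a[i] ** 2 for i in range(len(e)))
    return e, a, q

counts = [
    [18, 5, 2, 4],
    [5, 15, 2, 0],
    [2, 2, 19, 1],
    [4, 0, 1, 20],
]
e, a, q = modularity_from_counts(counts)
print([round(x, 2) for x in a], round(q, 4))
```

A value of Q near 0 means the clustering is no better than a random assignment, while values approaching 1 indicate strong community structure.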

Newman & Girvan clustering algorithm

• Edges that are the most ‘between’ connect large parts of the graph

1. Calculate edge betweenness A_ij in an n × n matrix A

2. Remove the edge with the highest score

3. Recalculate edge betweenness for affected edges

4. Go to 2 until no edges remain

• O(m²n), may be faster on graphs with strong clustering
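The removal loop can be sketched as follows for a small unweighted, undirected graph. This is a simplified illustration, not the authors' implementation: it recomputes the betweenness of every edge after each removal instead of only the affected ones, it stops at the first split instead of continuing until no edges remain, and it assumes no self-loops. All function names and the example graph are assumptions.

```python
from collections import deque

def edge_betweenness(adj):
    """Betweenness of every edge (ordered source/target pairs), Brandes-style."""
    eb = {}
    for s in adj:
        dist, sigma, preds, order = {s: 0}, {s: 1.0}, {s: []}, []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0.0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                edge = frozenset((v, w))
                eb[edge] = eb.get(edge, 0.0) + share
                delta[v] += share
    return eb

def components(adj):
    """Connected components as a list of vertex sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman_split(adj):
    """Remove highest-betweenness edges until the graph falls apart once."""
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        eb = edge_betweenness(adj)
        if not eb:
            break                                  # no edges left
        u, v = max(eb, key=eb.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Hypothetical example: two triangles joined by the bridge c - d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
clusters = girvan_newman_split(adj)
```

In this example the bridge carries every shortest path between the two triangles, so it is removed first and the graph splits into the two triangles.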


Greedy clustering algorithm

• Maximize Q to find clustering

• Greedy approach:

C := V;
repeat
    (i,j) := argmax{∆Q | Ci, Cj ∈ C};
    C := C - Cj;
    Ci := Ci + Cj;
until |C| = 1

$\Delta Q = 2(e_{ij} - a_i a_j)$

• Creates a bottom-up dendrogram

• The cut corresponding to maximum Q gives the best clustering found

• Still a costly process, O(n²) on sparse graphs
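The greedy loop can be sketched directly from the pseudocode. The version below is deliberately naive: it recomputes Q from scratch for every candidate merge instead of updating ∆Q incrementally, so it is far slower than the quoted bound and only suitable for toy graphs. The function names and the example graph are assumptions.

```python
from itertools import combinations

def modularity(adj, comms):
    """Q = sum over communities of (e_ii - a_i^2) for an undirected graph."""
    m2 = sum(len(ns) for ns in adj.values())        # number of edge ends, 2m
    q = 0.0
    for c in comms:
        inside = sum(1 for v in c for w in adj[v] if w in c)  # edge ends staying in c
        ends = sum(len(adj[v]) for v in c)                    # all edge ends of c
        q += inside / m2 - (ends / m2) ** 2
    return q

def merge(comms, i, j):
    """Return a new community list with communities i and j joined."""
    rest = [c for k, c in enumerate(comms) if k not in (i, j)]
    return rest + [comms[i] | comms[j]]

def greedy_clustering(adj):
    """Merge communities bottom-up; return the partition with the highest Q seen."""
    comms = [frozenset([v]) for v in adj]           # C := V
    best_q, best = modularity(adj, comms), list(comms)
    while len(comms) > 1:                           # until |C| = 1
        # The merge with the largest resulting Q is the one with the largest dQ.
        q, i, j = max(
            ((modularity(adj, merge(comms, i, j)), i, j)
             for i, j in combinations(range(len(comms)), 2)),
            key=lambda t: t[0],
        )
        comms = merge(comms, i, j)
        if q > best_q:
            best_q, best = q, list(comms)
    return best, best_q

# Hypothetical example: two triangles joined by the bridge c - d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
best, best_q = greedy_clustering(adj)
```

The sequence of merges is the dendrogram; keeping the partition with the highest Q corresponds to cutting it at the best level.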

Practical applications of social clusters

• Find people related to someone

• Find out if people belong to the same cluster

• This does not require a partitioning of the entire network!


Local modularity

[Figure: cluster C with its boundary B, surrounded by the universe U]

C: cluster, U: universe, B: boundary

• C = collection of nodes v ∈ V with known link structure

• U(C) = all nodes outside C to which nodes from C point: U(C) = {u ∈ V-C | A(C,u) ≠ ∅}

• B(C) = all nodes in C with at least one neighbor outside C: B(C) = {b ∈ C | A(b,U) ≠ ∅}

$R(C) = \frac{|\mathrm{arcs}(B(C), C)|}{|\mathrm{arcs}(B(C), C)| + |\mathrm{arcs}(B(C), U)|} = \frac{|\mathrm{arcs}(B(C), C)|}{|\mathrm{arcs}(B(C), V)|}$

Local cluster algorithm

C := Ø;
v := v0;
repeat
    C := C+v;
    v := argmax{R(C+u) | u ∈ U(C)}
until |C| = k or R ≥ d

∆R(C,u) = R(C+u) – R(C)

[Figure legend: arcs removed from and newly added to arcs(B(C),V); arcs removed from and newly added to arcs(B(C),C)]

Example: ∆R(C,v4) = 1/3 – 1/4 = 1/12
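A direct, unoptimized sketch of this local algorithm is given below. Several points are assumptions made for the illustration: the graph is undirected and every edge counts as two arcs, R is recomputed from scratch for each candidate instead of being updated through ∆R, and R is defined as 1 when the boundary is empty. The function names and the example graph are hypothetical.

```python
def boundary(adj, c):
    """B(C): members of C with at least one neighbor outside C."""
    return {b for b in c if any(u not in c for u in adj[b])}

def local_modularity(adj, c):
    """R(C) = |arcs(B(C), C)| / |arcs(B(C), V)|."""
    b = boundary(adj, c)
    total = sum(len(adj[x]) for x in b)                    # arcs from B to anywhere
    inside = sum(1 for x in b for y in adj[x] if y in c)   # arcs from B staying in C
    return inside / total if total else 1.0                # assumption: R = 1 if B is empty

def local_cluster(adj, v0, k, d):
    """Grow a cluster from seed v0 until it has k nodes or R(C) >= d."""
    c = {v0}
    while len(c) < k:
        universe = {u for x in c for u in adj[x]} - c      # U(C)
        if not universe:
            break
        v = max(universe, key=lambda u: local_modularity(adj, c | {u}))
        c.add(v)
        if local_modularity(adj, c) >= d:
            break
    return c

# Hypothetical example: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2 - 3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
cluster = local_cluster(adj, v0=0, k=6, d=0.65)
```

Only the neighborhoods of nodes already in C are inspected, so the rest of the network never has to be loaded.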


Example 1 on Zachary’s Karate Club (d=0.65)


Example 2 on Zachary’s Karate Club (d=0.65)


Local cluster quality vs. global clusters

• For each node v in each global cluster i

– Find the local cluster with the same size

– Average the similarity

$\mathrm{sim}(L_v, G_i) = \frac{|L_v \cap G_i|}{|L_v \cup G_i|}$
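Reading the similarity as the overlap of the two node sets divided by their union, the comparison can be sketched in a few lines. The function names and the toy clusters are assumptions for illustration; the local clusters would normally come from a local clustering run seeded at each node v.

```python
def sim(local, global_cluster):
    """Overlap of two node sets: |L ∩ G| / |L ∪ G|."""
    local, global_cluster = set(local), set(global_cluster)
    union = local | global_cluster
    return len(local & global_cluster) / len(union) if union else 1.0

def average_sim(local_clusters, global_cluster):
    """Average similarity over the local clusters grown from each node of a global cluster."""
    scores = [sim(local_clusters[v], global_cluster) for v in global_cluster]
    return sum(scores) / len(scores)

# Hypothetical example: one global cluster and the local cluster found from each member.
g = {1, 2, 3, 4}
local_clusters = {
    1: {1, 2, 3, 4},   # perfect match
    2: {1, 2, 3, 5},   # one node off
    3: {1, 2, 3, 4},
    4: {4, 5, 6, 7},   # seed near the edge of the cluster
}
avg = average_sim(local_clusters, g)
```

A seed near the edge of a cluster pulls the average down sharply, which matches the remark that the usefulness of the local approach depends on the seed node.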

Preliminary results on real graphs

Network (size)                      Compiled by        Sim(Lv,Gi)   STD
Karate club (34)                    Zachary            0.75         0.24
Dolphin social relations (62)       Lusseau            0.62         0.28
Les Miserables coappearance (75)    Knuth              0.58         0.29
American College Football (113)     Girvan & Newman    0.58         0.36
C. Elegans neural network (295)     Watts & Strogatz

• Experiment too small for real conclusions, but

– edge vertices ruin the fun,

– edge betweenness?

• Usefulness of local approach depends on the seed node

LOCAL INTELLIGENCE IN GLOBAL APPLICATIONS

LIGA


Web graph

• ‘Social’ network of blogs and news sites

• Most graph models are static, but the Web is highly dynamic

• Stored copy is infeasible, continuous crawling intractable

• Change in relevance -> change in link structure


Node roles

• Frequently recurring subgraphs: motifs

• Nodes share a role iff there is a permutation of nodes and edges that preserves motif structure

• On the Web:

– Fully connected triad (1 role)

– Uplinked mutual dyad (2 roles)

– Feedback with two mutual dyads (2 roles)

Dynamic graphs

• Changes in relevance cause changes in link structure

• Changes in specific roles imply changes in other node roles

– Fanbase links to itself and their authorities

– Learning relevant links through affiliated sites

– etc.

• Relevance decays (half-life λ)


LIGA research questions

• How to model (Web) node relevance?

• How does acquired or lost relevance change linkage?

• How can we predict consequential changes?

• How can such prediction models be approximated by local incremental algorithms?

• Among others ...


Putting it together

• Networks can be analyzed using an array of tools

• Network analysis is useful in various disciplines:

– Information Retrieval

– Security

• But also in:

– Sociology

– (Statistical) physics

– Bioinformatics

– AI


Most cited literature

• Centrality:

– Borgatti S. P.: Centrality and Network Flow. Social Networks 27 (2005) 55-71

– Brandes U.: A Faster Algorithm for Betweenness Centrality. Journal of Mathematical Sociology 25(2) (2001) 163-177

– Freeman L. C.: A Set of Measures of Centrality Based on Betweenness. Sociometry 40 (1977) 35-41

• Clustering:

– Clauset A.: Finding local community structure in networks. Physical Review E 72 (2005) 026132

– Girvan M., Newman M. E. J.: Community structure in social and biological networks. PNAS 99(12) (2002) 7821-7826

– Newman M. E. J.: Fast algorithm for detecting community structure in networks. Physical Review E 69 (2004) 066133