Workshop on Basic Network Methods
John Skvoretz
Department of [email protected]
June 2011
Aims
• Introduce COM researchers to the basic concepts of social network analysis
• Orient participants to the SNA packages UCINET and NetDraw
• Describe methods relevant to the medical research community
Schedule
• The network perspective
• Navigating UCINET and NetDraw
• Data collection and entry
• Basic concepts of network analysis with examples using UCINET and NetDraw
• Workbook exercises, advanced topics, applications in the medical research community, other software packages for network analysis
Warning …
Examples
Friendships among researchers
• Academics interested in interdisciplinary research (Freeman and Freeman 1979)
– Friendship
– Discipline
– Citations
1 = sociology
2 = anthropology
3 = mathematics/statistics
4 = other
Contact between CTSA institutions
• Clinical & Translational Science Award institutions (Skvoretz 2009)
– Coattendance of institutional representatives at key function committee meetings of the CTSA Consortium
– Cohort (color)
– Tie strength (thickness of edge)
Contact between CTSA institutions
• Strong ties only
[Figure: sociogram of CTSA institutions; nodes labeled by cohort and institution code, 1C01–1C12, 2C01–2C12, 3C01–3C14]
Friendship in high school
The network perspective
• Mainstream social science analysis (MSSA) vs. social network analysis (SNA)
• Theoretical and methodological principles of network analysis
MSSA vs SNA
• Mainstream social science analysis
– Focuses on case outcomes as a function of case attributes
• Predict remission of chemical dependency patients as a function of patient attributes including care received
• Predict adoption of evidence based practices by hospital as a function of their values and orientation
– Data are organized in cases by variables format
– Identify one column as the outcome to be explained by values in the other columns (attributes)
MSSA vs SNA
• Social network analysis
– Shifts from attributes of cases to ties/relations between cases as explanatory factors
– Pairs/dyads of cases, not single cases, are the units of analysis
– Pairs/dyads interconnect to form networks
– Case outcomes as a function of
• the overall pattern of connection in the network
• a case’s “position” in the overall pattern
SNA’s shift to relations between cases
• Theoretical consequences
– Structure matters – how groups are connected through networks makes a difference
– Position matters – how location in the pattern of connection determines a node’s opportunities and constraints
– Indirect connections matter – how your direct ties link you indirectly to strangers can have major impact
SNA’s shift to relations between cases
• Methodological consequences
– Data must be collected on ties as well as individual cases
– Adding one case to N existing cases = 1 more case but 2N more ordered pairs of cases!
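The "2N more pairs" count can be checked with a few lines of arithmetic. This is only a sketch: it assumes pairs are counted as ordered (sender, receiver) pairs, and the function names are illustrative rather than taken from the workshop materials.

```python
def ordered_pairs(n):
    """Number of ordered (sender, receiver) pairs among n cases."""
    return n * (n - 1)


def unordered_pairs(n):
    """Number of unordered pairs (dyads) among n cases."""
    return n * (n - 1) // 2


n = 10
# Growth in the number of pairs when one case is added to n existing cases.
added_ordered = ordered_pairs(n + 1) - ordered_pairs(n)        # equals 2 * n
added_unordered = unordered_pairs(n + 1) - unordered_pairs(n)  # equals n
print(added_ordered, added_unordered)
```

If ties are treated as undirected, the growth is N dyads rather than 2N, which is still far faster than the growth in cases.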
Relations between cases: measurement
From: Borgatti, S.P., A. Mehra, D.J. Brass & G. Labianca. 2009. “Network Analysis in the Social Sciences.” Science 323:892-5.
Properties of interest
• Group cohesion
• Density
• Path lengths
• Clustering and subgrouping
• Homophily – background attributes and clustering
• Assortative mixing
Properties of interest
• Importance in overall pattern of connection
• Centrality
• Activity
• Distance to others
• Value as an intermediary – “bridgeness”
• Connected to well-connected others
Properties of interest
• The local neighborhood
• Strength of ties
• Reciprocated relations
• Closure vs structural holes – local clustering (or not)
• Acquaintance overlap
• Diversity of associates
Hypotheses tested in the literature
• Organizations respond better to crisis when friendships cross department boundaries.
• Degree centrality in a price-fixing conspiracy network increases the likelihood of a guilty verdict.
• Firms with open collaboration networks (many structural holes) are less innovative.
• Adolescents with very large or very small friendship networks experience more depressive symptoms, depending on their gender and the closure of their networks.
Hypotheses tested in the literature
• Supply-chain managers with open networks of discussion partners had better ideas to improve supply chain management.
• Knowledge of the work of a colleague without direct contact
– depends on the number of paths of length 2 to that colleague.
– almost never occurs if the shortest path is length 3, regardless of the number of such paths.
Hypotheses tested in the literature
• Interethnic marriages occur at much lower than chance levels but more frequently in more ethnically diverse populations.
• Individual adoption of a health behavior spreads farther and faster in clustered lattice networks than in corresponding random networks.
Navigating UCINET and NetDraw
• UCINET shortcuts
• UCINET menus
• NetDraw shortcuts
• NetDraw menus
UCINET Main Screen: Shortcuts
Launch NetDraw
Launch Matrix Algebra
Display a dataset
Launch Notepad
Import data via spreadsheet
Set default folder
Spreadsheet Editor
UCINET Main Screen: Menus
Read/Write Data
Look at data
Manipulate datasets
Reshape data
UCINET Main Screen: Menus
Collapse network based on some information
Transform network by changing cell values based on some mathematical operation
Transform adjacency matrix into other types
UCINET Main Screen: Menus
Factor analysis type analytical tools
Utilities for calculating correlations and summarizing distributions
Utilities for viewing partitions and associations between variables
UCINET Main Screen: Menus
Finding subgroups and analyzing paths and density
Analyzing centrality and positional structures
Misc analyses of network structure
Two mode analyses
NetDraw Main Screen: Shortcuts
Set node shape by attribute value
Draw using various algorithms
Open attribute file
Set node color by attribute value
Open file
Open network file
NetDraw Main Screen: Menus
Drawing layout choices
Utilities to shape a drawing
NetDraw Main Screen: Menus
Subgroup analysis creates partition membership attributes
Calculates centrality and other node-level properties and creates associated nodal attributes
NetDraw Main Screen: Menus
Change properties of drawing elements
My first data set ...
4 = close friend (fiend?)
3 = friend
2 = person I’ve met
1 = person I’ve heard of, but not met
0 = person unknown
1 = sociology
2 = anthropology
3 = mathematics/statistics
4 = other
Data collection and entry
Data collection
• Types of network data
• Complete – all ties linking elements of a closed population
• Ego network – set of ties surrounding a sampled unit
• One mode vs two mode
• One mode – ties are between nodes that are the same type of entity (person to person, organization to organization)
• Two mode – ties are between nodes of two different types of entities (person to event, RCT to disease)
Data collection
• Methods
• surveys & questionnaires (focus of data quality studies)
• archives, especially, more recently, electronic records
• observation
• diaries
• experiments
Data collection
• Basic issues
• Measure existing ties (behaviorist) or ties as perceived by actors in them (cognitive) – the dependent variable may matter (diffusion vs influence)
• Temporality – measure episodic contacts or routinized recurrent interactions – static bias, dynamic research must define when ties start, change, end
• Accuracy and reliability – precise description of ties composing a network (accuracy is the main concern) or indicators of conceptual variables (validity and reliability are main concerns)
Data collection
• Design considerations
• complete network required if compound or indirect linkages are important
• ego network ok if focus is on actor outcomes
• boundary specification – very important because omission is a big problem
• realist uses perception of actors, nominalist uses definition by observer (researcher)
• membership criteria of organization, social tie tracing in snowball sample, participation in set of events
• for ego networks, defined by name generator
Data collection
• Design considerations
• Sampling
• not relevant for complete network studies
• random for ego networks means generalizations about egos can be made but not about dyads
• from a network – usable only to estimate some properties (like density or contact between subgroups)
Data collection
• Data sources
• Surveys and questionnaires: self-report
• unaided recall
• complete roster
• dichotomous indicators vs intensity judgments
• name generators
• name interpreters
Data collection
• Data sources
• Surveys and questionnaires: self-report
• contacts with types of people (do you know a plumber?)
• ties between organizations from an informant
• archives – interlocking directorates, citations, trade, electronic records – big question is how such indirect indicators correspond to more direct indicators of interaction
• experiments – the small world
Data collection
• Lessons learned
• do not constrain number of alters reported
• roster provides more complete coverage than recall
• recall gets stronger ties
• respondents cannot give useful detail on specific episodes or timing of interaction but are good at the general picture
• name interpreter data good for observable attributes, poor for attitudes/unobservables
• data on broad features of ties (duration, frequency) are good
Data collection
• Examples of instruments
• General Social Survey (GSS)
• American National Election Studies (ANES)
• UC Davis CTSA Community Engagement Survey (UCD)
• SelectSurvey (SS)
• Web of Science search (WoS)
• CTSA Key Function Committee minutes (KFC)
Getting data into UCINET
• Options
• DL files – text files, various formats available, easily created by any text editor
• VNA files – text files, native format for NetDraw, enables combining both tie data and attribute data of nodes in a single file
• Data in native formats of other network software (Pajek)
• Raw data
• Excel files
Common DL formats
dl n=20
format=fullmatrix
data:
0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0
Common DL formats
dl n=28
data:
0 0 1 6 0 0 0 2 1 2 0 0 0 6 1 0 0 1 0 4 0 0 2 0 0 6 0 2
1 0 2 1 1 0 0 0 1 0 0 1 1 0 2 0 0 3 1 0 0 0 1 0 2 1 0 3
0 0 0 12 0 0 0 6 2 4 1 2 1 3 6 6 1 5 3 1 0 6 0 5 2 0 0 1
0 0 1 0 3 1 0 2 0 2 0 0 4 0 0 7 0 1 6 0 3 1 0 1 1 1 0 2
0 0 1 0 0 4 0 6 2 2 2 0 0 0 6 2 0 7 1 0 1 3 2 1 0 7 7 2
0 0 0 0 0 0 2 7 1 5 8 3 1 1 0 1 1 2 1 0 2 1 3 0 0 3 4 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 2 2 9 2 3 3 0 0 1 4 1 1 1 1 1 0 0 5 4 5
0 0 0 0 0 0 0 0 0 1 0 2 2 0 0 2 0 2 0 0 0 1 0 1 1 0 0 1
0 0 0 0 0 0 0 0 0 0 2 1 2 0 2 0 1 3 1 0 0 0 4 1 0 0 1 2
0 0 0 0 0 1 0 0 0 0 0 1 1 0 2 2 0 3 3 0 1 1 1 1 0 1 4 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 6 0 1 1 1 2 4 0 0 3 1 0 2
0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 1 3 1 0 2 3 1 1 1 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 1 1 0 1 0 0
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 5 0 2 0 1 4 6 1
0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 2 1 0 1 0 1 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 2 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 4 0 0 0 0 1 2 5 2 1 1
0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 4 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 4
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 7 2
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
Common DL formats
DL
N=17
FORMAT = FULLMATRIX DIAGONAL PRESENT
ROW LABELS:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17
COLUMN LABELS:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17
DATA:
0 1 1 0 0 0 1 1 0 1 0 0 0 0 1 0 0
1 0 1 0 0 0 0 1 0 1 1 0 0 0 1 0 0
0 1 0 0 0 0 1 1 0 1 0 0 0 1 0 1 0
0 0 1 0 0 0 0 1 0 1 1 0 0 1 1 0 0
0 0 1 0 0 0 1 0 0 1 0 1 0 0 1 1 0
0 1 0 0 1 0 1 0 0 1 1 0 0 0 0 1 0
…
Common DL formats
dl nr=38 nc=5
row labels embedded
col labels embedded
data:
E20070129 E20070411 E20071113 E20080208 E20081108
Columbia 1 1 1 5 2
Duke 1 0 0 1 1
Mayo 1 0 1 1 1
OHSU 2 1 1 2 1
Rockefeller 1 1 1 1 1
UCDavis 0 1 1 1 1
UCSF 1 1 1 1 1
UPenn 1 1 0 1 0
Pittsburgh 1 1 2 2 1
Rochester 1 1 0 2 1
UTHSC 1 1 1 1 1
Yale 1 1 0 2 2
CaseWestern 0 0 2 1 0
Emory 0 0 2 3 2
JohnsHopkins 0 0 1 3 2
Chicago 0 0 1 1 1
Iowa 0 0 1 1 2
UMichigan 0 0 1 0 0
Dallas 0 0 0 3 1
UWashington 0 0 2 3 1
UW-Madison 0 0 1 2 2
Vanderbilt 0 0 1 1 0
...
Common DL formats
dl n = 43 format = edgelist1
labels:
JohnSkvoretz,SteveBorgatti,DaliaColon,JamesCavendish,AnnaMarieKoehler-Shepley,HarisMemic,ElisaBellotti,TracyBurkett,IlanTalmud,AndySnider,JayA'Hern,KatrienCleemput,GuidoConaldi,DavidLazer,Pooya????,ElizabethVaquera,ElsaOntiveros,Anne-MarieNiekamp,TonyImhof,RebeccaThys,ToreOpsahl,GeertjanVries,LukasZenk,AdrienneKinsella,NadineKegen,JuliaBrennecke,BrianaHall,JaneFountain,LuisLoredo,FilipAgneessens,BruceCochrane,GretchenKoehler,JonathanSkvoretz,JosRitter,BobbyBrame,JulieVinup,LeaEllwardt,JosieMcLeod,ThomasFriemel,JenniferStortz,MistySkvoretz,IreneTroy,BenjaminElbirt
labels embedded:
data:
JohnSkvoretz,SteveBorgatti
JohnSkvoretz,DaliaColon
JohnSkvoretz,JamesCavendish
JohnSkvoretz,AnnaMarieKoehler-Shepley
JohnSkvoretz,HarisMemic
JohnSkvoretz,TracyBurkett
JohnSkvoretz,IlanTalmud
…
SteveBorgatti,ElisaBellotti
SteveBorgatti,IlanTalmud
SteveBorgatti,DavidLazer
SteveBorgatti,BenjaminElbirt
JamesCavendish,ElizabethVaquera
BrianaHall,AnnaMarieKoehler-Shepley
GretchenKoehler,AnnaMarieKoehler-Shepley
JosRitter,AnnaMarieKoehler-Shepley
KatrienCleemput,HarisMemic
…
Common DL formats
dl n = 73, format = nodelist
labels:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73
data:
1 14 15 21 54 55
2 21 22
3 9 15
4 5 18 19 43
5 19 43
6 13 20 22
7 17
8 14 17
9 12 20 21 22 51
11 19 50 52 53
12 20 21 22
13 17 20 21 22
14 21 22
15 20
16 18 41 43
17 7 8
18 11 16 19
19 4 11 16 18 27
20 6 12 21 22 38
21 22 51 54 55
22 20 21 38 51
23 40 43 50 52 53 60 62 65 68
…
VNA files
• Can combine in one file both nodal attribute data and network relational data
• Values of attributes can be text, unlike UCINET in which values must be numbers – when UCINET imports a vna file it converts text values to numerical values
• Example 1
• Example 2
Excel files
• Copy and paste into spreadsheet utility
• Example
Advice
• For small networks or small attribute files, use spreadsheet utility or Excel
• If keeping text as values of attributes is important, create a vna file with just node data in it – to use it in UCINET analysis, export from NetDraw
• If network is large, create edgelist or nodelist DL file for the network
Examples
• Medical innovation study
• Codebook
• Data
• UCINET file (networks), file (attributes)
• RCT study
• Data
• Workshop study
• Raw data
Basic concepts: graphs and matrices
Basics
• Graph – a set of items called vertices/nodes with undirected connections between them, called edges
• Digraph – a set of vertices/nodes with directed connections between them, called arcs
• More complicated types
– More than one type of connection
– More than one type of vertex (two mode networks)
– Edges/arcs may carry weights
– Multiple timepoints
Basics
• Sociogram – a graph or digraph representing the ties among individuals in a population
• Sociomatrix – a square table or matrix representing the location of ties between individuals in a population
• One row per person, one column per person, rows ordered from top to bottom and columns from left to right in the same order
• Entry in the ith row and the jth column, xij, is the tie indicator for the i,j pair – in the simple case 1 means a tie is present, 0 means it is absent
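To make the sociomatrix definition concrete, here is a minimal sketch that builds one from a list of ties. The function name, the `directed` flag, and the three-person example are illustrative assumptions, not part of UCINET.

```python
def sociomatrix(nodes, ties, directed=True):
    """Return a square 0/1 matrix with x[i][j] = 1 when i has a tie to j.

    Rows and columns follow the order of `nodes`, as the definition requires.
    """
    index = {name: k for k, name in enumerate(nodes)}
    x = [[0] * len(nodes) for _ in nodes]
    for sender, receiver in ties:
        x[index[sender]][index[receiver]] = 1
        if not directed:
            # An undirected edge is recorded in both cells.
            x[index[receiver]][index[sender]] = 1
    return x


people = ["a", "b", "c"]
choices = [("a", "b"), ("b", "c")]
digraph_matrix = sociomatrix(people, choices)
graph_matrix = sociomatrix(people, choices, directed=False)
```

For an undirected graph the resulting matrix is symmetric; for a digraph it need not be.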
Basics
[Figure: sociogram]
[Figure: sociomatrix]
[Figure: sociogram with nodes labeled by CTSA cohort and institution code]
[Figure: sociomatrix]
Basics
• Adjacency – node i is adjacent to node j if xij = 1
• Dyad – a pair of nodes and the possible ties among them
• In graphs:
• In digraphs:
• Triad – a triple of nodes and the possible ties among them
• In digraphs, there are 16 different triad types
Basics
• Path – a sequence of adjacent nodes in which all nodes and edges are distinct
• path length is the number of edges or arcs (simple paths ignore direction of arcs)
• Geodesic – the shortest path between two nodes
• Component – a set of vertices such that a path exists between any two nodes in the set
• a graph with more than one component is disconnected
• isolates are components
Basics
• Cutpoint – a node that if removed would increase the number of components
• Bridge – a tie that if removed would increase the number of components
• Local bridge – a tie that if removed would increase the length of the shortest path connecting the two nodes to at least length three
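The definitions of component, cutpoint, and bridge translate directly into a brute-force check: remove the node or tie and count components again. The sketch below assumes an undirected graph stored as a dictionary of neighbour sets with no self-ties; the function names and the small example graph are illustrative only.

```python
def components(adj):
    """Return the components of an undirected graph as a list of node sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def cutpoints(adj):
    """Nodes whose removal increases the number of components."""
    base = len(components(adj))
    found = []
    for v in adj:
        rest = {u: nbrs - {v} for u, nbrs in adj.items() if u != v}
        if len(components(rest)) > base:
            found.append(v)
    return found


def bridges(adj):
    """Ties whose removal increases the number of components."""
    base = len(components(adj))
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    found = set()
    for edge in edges:
        u, v = tuple(edge)
        rest = {w: set(nbrs) for w, nbrs in adj.items()}
        rest[u].discard(v)
        rest[v].discard(u)
        if len(components(rest)) > base:
            found.add(edge)
    return found


# Hypothetical example: a chain a-b-c attached to a triangle c-d-e.
example = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d", "e"},
    "d": {"c", "e"},
    "e": {"c", "d"},
}
```

In this example b and c are cutpoints and the ties a-b and b-c are bridges, while no tie inside the triangle is a bridge because the other two ties keep its nodes connected.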
Basics
[Figure: example graph marking a bridge, a cutpoint, and a local bridge]
Basic concepts: simple network-level properties
Network-level properties
• Density (Δ) – ratio of number of edges (arcs) present to the maximum number possible
• Max edges = $N(N-1)/2$
• Max arcs = $N(N-1)$
• Often used as a measure of cohesion
• Average number of ties per person = $\Delta(N-1)$
• In large populations, density must be small, otherwise average number of ties is huge
Network-level properties
• Reachability or connectivity – proportion of pairs connected by a path of finite length
• Fragmentation – proportion of pairs not connected by a path of finite length
• Average geodesic – average length of shortest path among connected pairs
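The network-level measures defined so far (density, average ties per person, reachability, fragmentation, average geodesic) can be sketched for a small undirected graph as follows. This is an illustration under stated assumptions, not UCINET's routine: the graph is a dictionary of neighbour sets, pairs are counted as ordered pairs, and the function names and example graph are hypothetical.

```python
import math
from collections import deque


def distances_from(adj, source):
    """Breadth-first search: geodesic distance from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def network_summary(adj):
    n = len(adj)
    edges = sum(len(nbrs) for nbrs in adj.values()) / 2
    connected = 0   # ordered pairs joined by a path of finite length
    total = 0       # sum of geodesic lengths over those pairs
    for s in adj:
        dist = distances_from(adj, s)
        connected += len(dist) - 1
        total += sum(dist.values())
    reach = connected / (n * (n - 1))
    return {
        "density": edges / (n * (n - 1) / 2),
        "avg_ties": 2 * edges / n,
        "reachability": reach,
        "fragmentation": 1 - reach,
        "avg_geodesic": total / connected if connected else math.nan,
    }


# Hypothetical example: a chain a-b-c plus an isolate d.
summary = network_summary({"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()})
```

Note that the average number of ties per person equals density times (N - 1), so the two quantities carry the same information once N is known.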
Network-level properties
Max edges = 32 × 31/2 = 496
Edges present = 43
Density = 43/496 = 0.087
Ave edges per node = 2.69
Reachability = 0.61
Fragmentation = 0.39
Ave geodesic = 3.04
Network-level properties
Routine counts arcs as ties and each edge equals 2 arcs, therefore 2 ties.
Basic concepts: node-level properties
Node-level properties
• Key attribute of the “position” of a node in the overall pattern of connection – its importance or centrality
• Degree
• Closeness
• Betweenness
• EV centrality
• The “position” of a node in its local neighborhood
• Closure/clustering among contacts
Node-level properties: centrality
• Who is more important, more central?
• Participants in a summer methods camp by gender (color), role (shape) – at least one chooses the other as being in his/her top 3 “had most contact with”
Node-level properties: centrality
• Important means being involved, being active
• Number of direct ties or degree – degree centrality
• More active nodes are more important nodes
raw: $C_D(i) = d_i = x_{i+} = \sum_{j=1}^{N} x_{ij}$
normed: $C'_D(i) = \frac{d_i}{N-1}$
• Correlates with opportunity to directly influence and be influenced, visibility, exposure to network flows
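As a small illustration of raw and normed degree centrality, the sketch below sums the rows of a binary sociomatrix. The function name and the four-node star example are assumptions made for the illustration.

```python
def degree_centrality(x):
    """Return (raw, normed) degree centrality from a binary sociomatrix x."""
    n = len(x)
    raw = [sum(row) for row in x]          # d_i = sum over j of x_ij
    normed = [d / (n - 1) for d in raw]    # share of the N - 1 possible ties
    return raw, normed


# Hypothetical star: node 0 is tied to nodes 1, 2 and 3.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
raw_degree, normed_degree = degree_centrality(star)
```

For directed data the row sums give outdegree; indegree would use column sums instead.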
Node-level properties: centrality
• Important means being close to others, having short paths to them if not direct connections – closeness centrality
• More important nodes have short paths to many others
raw: $C_{FC}(i) = \frac{1}{\sum_{j=1}^{N} g_{ij}}$, where $g_{ij}$ = geodesic distance from $i$ to $j$
normed: $C'_{FC}(i) = (N-1)\,C_{FC}(i)$
• Correlates with ability to reach all others (as sender), be reached by all others (as receiver)
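A minimal sketch of closeness centrality follows. It assumes a connected network of at least two nodes, so that every geodesic distance is finite, and raises an error otherwise; the function names and the star example are illustrative.

```python
from collections import deque


def geodesic_distances(x, i):
    """Breadth-first search on a binary sociomatrix: distance from i to each reachable node."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        v = queue.popleft()
        for w, tie in enumerate(x[v]):
            if tie and w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness_centrality(x, i):
    """Return (raw, normed) closeness of node i."""
    n = len(x)
    dist = geodesic_distances(x, i)
    if len(dist) < n:
        raise ValueError("node cannot reach every other node")
    raw = 1 / sum(dist.values())
    return raw, (n - 1) * raw


# Hypothetical star: node 0 is tied to nodes 1, 2 and 3.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
center_raw, center_normed = closeness_centrality(star, 0)
leaf_raw, leaf_normed = closeness_centrality(star, 1)
```

The center of the star is at distance 1 from everyone, so its normed closeness is 1; a leaf is at distances 1, 2 and 2, giving a normed closeness of 3/5.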
Node-level properties: centrality
• Important means being on the shortest paths between pairs of others – betweenness centrality
• More important nodes are on the shortest paths between many pairs of others
raw: $C_B(i) = \sum_{j<k;\ j,k \neq i} \frac{g_{jk}(i)}{g_{jk}}$, where $g_{jk}$ is the number of geodesics between $j$ and $k$ and $g_{jk}(i)$ is the number of those that pass through $i$
normed: $C'_B(i) = \frac{2\,C_B(i)}{(N-1)(N-2)}$
• Correlates with opportunity to broker relations of others, control flows, have cosmopolitan viewpoint and access to diverse data
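Betweenness can be computed directly from its definition by counting geodesics. The sketch below assumes an undirected binary sociomatrix with at least three nodes; it favors transparency over speed (faster algorithms exist for large graphs), and its function names and examples are illustrative.

```python
from collections import deque


def bfs_counts(x, s):
    """Distances from s and the number of geodesics from s to every node."""
    n = len(x)
    dist = [None] * n
    sigma = [0] * n
    dist[s], sigma[s] = 0, 1
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in range(n):
            if x[v][w]:
                if dist[w] is None:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
    return dist, sigma


def betweenness_centrality(x):
    """Return (raw, normed) betweenness for every node."""
    n = len(x)
    info = [bfs_counts(x, s) for s in range(n)]
    raw = [0.0] * n
    for j in range(n):
        dist_j, sigma_j = info[j]
        for k in range(j + 1, n):
            if dist_j[k] is None:
                continue  # j and k are not connected
            for i in range(n):
                if i in (j, k) or dist_j[i] is None:
                    continue
                dist_i, sigma_i = info[i]
                # i lies on a j-k geodesic when the two legs add up to the j-k distance.
                if dist_i[k] is not None and dist_j[i] + dist_i[k] == dist_j[k]:
                    raw[i] += sigma_j[i] * sigma_i[k] / sigma_j[k]
    normed = [2 * c / ((n - 1) * (n - 2)) for c in raw]
    return raw, normed


# Hypothetical examples: a four-node star and a four-node cycle.
star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
cycle = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
star_raw, star_normed = betweenness_centrality(star)
cycle_raw, cycle_normed = betweenness_centrality(cycle)
```

In the cycle each pair of opposite nodes has two geodesics, so each intermediate node is credited with one half, which is why the ratio of geodesic counts appears in the raw formula.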
Node-level properties: centrality
• Important means being connected to others who are important – EV (eigenvector) centrality
• Important nodes are connected to other important nodes – it is not whom you know but whom those you know know
• Correlates with opportunity for indirect influence, “behind the scenes” power
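Eigenvector centrality scores each node by the sum of its neighbours' scores. A common way to approximate it is power iteration, sketched below for an undirected, connected network. Iterating with the matrix plus the identity (rather than the matrix alone) is an implementation choice made here so that the iteration also settles on bipartite graphs; the function name, scaling to a maximum of 1, and the example are assumptions for illustration.

```python
def eigenvector_centrality(x, iterations=1000, tol=1e-12):
    """Approximate eigenvector centrality, scaled so the largest score is 1."""
    n = len(x)
    v = [1.0] * n
    for _ in range(iterations):
        # Each node keeps its own score and adds the scores of its neighbours.
        new = [v[i] + sum(x[i][j] * v[j] for j in range(n)) for i in range(n)]
        top = max(new)
        new = [score / top for score in new]
        if max(abs(a - b) for a, b in zip(new, v)) < tol:
            return new
        v = new
    return v


# Hypothetical star: node 0 is tied to nodes 1, 2 and 3.
star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
ev_scores = eigenvector_centrality(star)
```

Packages differ in how they scale the resulting vector, so only the relative sizes of the scores should be compared across tools.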
Node-level properties: centrality
• Degree: Steve, Michael, Holly, Pam, Pauline
• Closeness: Michael, Gery, Holly, John, Russ, Pauline
• Betweenness: Gery, Michael, Pauline, John, Holly
• EV: Holly, Michael, Harry, Don, Pam
Node-level properties: centrality
• Measures are usually positively correlated, so inconsistent profiles are especially interesting

            Degree   Closeness   Betw’ness   EV
Degree      1.000    0.629       0.626       0.625
Closeness            1.000       0.839       0.630
Betw’ness                        1.000       0.289
EV                                           1.000
Node-level properties: clustering
• Opportunity vs constraint
• Bridging vs bonding social capital
• Social control and social support

Closure            Closed   Open
Structural Holes   Few      Many
Node-level properties: clustering
• Clustering coefficient – density of ties among ego’s alters
– Effective size
– Efficiency
– Constraint

Clustering Coeff   1.000   0.267
Effective Size     3.000   4.667
Efficiency         0.600   0.778
Constraint         0.360

Clustering Coeff   0.900   0.100
Effective Size     1.333   4.600
Efficiency         0.267   0.920
Constraint         0.642   0.300
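The local measures can be sketched for a single ego as follows. The clustering coefficient is the density of ties among ego's alters, as defined above. For effective size the sketch uses the simplification often quoted for binary undirected data, n minus 2t/n (n alters, t ties among them), with efficiency as effective size divided by n; treating that simplification as what any particular package computes is an assumption. Constraint is omitted, and the names and example are illustrative.

```python
import math


def ego_measures(adj, ego):
    """Clustering coefficient, effective size and efficiency for one ego.

    adj maps each node to the set of its neighbours (undirected, binary).
    """
    alters = adj[ego]
    n = len(alters)
    # Each tie among alters is seen from both ends, hence the division by 2.
    t = sum(1 for a in alters for b in adj[a] if b in alters) / 2
    possible = n * (n - 1) / 2
    clustering = t / possible if possible else math.nan
    effective_size = n - 2 * t / n if n else 0.0
    efficiency = effective_size / n if n else math.nan
    return clustering, effective_size, efficiency


# Hypothetical ego "e" with five alters and a single tie (a-b) among them.
ego_net = {
    "e": {"a", "b", "c", "d", "f"},
    "a": {"e", "b"},
    "b": {"e", "a"},
    "c": {"e"},
    "d": {"e"},
    "f": {"e"},
}
clustering, effective_size, efficiency = ego_measures(ego_net, "e")
```

With five alters and one tie among them this gives 0.100, 4.600 and 0.920, which happen to match one set of values listed above; an open neighborhood has low clustering and high efficiency, a closed one the reverse.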
Node-level properties: clustering
• Clustering coefficient: Brazey, Lee, Bill, …, Pat
Node-level properties: clustering
• Aggregation to a network-level property – definitional to small worldness
– High average clustering – much higher than in random networks of similar size and density
– Short path lengths – on the order of lengths typical of random networks of similar size and density
Basic concepts: cohesive subgroup identification
Cohesive subgroup identification
• Look for subgroups that “hang together”
– Important emergent phenomena such as “communities of practice”
– Interesting relationships to node attributes and characteristics (gender, scientific field)
– Effect on capacity for collective action by the group
– Locus of important social processes (influence, trust, social support)
Cohesive subgroup identification
• Of many methods proposed, consider two
– Direct connections are crucial – the Luce-Perry clique
• Emphasizes how nodes in a subgroup are directly connected to each other
– Ties or links with high “betweenness” are crucial – the boundaries between Girvan-Newman communities
• Emphasizes how nodes in a subgroup are indirectly connected to nodes in other subgroups
Cohesive subgroup identification
• A clique is a maximally complete subgraph of three or more nodes
• All nodes are adjacent to one another and no other node is adjacent to all in the subgraph
• Stand alone connected dyads not considered cliques
• Very strict definition of cohesion
• Nodes may belong to more than one clique – cliques are not necessarily mutually exclusive subgroups, but may overlap
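One standard way to enumerate cliques in this sense is the Bron-Kerbosch procedure, sketched below without any refinements. It is offered as an illustration of the definition, not as the routine UCINET uses; the function name, the `min_size` parameter, and the example graph are assumptions.

```python
def maximal_cliques(adj, min_size=3):
    """List maximal complete subgraphs with at least min_size nodes.

    adj maps each node to the set of its neighbours (undirected, no self-ties).
    """
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            # Nothing can be added, so `current` is maximal.
            if len(current) >= min_size:
                cliques.append(frozenset(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical example: a triangle a-b-c with a pendant node d tied to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
cliques_of_three = maximal_cliques(graph)
cliques_of_two = maximal_cliques(graph, min_size=2)
```

With the default minimum size of three the connected dyad c-d is not reported, in line with the convention that stand-alone dyads are not cliques.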
Cohesive subgroup identification
• Ten cliques – three of size 4; seven of size 3
Cohesive subgroup identification
• Communities are subgraphs connected to other communities by high betweenness edges
• A high betweenness edge is on many short paths between pairs of nodes
• To identify communities, successively delete the edge with the highest betweenness score
• Recalculate scores, delete the edge with the highest score, and continue until the target number of communities is reached
• Yields non-overlapping mutually exclusive subgroups
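The deletion procedure described above can be sketched in a few dozen lines. Edge betweenness is computed here with a breadth-first accumulation over shortest paths; that choice, the function names, the tie-breaking (the first edge with the highest score is removed), and the example graph are assumptions made for illustration.

```python
from collections import deque


def components(adj):
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def edge_betweenness(adj):
    """Geodesic-based betweenness score for every edge of an undirected graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, sigma, preds = {s: 0}, {s: 1.0}, {s: []}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0.0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    # Every pair was counted once from each end.
    return {edge: value / 2 for edge, value in score.items()}


def girvan_newman(adj, target):
    """Delete highest-betweenness edges until `target` communities remain."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    comps = components(g)
    while len(comps) < target and any(g.values()):
        scores = edge_betweenness(g)
        u, v = tuple(max(scores, key=scores.get))
        g[u].discard(v)
        g[v].discard(u)
        comps = components(g)
    return comps


# Hypothetical example: two triangles joined by the single tie c-d.
two_triangles = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
communities = girvan_newman(two_triangles, 2)
```

In the example every path between the two triangles uses the tie c-d, so it has the highest score and is deleted first, leaving the two triangles as communities.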
Cohesive subgroup identification
• Communities with 2, 3, and 4 clusters
Cohesive subgroup identification
• Cliques
• Sometimes too many and too overlapping – analysis of overlap useful
• Very few in sparse graphs yet there may be regions of greater density
• No interesting substructure possible in a clique
• Communities
• Time consuming to compute for large graphs
• Identified groups may not be especially well connected
Basic concepts: position identification
Position identification
• Look for sets of nodes/persons who are connected to others in very similar/identical ways, regardless of their direct or indirect ties to one another
– Positions defined by such sets of persons
– Positions are emergent from the structure of relations
– Persons in the same position are “structurally equivalent”
– Important b/c positions occupied rather than subgroups belonged to can affect outcomes experienced by individuals
Position identification
• Three flavors of equivalence
– Regular – nodes are regularly equivalent if they are equally tied to equivalent others
– Automorphic – nodes are automorphically equivalent if the only thing distinguishing them is their labels
– Structural – nodes are structurally equivalent if they have ties to exactly the same set of others
Position identification
• “Thinking” graph
– Automorphic
– Structural
Position identification
• Automorphic is very computationally intensive – useless on large graphs
• “Perfect” structural equivalence seldom found
• Calculate a measure of equivalence (Euclidean distance between choices given and choices received)
$d_{ij} = \sqrt{\sum_{k \neq i,j} \left[ (x_{ik} - x_{jk})^2 + (x_{ki} - x_{kj})^2 \right]}$
• Define a cutoff score below which pairs of nodes are considered s.e.
• Use cluster analysis to place nodes in s.e. clusters
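The Euclidean distance measure of structural equivalence compares two nodes' rows (choices given) and columns (choices received), skipping their ties to each other. A minimal sketch, with an illustrative function name and a hypothetical four-node digraph:

```python
import math


def se_distance(x, i, j):
    """Euclidean distance between nodes i and j over ties to and from third parties."""
    total = 0
    for k in range(len(x)):
        if k in (i, j):
            continue
        total += (x[i][k] - x[j][k]) ** 2   # choices given
        total += (x[k][i] - x[k][j]) ** 2   # choices received
    return math.sqrt(total)


# Hypothetical digraph: nodes 0 and 1 both choose node 2 and are both chosen by node 3.
x = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
]
d_01 = se_distance(x, 0, 1)
d_02 = se_distance(x, 0, 2)
```

Nodes 0 and 1 have identical ties to and from the others, so their distance is 0 (perfect structural equivalence); any cutoff applied to these distances is a judgment call by the analyst.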
Position identification
• Four position solution for CTSA centers
– Cut = 94.5
– Max = 214.5
– Min = 9.8
Basic concepts: compositional effects
Compositional effects
• Exogenous attributes of nodes and network ties
– How are the ties between nodes related to exogenous attributes of the nodes?
• Discipline in the case of researchers
• Award cohort in the case of medical centers
– Homophily of ties – the extent to which ties are between nodes of similar background at greater than chance levels
– Contact diversity of person – the extent to which a person’s ties are to others of diverse background
Compositional effects
• Friendships and discipline

      1    2    3    4
1    40    3    5    5
2     3    8    2    2
3     5    2    2    1
4     5    2    1    0
Compositional effects
• But persons vary

Node   Discipline   N Friends   Homophilous
1      1            4           0.500
2      2            7           0.429
3      4            2           0.000
6      1            1           1.000
…
42     1            3           1.000
43     2            1           0.000
44     4            3           1.000
Compositional effects
• Measures for binary data
– E-I index varies from -1 (total homophily) to +1 (total heterophily): $\frac{N_{External} - N_{Internal}}{N_{Total}}$
– Proportion homophilous
– Heterogeneity of associates: $1 - \sum_k q_k(i)^2$, where $q_k(i)$ = the proportion of i’s associates in the kth category of an attribute
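The three measures for binary data can be sketched for a single person from the attribute values of that person's associates. The function names are illustrative, and the example alters are hypothetical: only the totals (seven friends, three in ego's own discipline) mirror a row of the table above.

```python
from collections import Counter


def ei_index(ego_attr, alter_attrs):
    """(external - internal) / total; -1 is total homophily, +1 total heterophily."""
    internal = sum(1 for a in alter_attrs if a == ego_attr)
    external = len(alter_attrs) - internal
    return (external - internal) / len(alter_attrs)


def proportion_homophilous(ego_attr, alter_attrs):
    """Share of associates in the same category as ego."""
    return sum(1 for a in alter_attrs if a == ego_attr) / len(alter_attrs)


def heterogeneity(alter_attrs):
    """1 minus the sum of squared category proportions among associates."""
    total = len(alter_attrs)
    return 1 - sum((count / total) ** 2 for count in Counter(alter_attrs).values())


# Hypothetical person in discipline 2 with seven friends, three of them in discipline 2.
friends = [2, 2, 2, 1, 1, 3, 4]
ei = ei_index(2, friends)
share = proportion_homophilous(2, friends)
diversity = heterogeneity(friends)
```

The E-I index and the proportion homophilous depend on ego's own category, while heterogeneity depends only on how the associates are spread across categories.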
Summary
• The network perspective
• Navigating UCINET and NetDraw
• Data collection and data entry
• Basic concepts
– Graphs and matrices
– Simple network-level properties
– Node-level properties: centrality
– Node-level properties: clustering
– Cohesive subgroup identification
– Position identification
– Compositional effects
Workbook exercises, advanced topics, applications in the medical research community, other software packages for network analysis