Exploratory data analysis and contiguity relations: An outlook
Giuseppe Giordano1, Gilbert Saporta2, Maria Prosperina Vitale1
1 Dept. of Economics and Statistics, University of Salerno, Italy, {ggiordan; mvitale}@unisa.it
2 Conservatoire National des Arts et Métiers, Paris, [email protected]
London – December 13, 2015
Outline
Setting: Multidimensional Data Analysis (MDA) and Social Network Analysis (SNA)
Theoretical frameworks: Notion of contiguity, Homophily principle, Social Influence, ...
Aim 1: to present a framework for the treatment of SNA data structures with explorative techniques of MDA
Methods: Smooth Factorial Analysis (SFA); Factorial Analysis of Local Differences (FALD) for PCA and MCA; Benali, H., Escofier, B. (1990)
Aim 2: to define ad hoc relational data structures highlighting the effect of external information on networks
Methods: Factorial Contiguity Maps and auxiliary information in SNA; Giordano G., Vitale M.P. (2007, 2011)
Illustrative example on Scientific Collaboration
General aims
- use of Contiguity Analysis to synthesize and visualize the patterns of social relationships in a metric space
- explore the effect of external information on relational data, looking for groups of structurally equivalent actors obtained through clustering methods
- an illustrative example on scientific collaboration gives insight into the proposed Multidimensional Data Analysis strategy within Social Network Analysis
Background and Aim
Background
- SNA focuses on ties among interacting units (Dyad, Triad, Subgroups)
to describe the pattern of the social relationships in a network
- the techniques of MDA consider statistical observations (at individual
level) to obtain syntheses of variables and units
Aim
to present a framework for the analysis of relational data and
attribute variables through the explorative techniques of
Multidimensional Data Analysis
Exploratory data analysis (EDA) is detective work: finding and revealing the
clues, i.e. uncovering structures in the data. EDA uses numerical as well as
visual and graphical techniques to accomplish its aims. (Tukey, 1977)
Data Structures
Two perspectives (to put together the two data structures):
Attribute data (qualitative or quantitative variables) => Multidimensional Data Analysis (MDA)
Relational data (pairwise links joining two units) => Social Network Analysis (SNA)
MDA and SNA: Data structures
MDA: Attribute Data Matrix (n x p)
MDA: Contingency Table F, with rows b1, …, bi, …, bq and columns a1, …, aj, …, ap, cell frequencies f_ij and margins f_i. and f_.j
SNA: Relational Data Matrix
- Adjacency matrix G (g x g): 1-mode network; single set of actors n1, …, ni, …, ng; g = number of actors
- Affiliation matrix A (g x h): 2-mode network; two sets, actors n1, …, ng and events E1, …, Ej, …, Eh; h = number of events E
MDA in SNA
Different MDA techniques have been used to visualise and explore the relationships in the network structure, such as:
- Multidimensional Scaling: representation of similarity or dissimilarity measures among the actors on a factorial map (Freeman, 2005)
- Canonical correlation: analysis of associations between actor characteristics (i.e. network composition) and the pattern of social relationships (i.e. network structure) (Wasserman and Faust, 1989)
- Correspondence Analysis and Multiple Factor Analysis: analysis of 2-mode networks (Roberts, 2000; de Nooy, 2003; Faust, 2005; D’Esposito et al., 2014; Ragozini et al., 2015)
- Clustering techniques for network data (Batagelj, Ferligoj, 1982, 2000)
Contiguity analysis
How to take into account relational data in MDA?
Answer: contiguity analysis (Lebart 1969, 2006; Lebart et al., 2000)
Contiguity analysis is a generalization of linear discriminant analysis in which the partition of elements is replaced by a more general graph structure defined a priori on the set of observations (disjoint cliques, chain structures, undirected graph).
G (n,n): contiguity matrix describing the contiguity graph on the n vertices (symmetric binary matrix); g_ii' = 1 if i is a neighbour of i', and g_ii' = 0 otherwise
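As a minimal sketch of the definition of G, the snippet below builds the symmetric binary contiguity matrix of a small undirected graph with NumPy. The five-vertex edge list and the helper name `contiguity_matrix` are hypothetical and only serve as an illustration; they do not come from the slides.

```python
import numpy as np

def contiguity_matrix(n, edges):
    """Return the n x n symmetric binary contiguity matrix of an undirected graph."""
    G = np.zeros((n, n), dtype=int)
    for i, j in edges:
        G[i, j] = 1  # i is a neighbour of j
        G[j, i] = 1  # symmetry: j is a neighbour of i
    return G

# Hypothetical toy graph: a chain 0-1-2 plus a separate pair 3-4.
edges = [(0, 1), (1, 2), (3, 4)]
G = contiguity_matrix(5, edges)
print(G)
```

The chain and the disjoint pair mirror two of the graph structures named on the slide (chain structures, disjoint cliques).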
Aim 1
Contiguity analysis in MDA: SFA and FALD (Benali, Escofier, 1990; Benali et al., 1990)
Smooth factorial analysis - SFA
Analysis of the general pattern in the data by removing local variations (replacing each point with the centre of gravity of its neighbors)
Factorial Analysis of Local Differences - FALD
Analysis of the local variations (replacing each point with its difference from the barycentre of its neighbors)
SFA and FALD: matrices definition
Quantitative variables: Principal Component Analysis - PCA
X (n,p): attribute matrix holding the information on p characteristics for n statistical units (vertices)
Qualitative variables: Multiple Correspondence Analysis - MCA
Q (n,k): full disjunctive coding matrix (0/1)
G (n,n): contiguity matrix (network structure)
N (n,n): diagonal matrix, N = diag(G'G), holding the degree of each vertex
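The definition N = diag(G'G) can be checked numerically. The sketch below, on a hypothetical five-vertex contiguity matrix (not taken from the slides), shows that for a symmetric binary G the diagonal of G'G coincides with the row sums of G, i.e. the vertex degrees. Note that an isolated vertex would give a zero on the diagonal, so N could not be inverted without some extra convention; the toy graph avoids that case.

```python
import numpy as np

# Hypothetical contiguity matrix: a chain 0-1-2 plus a separate pair 3-4.
G = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
])

N = np.diag(np.diag(G.T @ G))  # N = diag(G'G), a diagonal matrix
degrees = G.sum(axis=1)        # row sums of G, i.e. number of neighbours

print(np.diag(N))  # diagonal entries of N
print(degrees)     # vertex degrees, equal to the diagonal of N
```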
Analysis of relational data and auxiliary information
Given the triplet Q, G, N for an MCA, premultiply the Q matrix by $N^{-1}G$:
SFA: $N^{-1}GQ$
FALD: $Q - N^{-1}GQ + \frac{1}{q_{..}}\, q_{i.}\, q_{.j}'$
Analysis of relational data and auxiliary information
Given the triplet X, G, N for a PCA, premultiply the X matrix by $N^{-1}G$:
SFA: $N^{-1}GX$
FALD: $X - N^{-1}GX$
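The two transformations for the PCA case can be sketched in a few lines of NumPy. Everything below is illustrative: the contiguity matrix and the random attribute matrix are hypothetical, the column centring is an assumption (the slides do not state the preprocessing), and `principal_axes` is a hypothetical helper that performs a plain PCA through the SVD.

```python
import numpy as np

# Hypothetical contiguity matrix: a chain 0-1-2 plus a separate pair 3-4.
G = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # hypothetical attribute matrix (n = 5, p = 3)
X = X - X.mean(axis=0)        # assumption: columns are centred beforehand

N_inv = np.diag(1.0 / G.sum(axis=1))  # N^{-1}, inverse of the degree matrix
X_smooth = N_inv @ G @ X              # SFA: each row becomes the mean of its neighbours
X_local = X - X_smooth                # FALD: difference from the neighbours' barycentre

def principal_axes(M):
    """Principal axes and explained-variance shares of M (PCA via SVD)."""
    Mc = M - M.mean(axis=0)
    _, s, Vt = np.linalg.svd(Mc, full_matrices=False)
    return Vt, s**2 / np.sum(s**2)

axes_sfa, shares_sfa = principal_axes(X_smooth)
axes_fald, shares_fald = principal_axes(X_local)
print(shares_sfa)
print(shares_fald)
```

By construction the smoothed and local parts add back to X, which is the sense in which the two analyses are complementary.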
SFA and FALD in SNA
The entries of the adjacency matrix can be seen as a particular case of the contiguity relation among statistical units defined in G, which produces a fuzzy partitioning of the units.
Decomposition of the total variance/inertia into two components:
- local variance between the adjacent units
- residual variance
SFA: variability explained by the presence of a contiguity structure (to discover patterns in the data)
FALD: actors with a prominent role in contiguous groups (analysis of cohesive sub-group variations)
Real data set: Scientific collaboration among scientists
… the process generating ties (e.g. co-authorship) in a
collaboration network is somewhat affected by attribute
data on authors (academic position, research specialty,
geographical proximity … ) or on publications (type,
scientific relevance, …).
Expected results:
- Are authors with different academic positions in their institution
(PhD student, assistant professor, full professor) more likely to
collaborate in writing a publication than authors sharing the same
position?
- Are authors who work in the same research specialty (e.g. Social
statistics) more likely to collaborate in writing a publication than authors
from different specialties?
Illustrative Example
Co-authorship network in a
scientific community*
GfKl - CLADAG 2010
Co-authorship patterns in Statistics, focusing on academic statisticians in
Italy (792 grouped in 5 subfields, at March 2010):
- target population involved in a discipline which is not yet fully explored in
terms of its scientific collaboration (i.e., co-authorship) behaviour
- no unique bibliographic archive for collecting their publications
- interest to trace co-authorship relations in distinct data sources
* De Stefano D., Fuccella V., Vitale M.P., Zaccarin S. (2013). The use of different data sources
in the analysis of co-authorship networks and scientific performance, Social Networks, 35, pp.
370-381
Data Definition (categorical variables)
Data source: Italian academic statisticians with publications in the WoS archive
Relational data
A = affiliation matrix 481 x 2289: 481 scientists, 2289 publications; a cell equals 1 if the author (in row) authored the paper (in column)
For authors:
X = matrix 481 x 3: 3 columns expanded in dummy coding of the 3 + 5 + 4 categories:
- Academic position: Assistant, Associate, Full professor
- Scientific sub-sector (Italian classification): S01-Statistics; S02-Statistics for experimental and technological research; S03-Economic statistics; S04-Demography; S05-Social statistics
- Geo-localization of university: North-west; North-east; Centre; South Italy
For publications:
Z = matrix 2 x 2289: 2 rows expanded in 3 + 3 dummy coding:
- Number of authors per publication: <4 Auth; 4-10 Auth; >10 Auth
- Year of publication: 1989-96; 1997-03; 2004-10
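The expansion of nominal variables into dummy (full disjunctive) coding can be sketched as follows. The four authors and their attribute values are hypothetical, and `disjunctive_coding` is an illustrative helper name; only the category labels are borrowed from the slide.

```python
import numpy as np

def disjunctive_coding(labels, categories):
    """Return the 0/1 indicator matrix of `labels` over the ordered `categories`."""
    labels = np.asarray(labels)
    return (labels[:, None] == np.asarray(categories)[None, :]).astype(int)

# Hypothetical attributes for four authors.
position = ["Assistant", "Full", "Associate", "Full"]
area = ["South", "Centre", "South", "North-west"]

Q = np.hstack([
    disjunctive_coding(position, ["Assistant", "Associate", "Full"]),
    disjunctive_coding(area, ["North-west", "North-east", "Centre", "South"]),
])
print(Q)  # 4 rows, 3 + 4 = 7 dummy columns; every row sums to 2 (one 1 per variable)
```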
DATA PRE-TREATMENT
Since we are interested in co-authorship among Italian scholars, we cut out external links, reducing the initial dataset to an affiliation matrix of 333 authors and 526 papers.
Bipartite network (Authors x Papers)
Component size    3   4   5   6   7   8  11  13  14  16  19  23  25  29  552
# of components  14   4   4   2   3   4   1   1   2   1   1   1   1   1    1
Authors x Authors projection (size: 333)
Component size    2   3   4   5   6   7   8  197
# of components  20   5   5   5   1   2   2    1
Graph density = 0.013
Largest Component density = 0.025
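A minimal sketch of the authors x authors projection and of a density computation is given below. The 4 x 3 affiliation matrix is hypothetical, and the density formula 2m / (n(n-1)) is the usual one for an undirected simple graph; the slides do not state which formula was used, so this is an assumption.

```python
import numpy as np

# Hypothetical affiliation matrix: 4 authors (rows) x 3 papers (columns).
A = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 0],
])

W = A @ A.T                # number of papers shared by each pair of authors
np.fill_diagonal(W, 0)     # drop self-ties
G = (W > 0).astype(int)    # binary co-authorship adjacency matrix

n = G.shape[0]
m = int(G.sum()) // 2              # each undirected tie is counted twice in G
density = 2 * m / (n * (n - 1))    # assumed density formula for a simple graph
print(m, density)
```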
Analysis of Attribute Data in X (MCA)
Aim: to explore the association structure among authors' characteristics (attribute variables on actors)
[Figure: CA factor map of the 333 authors and the attribute categories (Centre, NorthEast, NorthWest, South; AssistantProf, AssociateProf, FullProf; SECS-S/01 to SECS-S/05). Axes: Dim 1 (13.94%), Dim 2 (13.36%).]
[Figure: Hierarchical clustering on the factor map, with dendrogram (height) and factor map showing six clusters (cluster 1 to cluster 6). Axes: Dim 1 (22.08%), Dim 2 (21.16%).]
Looking at association structure on attribute
variables: Smoothed Multiple Correspondence
Analysis - SFA
[Figure: CA factor map of the 333 authors and the attribute categories (Centre, NorthEast, NorthWest, South; AssistantProf, AssociateProf, FullProf; SECS-S/01 to SECS-S/05). Axes: Dim 1 (17.19%), Dim 2 (15.62%).]
Analysis of $N^{-1}GQ$
The factorial map highlights the characteristics of contiguous scholars: each node is placed at the barycentre of its neighbours and lies close to the characteristics that are typical of its ego-centered network (Sector, Role, ...).
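To make the barycentre reading concrete, the sketch below smooths a small indicator matrix. Both G and Q are hypothetical toy matrices (one nominal variable with two categories); with a single variable, each row of the smoothed matrix gives the share of a node's neighbours falling in each category.

```python
import numpy as np

# Hypothetical contiguity matrix: a chain 0-1-2 plus a separate pair 3-4.
G = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

# Hypothetical indicator matrix for one variable with two categories.
Q = np.array([
    [1, 0],
    [0, 1],
    [1, 0],
    [0, 1],
    [0, 1],
], dtype=float)

N_inv = np.diag(1.0 / G.sum(axis=1))  # inverse of the degree matrix N
Q_smooth = N_inv @ G @ Q              # row i: category profile of the neighbours of i
print(Q_smooth)
```

Node 1, for instance, belongs to the second category, but both of its neighbours belong to the first, so its smoothed profile sits entirely on the first category.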
Cluster Analysis on the Results of the S-MCA
[Figure: Hierarchical clustering dendrogram of the 333 authors (inertia gain; height) and factor map showing six clusters (cluster 1 to cluster 6). Panel titles: Hierarchical Clustering; Factor map; Hierarchical clustering on the factor map. Axes: Dim 1 (23.87%), Dim 2 (21.69%).]
FALD: Correspondence Analysis of $Q - N^{-1}GQ + \frac{1}{q_{..}}\, q_{i.}\, q_{.j}'$
[Figure: CA factor map of the 333 authors and the attribute categories (Centre, NorthEast, NorthWest, South; AssistantProf, AssociateProf, FullProf; SECS-S/01 to SECS-S/05). Axes: Dim 1 (27.54%), Dim 2 (20.95%).]
Residual information after leaving out the contiguity relationships
Attributes far from the origin explain the residual variance: they are not important in explaining a network effect
CONCLUDING REMARKS – AIM 1
SFA and FALD can be used to explore the relationships
between the network structure and attribute variables…
The factorial maps show the author position as a function of
the attributes in Q and the contiguity structure…
How to combine Network Data and external information
in SNA?
Aim 2
AIM 2 - Notation and Definition
A (n x p): affiliation matrix (binary); n actors; p events
a_ij = 1 if the i-th actor is present at the j-th event (i = 1, …, n; j = 1, …, p)
X (n x m): n actors; r nominal variables expanded into m dummy variables
x_ik = 1 if the i-th actor belongs to the k-th category (i = 1, …, n; k = 1, …, m)
Z (q x p): p events; s nominal variables expanded into q dummy variables
z_hj = 1 if the j-th event belongs to the h-th category (h = 1, …, q; j = 1, …, p)
Data structure
Z    E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 E11 E12
Z11   1  1  1  1  0  0  0  0  0   0   0   0
Z12   0  0  0  0  1  1  1  1  0   0   0   0
Z13   0  0  0  0  0  0  0  0  1   1   1   1

A    E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 E11 E12 | X    X11 X12 X13 X21 X22
1     1  1  1  0  1  0  1  1  0   0   1   0 | 1      1   0   0   1   0
2     1  1  1  1  1  1  1  1  1   0   1   0 | 2      1   0   0   1   0
3     1  1  1  1  1  0  0  1  0   0   0   0 | 3      1   0   0   1   0
4     1  1  0  1  1  1  0  0  0   0   0   1 | 4      1   0   0   1   0
5     0  0  1  1  0  1  1  0  1   0   0   0 | 5      1   0   0   1   0
6     1  1  1  1  1  1  1  1  0   0   0   0 | 6      1   0   0   1   0
7     0  1  0  0  1  0  1  1  0   1   0   1 | 7      0   1   0   1   0
8     0  0  0  1  1  0  1  1  0   0   0   0 | 8      0   1   0   1   0
9     0  1  0  1  1  0  1  0  0   0   0   0 | 9      0   1   0   1   0
10    0  0  1  0  1  0  1  1  0   0   0   0 | 10     0   1   0   0   1
11    0  0  0  0  1  1  1  1  1   0   1   1 | 11     0   1   0   0   1
12    0  1  0  0  1  1  1  1  0   0   0   0 | 12     0   1   0   0   1
13    0  1  0  1  0  0  0  0  1   0   0   1 | 13     0   0   1   0   1
14    0  0  0  0  0  0  0  0  1   1   1   1 | 14     0   0   1   0   1
15    0  0  0  1  0  1  0  0  1   1   1   1 | 15     0   0   1   0   1
16    0  0  0  0  0  0  0  0  0   0   1   1 | 16     0   0   1   0   1
17    0  0  0  0  1  1  0  0  1   1   1   1 | 17     0   0   1   0   1
18    1  0  0  0  0  0  0  0  1   1   1   1 | 18     0   0   1   0   1
... The Underlying Decomposition Model (Takane, Shibayama, 1991)
$A \approx XB + CZ + XQZ$
- A: affiliation data
- XB: effect of actor characteristics
- CZ: effect of event characteristics
- XQZ: interaction effect
Effect of actor characteristics on relational data
$J(B) = \| A - XB \|^2$
$\hat{B} = \arg\min_B J(B) = (X'X)^{-1} X'A$
Particular case of orthogonal design X:
$\hat{B} = [\mathrm{diag}(X'X)]^{-1} X'A$
Statistical interpretation as conditional frequency: $b_{k,j} \in [0,1]$
=> affiliation effect of the k-th category on the j-th event.
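The conditional-frequency reading of the coefficients can be illustrated with a small numerical sketch. The matrices A and X below are hypothetical (6 actors, 3 events, 2 nominal variables with 2 categories each); the code implements the diagonal solution [diag(X'X)]^{-1} X'A only, not the general least-squares one.

```python
import numpy as np

# Hypothetical affiliation matrix: 6 actors x 3 events.
A = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
    [0, 1, 1],
])

# Hypothetical dummy coding: r = 2 variables, 2 categories each (m = 4 columns).
X = np.array([
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
])

counts = np.diag(X.T @ X)            # number of actors in each category
B_hat = (X.T @ A) / counts[:, None]  # [diag(X'X)]^{-1} X'A

print(B_hat)  # b_kj: share of actors in category k who are present at event j
```

Each entry is a proportion, so it lies in [0, 1], which matches the interpretation given on the slide.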
Effect of events characteristics in determining the presence of actors at events
$J(C) = \| A - CZ \|^2$
$\hat{C} = \arg\min_C J(C)$
We retain the particular solution:
$\hat{C} = AZ'[\mathrm{diag}(ZZ')]^{-1}$
$c_{ih}$: affiliation effect due to the presence of actors at events, given the h-th event category
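The event-side coefficients can be sketched in the same way. A and Z below are hypothetical toy matrices (6 actors, 3 events, one nominal event variable with 2 categories); the code computes the particular solution A Z' [diag(ZZ')]^{-1}.

```python
import numpy as np

# Hypothetical affiliation matrix: 6 actors x 3 events.
A = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
    [0, 1, 1],
])

# Hypothetical event coding (q = 2 categories x p = 3 events):
# events 1 and 2 fall in the first category, event 3 in the second.
Z = np.array([
    [1, 1, 0],
    [0, 0, 1],
])

counts = np.diag(Z @ Z.T)            # number of events in each category
C_hat = (A @ Z.T) / counts[None, :]  # A Z' [diag(ZZ')]^{-1}

print(C_hat)  # c_ih: share of the events in category h attended by actor i
```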
Use of the coefficients B and C in SNA
The B and C coefficients are used to approximate the original affiliation matrix A and to derive the adjacency matrices GX and GZ (actors x actors):
$\hat{B} \rightarrow \hat{A}_X \rightarrow G_X$
$\hat{C} \rightarrow \hat{A}_Z \rightarrow G_Z$
GX and GZ are then analyzed by SNA methods, especially to look for peculiar patterns of ties and homogeneous groups of actors induced by the effect of the external information.
From A to G: Clustering of G
$G = AA'$
Hierarchical clustering of network positions
Analysis of X and A
B (m x p)
column maximum value = 1
row marginal maximum theoretical value = r x p
$\hat{A}_X = X\hat{B}$
maximum value of the columns marginal
of B = r x m
(all n actors are present at the j-th event)
maximum value of rows marginal of B = p
(a category fully characterizes all p
events)
Proximity categories and actors
Analysis of X and A
$G_X = \hat{A}_X \hat{A}_X'$
whose generic element weights the joint presence at events of actors having similar characteristics. Maximum theoretical value in GX: $\sum_{j=1}^{p} r^2 = p r^2$
If a homophily effect is present, then GX shows a weighting system coherent with the capability of all actors' characteristics to jointly explain the participation in events.
Clustering / Blockmodeling
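The chain from the coefficients to the weighted adjacency matrix GX can be sketched as follows. A and X are hypothetical toy matrices (6 actors, 3 events, r = 2 nominal variables), and the coefficients use the diagonal solution; the clustering or blockmodeling step itself is not shown.

```python
import numpy as np

# Hypothetical affiliation matrix: 6 actors x 3 events.
A = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
    [0, 1, 1],
])

# Hypothetical dummy coding: r = 2 variables, 2 categories each.
X = np.array([
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
])

counts = np.diag(X.T @ X)
B_hat = (X.T @ A) / counts[:, None]  # [diag(X'X)]^{-1} X'A
A_X = X @ B_hat                      # approximation of A from actor attributes
G_X = A_X @ A_X.T                    # weighted actors x actors adjacency matrix

r, p = 2, A.shape[1]
bound = p * r**2                     # maximum theoretical value p * r^2
print(G_X.round(2))
print(G_X.max(), bound)
```

Actors with identical attribute rows (here actors 0 and 2) get identical rows in A_X and therefore identical rows in GX, which is what a structural-equivalence clustering would pick up.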
Analysis of Z and A
C (n x q)
Results …
$\hat{A}_Z = \hat{C}Z$
Proximity between Z categories and
actors
Analysis of Z and A
To derive from $\hat{A}_Z$ the weighted adjacency GZ:
$G_Z = \hat{A}_Z \hat{A}_Z'$
where the generic element of GZ is the weight of the tie between a pair of actors related to their presence at the same categories of events
Clustering (or Blockmodeling)
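A matching sketch for the event side is given below, again on hypothetical toy matrices (6 actors, 3 events, one event variable with 2 categories). It computes the approximation of A from the event categories and the resulting weighted adjacency matrix GZ; the clustering step is left out.

```python
import numpy as np

# Hypothetical affiliation matrix: 6 actors x 3 events.
A = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
    [0, 1, 1],
])

# Hypothetical event coding: events 1 and 2 in category 1, event 3 in category 2.
Z = np.array([
    [1, 1, 0],
    [0, 0, 1],
])

counts = np.diag(Z @ Z.T)
C_hat = (A @ Z.T) / counts[None, :]  # A Z' [diag(ZZ')]^{-1}
A_Z = C_hat @ Z                      # approximation of A from event categories
G_Z = A_Z @ A_Z.T                    # weighted actors x actors adjacency matrix

print(G_Z)
```

In this toy case actors 0 and 4 attend events of different categories only, so their tie weight in GZ is zero.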
Authors Attributes in X and Coefficients B
[Figure: coefficients B by author attribute category: North-west, North-east, Centre, South; Secs-S01 to Secs-S05; Assistant, Associate, Full professor.]
Analysis of Gx
$G_X = \hat{A}_X \hat{A}_X'$
Structural Equivalence - Cluster Analysis
k  Centre  NorthEast  NorthWest  South
1    0.23       0.27       0.23   0.27
2    0.00       1.00       0.00   0.00
3    0.12       0.12       0.00   0.76
4    0.35       0.18       0.25   0.22
5    1.00       0.00       0.00   0.00
6    0.00       1.00       0.00   0.00

k  AssistantProf  AssociateProf  FullProf
1           0.23           0.41      0.36
2           0.00           0.41      0.59
3           0.53           0.24      0.24
4           0.26           0.32      0.42
5           0.00           0.00      1.00
6           0.00           1.00      0.00
Analysis of Gx
Importance of coefficients B in 6 classes
k  S/01  S/02  S/03  S/04  S/05
1  0.00  0.00  0.00  1.00  0.00
2  0.92  0.00  0.08  0.00  0.00
3  0.00  1.00  0.00  0.00  0.00
4  0.36  0.00  0.42  0.00  0.22
5  0.97  0.03  0.00  0.00  0.00
6  0.00  0.22  0.78  0.00  0.00
Papers Attributes in Z and coefficients C
Analysis of Gz
$G_Z = \hat{A}_Z \hat{A}_Z'$
Weighted importance of coefficients C for the 3 classes
k  1989-96  1997-03  2004-10  Auth <4  Auth 4-10  Auth >10
1     0.24     0.09     0.08     0.07       0.27      0.12
2     0.05     0.03     0.03     0.03       0.06      0.11
3     0.08     0.07     0.05     0.05       0.06      0.26
Some concluding remarks
Relational and attribute data
- to derive ad hoc relational data structures (affiliation and adjacency matrices)
- to enhance the interpretation of traditional network analysis from a different
point of view:
i) the complementary use of valued graphs defined according to
observed auxiliary information;
ii) the possibility to introduce explicative measures joining external
information and relational data
iii) the interpretation of the results as complex data where groups of
actors are defined and interpreted as "second order"
individuals
Some references
Aluja Banet, T., Lebart, L. (1984). Local and Partial Principal Component Analysis and Correspondence Analysis. In: Havranek, T., Sidak, Z., Novak, M. (Eds.), COMPSTAT Proceedings, Physica-Verlag, Vienna. pp. 113-118.
Benali, H., Escofier, B. (1990). Analyse factorielle lissée et analyse factorielle des différences locales. Revue de Statistique Appliquée. 38, pp. 55-76.
De Stefano D., Fuccella V., Vitale M.P., Zaccarin S. (2013). The use of different data sources in the analysis of co-authorship networks and scientific performance. Social Networks. 35, pp. 370-381.
D’Esposito, M. R., De Stefano, D., Ragozini, G. (2014). On the use of Multiple Correspondence Analysis to visually explore affiliation networks. Social Networks. 38. pp. 28–40.
Ferligoj, A., Kronegger, L. (2009). Clustering of Attribute and/or Relational Data. Metodoloski Zvezki. 6, 135-153.
Giordano G., Vitale M.P. (2007). Factorial Contiguity Maps to Explore Relational Data Patterns. Statistica applicata. 19, pp. 297-306.
Giordano G., Vitale M.P. (2011). On the use of auxiliary information in Social Network Analysis. Advances in Data Analysis and Classification (ADAC). 5, pp. 95-112.
Lazarsfeld, P., Merton, R. K. (1954). Friendship as a Social Process: A Substantive and Methodological Analysis. In: Freedom and Control in Modern Society, Berger, M., Abel, T., H. Page, C. (eds.). Van Nostrand, New York, 18-66.
Maddala, G.S. (1991). A Perspective on the Use of Limited-Dependent and Qualitative Variables Models in Accounting Research. The Accounting Review . 66, pp. 788-807.
McPherson, M., Smith-Lovin, L., Cook., J. (2001). Birds of a Feather: Homophily in Social Networks. Annual Review of Sociology. 27, pp. 415-44.
Opsahl, T., Panzarasa, P. (2009). Clustering in weighted networks. Social Networks. 31, pp. 155-163
Ragozini, G., De Stefano, D., D’Esposito, M. R., (2015). Multiple factor analysis for time-varying two-mode networks. Network Science. 3, pp. 18-36.
Robins, G.L., Pattison, P., Kalish, Y., Lusher,D. (2007). An introduction to exponential random graph (p*) models for social networks. Social Networks. 29, pp. 173-191.
Takane, Y., Shibayama, T. (1991). Principal component analysis with external information on both subjects and variables. Psychometrika. 56, issue 1, pp. 97-120.