Link Reconstruction from Partial Information
Gong Xiaofeng, Li Kun & C. H. Lai, TSL@NUS
General situations where problems may arise
Observed network: an N x N adjacency matrix A filled with 0s and 1s.
Scenarios:
A) No side information: statistical analysis, clustering, modeling, process, etc.
B) Some links are uncertain (positions known): the link reconstruction problem, based on a model or a similarity measure.
C) Some 1s have been set to 0s (positions unknown): a variant of the link reconstruction problem, possibly related to link prediction.
D) The network is subject to change: a kind of prediction problem (link prediction), node prediction, network evolution, etc.
B.1 Problem of network reconstruction
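[Figure: a 5-node example network, with the unknown links drawn as dashed arrows, next to its adjacency matrix (nodes 1 to 5; ? marks an unknown entry).]
0 1 0 ? 0
1 0 ? 0 1
0 ? 0 1 1
? 0 1 0 ?
0 1 1 ? 0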
Guess the values (0 or 1) of the dashed arrows.
Some links are unknown: they may be corrupted, missing, or impossible to measure at the time.
Presumptions:
o The network has structure.
o Unknown links are fairly sampled.
o The number of unknown links is small.
B.2 Procedures of reconstruction of links
Available information
-> fitted probabilistic model P (N x N)
-> connection probability p(i,j) of each unknown link (i,j)
-> determine a threshold of connection probability pt
-> set (i,j) to 1 if p(i,j) > pt, and to 0 otherwise
[Diagram: observed network -> model function and parameters (optimization) -> connection probability -> threshold -> reconstruction or prediction. The first stage is labeled modeling, the second prediction.]
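The thresholding step of this procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `reconstruct`, the use of `None` for the ? entries, and the probability values are hypothetical, and the fitted model that would supply p(i,j) is not shown.

```python
def reconstruct(observed, p_unknown, pt):
    """Fill the unknown entries (None) of a symmetric 0/1 matrix.

    observed  : list of lists holding 0, 1 or None (None plays the role of '?')
    p_unknown : dict mapping (i, j) with i < j to the connection probability
                p(i,j) given by a fitted model (hypothetical values below)
    pt        : threshold of connection probability
    """
    result = [row[:] for row in observed]  # leave the input untouched
    for (i, j), p in p_unknown.items():
        if observed[i][j] is not None:
            continue  # only unknown links are reconstructed
        value = 1 if p > pt else 0
        result[i][j] = value
        result[j][i] = value  # undirected network: keep the matrix symmetric
    return result


U = None  # unknown entry
observed = [
    [0, 1, 0, U, 0],
    [1, 0, U, 0, 1],
    [0, U, 0, 1, 1],
    [U, 0, 1, 0, U],
    [0, 1, 1, U, 0],
]
# Made-up connection probabilities for the three unknown links (0-based indices).
p_unknown = {(0, 3): 0.10, (1, 2): 0.70, (3, 4): 0.40}
filled = reconstruct(observed, p_unknown, pt=0.5)
```

With these made-up values only the link between the second and third nodes exceeds the threshold, so it is the only unknown entry set to 1.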
B.3 Reformulated signal detection problem
Observed network -> 3 types of signals: 0, 1 and ?.
Fitted model -> connection probabilities P0 and P1.
Signals (P?) to be classified -> ?
Problem: given the connection probability P?, determine the type of signal (0 or 1).
Assumption under a certain model: the unknown links do not significantly influence the reliability of the fitted model (P0 and P1), i.e., the connection probability P? of any unknown link can be regarded as sampled from P0 or P1.
Searching for an optimal detection scheme, e.g., the Neyman-Pearson criterion.
Observation (data): connection probability (p)
Hypotheses: H0: 0-link and H1: 1-link
Data space E: R0 and R1, the acceptance regions
Decision D: D0 (accept H0) and D1 (accept H1)
B.4 An equivalent hypothesis testing problem
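\[ D = \begin{cases} D_0, & p \in R_0 \\ D_1, & p \in R_1 \end{cases} \qquad R_0 \cup R_1 = E, \quad R_0 \cap R_1 = \emptyset \]
\[ P_m = P(D_0 \mid H_1), \qquad P_f = P(D_1 \mid H_0) \]
\[ P_d = P(D_1 \mid H_1) = 1 - P_m \]
\[ \min_{D(y)} P(D_0 \mid H_1) \quad \text{subject to} \quad P(D_1 \mid H_0) = P_f \]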
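As a rough illustration of the Neyman-Pearson idea mentioned on this slide, the sketch below searches for the threshold that minimizes the miss probability while keeping the false-alarm probability at or below a chosen level. It is a simplification: it only considers rules of the form "decide D1 if p > pt" rather than a general likelihood-ratio test, and the function name, the sample values and the false-alarm level are all hypothetical.

```python
def neyman_pearson_threshold(p0, p1, max_false_alarm):
    """Pick pt for the rule 'decide D1 if p > pt'.

    p0 : connection probabilities of known 0-links (samples under H0)
    p1 : connection probabilities of known 1-links (samples under H1)
    Returns (pt, P_f, P_m), or None if no threshold meets the constraint.
    """
    best = None
    for pt in sorted(set(p0) | set(p1) | {0.0}):
        pf = sum(p > pt for p in p0) / len(p0)   # empirical P(D1 | H0)
        pm = sum(p <= pt for p in p1) / len(p1)  # empirical P(D0 | H1)
        if pf <= max_false_alarm and (best is None or pm < best[2]):
            best = (pt, pf, pm)
    return best


# Made-up samples, chosen as binary fractions so the arithmetic is exact.
p0 = [0.05, 0.05, 0.1, 0.1, 0.1, 0.2, 0.2, 0.65]
p1 = [0.3, 0.6, 0.7, 0.7, 0.8, 0.8, 0.9, 0.9]
loose = neyman_pearson_threshold(p0, p1, max_false_alarm=0.125)
strict = neyman_pearson_threshold(p0, p1, max_false_alarm=0.0)
```

Tightening the allowed false-alarm level pushes the threshold up and raises the miss probability, which is the trade-off the criterion formalizes.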
B.5 Measuring reconstruction performance
Contingency table (or confusion matrix)

                          actual value
                          p                      n                      total
prediction outcome  p'    True Positive (TP)     False Positive (FP)    P'
                    n'    False Negative (FN)    True Negative (TN)     N'
total                     P                      N

Statistics defined:
Sensitivity or True Positive Rate (TPR): TPR = TP/P = TP/(TP+FN)
False Positive Rate (FPR): FPR = FP/N = FP/(FP+TN)
Accuracy (ACC): ACC = (TP+TN)/(P+N)
True Negative Rate or Specificity (SPC): SPC = TN/N = 1 - FPR
Positive Predictive Value (PPV): PPV = TP/(TP+FP)
Receiver Operating Characteristic (ROC): TPR vs. FPR
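The statistics above follow directly from the four counts of the contingency table. The short Python sketch below computes them; the function name `rates` is hypothetical, and the example counts are an inference, back-computed from the first row of the B.15 results table (P = 201, N = 5293), not numbers stated on the slides.

```python
def rates(tp, fp, fn, tn):
    """Return the statistics defined above from the four table counts."""
    p = tp + fn  # actual positives
    n = fp + tn  # actual negatives
    return {
        "TPR": tp / p,               # sensitivity
        "FPR": fp / n,
        "ACC": (tp + tn) / (p + n),
        "SPC": tn / n,               # specificity, equal to 1 - FPR
        "PPV": tp / (tp + fp),
    }


# Counts inferred from the first row of the B.15 table (an assumption).
stats = rates(tp=162, fp=64, fn=39, tn=5229)
```

Rounded to two decimals, these counts give TPR 80.60%, SPC 98.79%, PPV 71.68% and ACC 98.13%, which agrees with that row.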
B.6 Relation to performance measures
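[Figure: densities f0(p) and f1(p) of the connection probabilities, divided by the threshold pt into four regions R1 to R4; the areas S(R1) to S(R4) of these regions correspond to TN, FP, FN and TP.]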
B.7 Criterion of MAP
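For the reconstruction problem, we choose the criterion that maximizes the a posteriori probability of the two hypotheses.
\[ D = \begin{cases} D_0, & P(H_0 \mid p) > P(H_1 \mid p) \\ D_1, & P(H_1 \mid p) > P(H_0 \mid p) \end{cases} \]
For an observation c:
\[ P(H_i \mid c) = \frac{P(c \mid H_i)\,P(H_i)}{P(c)}, \qquad L_{MAP}(c) = \frac{P(H_1 \mid c)}{P(H_0 \mid c)} \underset{D_0}{\overset{D_1}{\gtrless}} 1 \]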
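A minimal Python sketch of the MAP criterion (choose the hypothesis with the larger a posteriori probability) is given below. The function names, the likelihood values and the prior are hypothetical; in practice the likelihoods would come from the densities of connection probabilities estimated on the known 0-links and 1-links.

```python
def posterior_h1(f0, f1, prior1):
    """P(H1 | c) by Bayes' rule.

    f0, f1 : likelihood values P(c | H0) and P(c | H1) at the observation c
    prior1 : prior probability P(H1); P(H0) is taken as 1 - prior1
    """
    evidence = prior1 * f1 + (1.0 - prior1) * f0  # P(c)
    if evidence == 0.0:
        raise ValueError("observation has zero probability under both hypotheses")
    return prior1 * f1 / evidence


def map_decide(f0, f1, prior1):
    """Return 1 (accept H1) if P(H1 | c) > P(H0 | c), else 0 (accept H0)."""
    return 1 if posterior_h1(f0, f1, prior1) > 0.5 else 0


# Made-up likelihood values with a sparse-network prior of 4% 1-links.
weak_evidence = map_decide(f0=0.2, f1=3.0, prior1=0.04)
strong_evidence = map_decide(f0=0.05, f1=3.0, prior1=0.04)
```

In the first call the likelihood ratio of 15 is not enough to overcome the small prior, so the decision is 0; in the second the ratio of 60 tips the posterior above one half.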
A.1 Probabilistic model of structured networks
For node k, define an attribute vector
\[ w_k = [w_{k1}, w_{k2}, \ldots, w_{km}]^T, \qquad k = 1, 2, \ldots, N \]
C: connection matrix; A: adjacency matrix.
\[ p_{ij} = \Pr(A_{ij} = 1) \]
[The explicit form of the model function for p_ij is not recoverable from the extracted slide text.]
A.2 Estimate model parameters (MLE)
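Iterated gradient-based optimization:
1) start from an initial W^{(0)}
2) update simultaneously
3) cease iterating when the stopping conditions are met

\[ L = \ln \Pr(A \mid w) = \sum_{j=1}^{N} \sum_{k>j} \left[ A_{jk} \ln p_{jk} + (1 - A_{jk}) \ln (1 - p_{jk}) \right] \]
\[ \frac{\partial L}{\partial w} = \sum_{j=1}^{N} \sum_{k>j} \frac{A_{jk} - p_{jk}}{p_{jk}(1 - p_{jk})} \frac{\partial p_{jk}}{\partial w} \]
\[ W^{(t+1)} = W^{(t)} + \eta \left. \frac{\partial L}{\partial W} \right|_{W^{(t)}} \]
where \eta denotes the step size.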
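The three steps of the iteration (start from an initial value, update all parameters simultaneously, stop when a stopping condition is met) can be illustrated with a small Python sketch. Because the slides' own model function is not available here, the sketch substitutes a simple logistic model, p_jk = sigmoid(w_j + w_k) with one scalar parameter per node. That model, the step size, the tolerance and the toy adjacency matrix are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(A, w):
    """Bernoulli log-likelihood, each node pair counted once."""
    n = len(A)
    total = 0.0
    for j in range(n):
        for k in range(j + 1, n):
            p = sigmoid(w[j] + w[k])  # assumed model, not the slides' model
            total += A[j][k] * math.log(p) + (1 - A[j][k]) * math.log(1.0 - p)
    return total


def fit(A, step=0.1, max_iter=500, tol=1e-6):
    """Gradient ascent on the log-likelihood."""
    n = len(A)
    w = [0.0] * n                                   # 1) initial value
    for _ in range(max_iter):
        # For this model dp/dw = p(1 - p), so the gradient with respect to
        # w_j reduces to the sum over k of (A_jk - p_jk).
        grad = [
            sum(A[j][k] - sigmoid(w[j] + w[k]) for k in range(n) if k != j)
            for j in range(n)
        ]
        if max(abs(g) for g in grad) < tol:         # 3) stopping condition
            break
        w = [wj + step * g for wj, g in zip(w, grad)]  # 2) simultaneous update
    return w


# Toy symmetric adjacency matrix (hypothetical 5-node network).
A = [
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
]
w_hat = fit(A)
```

The fitted parameters give a connection probability for every node pair, including pairs whose link status is unknown, which is what the reconstruction step needs.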
B.8 Example network
B.9 Density function of connection probabilities
[Figure: probability density function versus connection probability (p), showing f1(p) and 1/r f0(p).]
B.10 MAP detector minimizes average error
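The density function is usually jagged and difficult to work with, so the distribution function is preferred. Consider the minimum average error (cost) M:
\[ P(D_0 \mid H_1) = \int_0^{p_t} f_1(p)\,dp = F_1(p_t) \]
\[ P(D_1 \mid H_0) = \int_{p_t}^{1} f_0(p)\,dp = 1 - \int_0^{p_t} f_0(p)\,dp = 1 - F_0(p_t) \]
\[ M = P(D_0 \mid H_1) + \frac{1}{r} P(D_1 \mid H_0) = F_1(p_t) + \frac{1}{r}\bigl(1 - F_0(p_t)\bigr) \]
\[ \min_{p_t} M: \quad \frac{\partial M}{\partial p_t} = f_1(p_t) - \frac{1}{r} f_0(p_t) = 0 \quad \Longrightarrow \quad f_1(p_t) = \frac{1}{r} f_0(p_t) \]
M is proportional to the average error P(H_1) P(D_0|H_1) + P(H_0) P(D_1|H_0) when 1/r = P(H_0)/P(H_1).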
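A small Python sketch of working with distribution functions instead of densities: it evaluates the scaled sum F1(p) + 1/r (1-F0(p)) on empirical distribution functions and returns the threshold where it is smallest. The function name, the sample values and the choice r = 1 are hypothetical, and r is simply passed in as a parameter.

```python
def min_error_threshold(p0, p1, r):
    """Minimize M(pt) = F1(pt) + (1/r) * (1 - F0(pt)) over candidate thresholds.

    p0 : connection probabilities of known 0-links
    p1 : connection probabilities of known 1-links
    F0 and F1 are the empirical distribution functions of p0 and p1.
    Returns (pt, M(pt)) for the rule 'set the link to 1 if p > pt'.
    """
    best_pt, best_m = None, float("inf")
    for pt in sorted(set(p0) | set(p1) | {0.0}):
        F1 = sum(p <= pt for p in p1) / len(p1)  # fraction of 1-links missed
        F0 = sum(p <= pt for p in p0) / len(p0)
        m = F1 + (1.0 / r) * (1.0 - F0)          # 1 - F0: false positives
        if m < best_m:
            best_pt, best_m = pt, m
    return best_pt, best_m


# Made-up samples, chosen as binary fractions so the arithmetic is exact.
p0 = [0.05, 0.05, 0.1, 0.1, 0.1, 0.2, 0.2, 0.65]
p1 = [0.3, 0.6, 0.7, 0.7, 0.8, 0.8, 0.9, 0.9]
pt, m = min_error_threshold(p0, p1, r=1.0)
```

Because the empirical distribution functions are step functions, only the observed values need to be checked as candidate thresholds.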
B.11 Distribution of connection probabilities
[Figure: scaled probability distribution function versus connection probability (p), showing F1(p), 1/r (1-F0(p)) and their sum F1(p)+1/r (1-F0(p)).]
B.12 Generalizability of algorithm
[Figure, first panel (logarithmic vertical axis): probability density function versus connection probability (p), curves F0(p) and F0m(p).]
[Figure, second panel: probability density function versus connection probability (p), curves F1(p) and F1m(p).]
Do the unknown links follow approximately the same distribution?
Possible reasons for the unfavorable burst at the tail; source of model error.
B.13 Robustness of algorithm
[Figure, first panel: scaled probability distribution function versus connection probability (p), curves 1/r (1-F0(p)) and 1/r (1-F0m(p)) for 5%, 10%, 15% and 20%.]
[Figure, second panel: probability distribution function versus connection probability (p), curves F1(p) and F1m(p) for 5%, 10%, 15% and 20%.]
Is the algorithm sensitive to the number of unknown links?
B.14 Comparison of operation points
[Figure: scaled probability distribution function versus connection probability (p), showing F1(p), 1/r (1-F0(p)) and F1(p)+1/r (1-F0(p)), together with the corresponding curves built from F1m(p) and F0m(p).]
B.15 Reconstruction results
USAir Network, 10% missed

P       N        ACC (%)   TP/P (%)   TN/N (%)   TP/(TP+FP) (%)
201     5293     98.13     80.60      98.79      71.68
222     5272     98.13     80.63      98.86      74.90
192     5302     98.11     75.52      98.92      71.78
224     5270     98.25     80.80      98.99      77.35
235     5259     98.13     75.32      99.14      79.73
217     5277     98.38     78.34      99.20      80.19
204     5290     98.31     77.45      99.11      77.07
192     5302     98.25     71.88      99.21      76.67
231     5263     98.16     77.06      99.09      78.76
217     5277     97.93     71.89      99.00      74.64
213.5   5280.5   98.18     76.95      99.03      76.28   (mean)
C.1 A variant problem of link reconstruction
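Observed network -> 2 types of signals, 0 and 1.
[Figure: the 5-node example network and its adjacency matrix, before and after the unknown entries are replaced by 0s.]
With unknown entries (?):
0 1 0 ? 0
1 0 ? 0 1
0 ? 0 1 1
? 0 1 0 ?
0 1 1 ? 0
As observed (each ? set to 0):
0 1 0 0 0
1 0 0 0 1
0 0 0 1 1
0 0 1 0 0
0 1 1 0 0
Some 0s were originally 1s but have been set to 0. Their positions are unknown; their number may be known or unknown.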
C.2 Procedures for the variant problem
Available information
-> fitted probabilistic model P (N x N)
-> connection probability p(i,j) of each 0-link (i,j)
-> (a) number (M) unknown: determine a threshold of connection probability pt, then set (i,j) to 1 if p(i,j) > pt, and to 0 otherwise
-> (b) number (M) known: scoring, i.e., rank the connection probabilities of the candidate links (all 0-links), then set the M links with the highest scores to 1.
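Case (b), where the number M of missing links is known, amounts to a ranking. The Python sketch below is an illustration only: the function names, the scores and the set of true links are made up, and the hit count is included as one simple way to judge a ranking.

```python
def top_m_links(scores, m):
    """Return the m candidate links with the highest connection probability.

    scores : dict mapping a candidate 0-link (i, j) to its score
    """
    ranked = sorted(scores, key=lambda pair: scores[pair], reverse=True)
    return ranked[:m]


def count_hits(predicted, true_links):
    """Number of predicted links that are among the truly missing ones."""
    return len(set(predicted) & set(true_links))


# Made-up scores for the candidate 0-links of a small network.
scores = {
    (0, 2): 0.05, (0, 3): 0.15, (0, 4): 0.20,
    (1, 2): 0.70, (1, 3): 0.10, (3, 4): 0.40,
}
true_links = {(0, 3), (1, 2), (3, 4)}  # hypothetical ground truth
predicted = top_m_links(scores, m=3)
hits = count_hits(predicted, true_links)
```

Here two of the three highest-scoring candidates are truly missing links, so the ranking scores 2 hits out of a possible 3.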
C.3 Algorithm based on common neighbor
\[ p_{ij} = p_{ji} = n_{ij} / n_{\max} \]
[Figure, first panel: scaled distribution functions versus connection probability (p), curves F1, 1/r (1-F0) and F1+1/r (1-F0).]
[Figure, second panel: scaled distribution function versus connection probability (p), curves F1, 1/r (1-F0) and F1+1/r (1-F0).]
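A common-neighbor score can be sketched in Python as follows, assuming that the score of a node pair is its number of common neighbors n_ij divided by the largest such count n_max. Taking n_max over all node pairs, the function name and the toy network are assumptions made for illustration.

```python
from itertools import combinations


def common_neighbor_scores(adj):
    """Score every unlinked node pair by its normalized common-neighbor count.

    adj : dict mapping each node to the set of its neighbors (undirected)
    Returns a dict {(i, j): n_ij / n_max} over pairs i < j that are not linked.
    """
    nodes = sorted(adj)
    counts = {(i, j): len(adj[i] & adj[j]) for i, j in combinations(nodes, 2)}
    n_max = max(counts.values())  # assumed: maximum over all node pairs
    scores = {}
    for (i, j), n_ij in counts.items():
        if j in adj[i]:
            continue  # existing links are not candidates
        scores[(i, j)] = n_ij / n_max if n_max > 0 else 0.0
    return scores


# Toy 5-node network with links 1-2, 2-5, 3-4 and 3-5 (hypothetical).
adj = {1: {2}, 2: {1, 5}, 3: {4, 5}, 4: {3}, 5: {2, 3}}
scores = common_neighbor_scores(adj)
```

In this toy network the pairs (1, 5), (2, 3) and (4, 5) each share one neighbor and receive the top score, while the remaining unlinked pairs score 0.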
C.4 Comparison between two methods
[Figure, panel "Probability density functions": probability density function versus connection probability (p), curves f1 and 1/r f0 for the common-neighbors method and for the model.]
[Figure, panel "Distribution functions": scaled distribution function versus connection probability (p), curves F1 and 1/r (1-F0) for the common-neighbors method and for the model.]
[Figure: distribution function versus connection probability (p) for the common-neighbors and model-based methods; red: 20%, blue: 5%.]
C.5 Generalizability and robustness of algorithms
[Figure: number of hits versus number of predicted links, for common neighbors, the probabilistic model and a perfect algorithm.]
C.6 Reconstruction performance by ranking
[Figure: matrix sparsity plot, axes 0 to 500, nz = 1740.]
D.1 Problem of link prediction
The procedure is identical to that of the variant link reconstruction problem.
[Figure: number of hits versus number of links predicted, for common neighbors, the model-based method and a perfect algorithm.]
Econophysics Co-authorship network (N=506, m=519, nL=379)
[Figure: matrix sparsity plot, axes 0 to 500, nz = 1038.]
D.2 Factors to affect prediction performance
Problem of generalizability:
a) size of the training set, or time span of the prediction;
b) time-varying growth mechanism
[Figure, first panel: probability density function versus connection probability (p), curves f1, f0 and fn.]
[Figure, second panel: scaled distribution function versus connection probability (p), curves F1, 1/r (1-F0) and Fn.]
D.3 Effects of training set size
Assume the new links are known and examine the variant problem above: the training data set cannot capture the underlying distribution faithfully, either because its size is too small or because the growth rule is time dependent.
[Figure, first panel: probability density function versus connection probability (p), curves F1, Fn and F0.]
[Figure, second panel: scaled distribution function versus connection probability (p), curves F1, Fn and 1/r (1-F0).]
Conclusions
The problem of network reconstruction is thoroughly studied. Under a more general framework, the problem can be reformulated as a hypothesis testing problem, which gives deeper insight into the problem and enables us to relate the reconstruction performance of various methods to quantities at a more fundamental level.
THANK YOU