Link Reconstruction from Partial Information Gong Xiaofeng, Li Kun & C. H. Lai TSL@NUS

General situations where problems may arise

Observed network: an N x N adjacency matrix A filled with 0s and 1s. Scenarios:

A) No side information: statistical analysis, clustering, modeling, process, etc.

B) Some links are uncertain (positions known): the link reconstruction problem, based on a model or a similarity measure.

C) Some 1s are set to 0s (positions unknown): a variant of the link reconstruction problem, possibly related to link prediction.

D) The network is subject to change: a prediction problem (link prediction, node prediction, network evolution, etc.).

B.1 Problem of network reconstruction

Infer the values (0 or 1) of the dashed arrows.

Some links are unknown: they may be corrupted, missing, or impossible to measure at the time.

Presumptions:
o The network has structure.
o The unknown links are fairly sampled.
o The number of unknown links is small.

B.2 Procedures of reconstruction of links

Available information
-> fitted probabilistic model P (N x N)
-> connection probability p(i,j) of each unknown link (i,j)
-> determine a threshold of connection probability pt
-> set (i,j) to 1 if p(i,j) > pt, and to 0 otherwise

[Flow diagram: observed network -> model function and parameters (optimization) -> connection probability -> threshold -> reconstruction or prediction. The stages are grouped as "modeling" and "prediction".]
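The last step of the procedure, thresholding the fitted connection probabilities, can be sketched as follows. This is a minimal illustration: the function name `reconstruct_links`, the probability values and the threshold 0.5 are all hypothetical and do not come from the slides.

```python
def reconstruct_links(conn_prob, unknown_links, p_t):
    """Set each unknown link to 1 if its connection probability exceeds p_t, else 0."""
    return {(i, j): 1 if conn_prob[(i, j)] > p_t else 0 for (i, j) in unknown_links}


# Hypothetical fitted connection probabilities for three unknown links.
conn_prob = {(0, 1): 0.82, (0, 2): 0.10, (1, 3): 0.47}

decisions = reconstruct_links(conn_prob, list(conn_prob), p_t=0.5)
print(decisions)
```

How the threshold itself is chosen is the subject of the detection formulation in the following slides.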

B.3 Reformulated signal detection problem

Observed network -> 3 types of signals: 0, 1 and ?.
Fitted model -> connection probabilities P0 and P1.
Signals (P?) to be classified -> ?

Problem: given a connection probability P?, determine the type of signal (0 or 1).

Assumption under a certain model: the unknown links do not significantly influence the reliability of the fitted model (P0 and P1), i.e., the connection probability P? of any unknown link can be regarded as sampled from P0 or P1.

Searching for an optimal detection scheme, e.g., the Neyman-Pearson criterion.

Observation (data): connection probability (p)
Hypotheses: H0: 0-link and H1: 1-link
Data space E: R0 and R1, the acceptance regions
Decision D: D0 (accept H0) and D1 (accept H1)

B.4 An equivalent hypothesis testing problem

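$$R_0 \cup R_1 = E, \qquad R_0 \cap R_1 = \emptyset$$

Error probabilities: $P(D_1 \mid H_0)$ (accepting H1 when H0 is true) and $P(D_0 \mid H_1)$ (accepting H0 when H1 is true).

$$\min_{D(y)} f\big(P(D_0 \mid H_1),\; P(D_1 \mid H_0)\big)$$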

B.5 Measuring reconstruction performance

Contingency table (or confusion matrix):

                     actual p               actual n               total
prediction p'        True Positive (TP)     False Positive (FP)    P'
prediction n'        False Negative (FN)    True Negative (TN)     N'

Statistics defined:
Sensitivity or True Positive Rate (TPR): TPR = TP/P = TP/(TP+FN)
False Positive Rate (FPR): FPR = FP/N = FP/(FP+TN)
Accuracy (ACC): ACC = (TP+TN)/(P+N)
True Negative Rate or Specificity (SPC): SPC = TN/N = 1-FPR
Positive Predictive Value (PPV): PPV = TP/(TP+FP)
Receiver Operating Characteristic (ROC): TPR vs. FPR
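The statistics defined above can be computed directly from the four cells of the contingency table. The sketch below uses hypothetical counts (TP=162, FP=64, FN=39, TN=5229). The slides do not list raw counts; these were chosen only because they are consistent with P=201, N=5293 and the percentages in the first row of the results table in B.15.

```python
def confusion_stats(tp, fp, fn, tn):
    """Return the B.5 statistics from the cells of a confusion matrix."""
    p = tp + fn  # actual positives (1-links)
    n = fp + tn  # actual negatives (0-links)
    return {
        "TPR": tp / p,
        "FPR": fp / n,
        "SPC": tn / n,
        "ACC": (tp + tn) / (p + n),
        "PPV": tp / (tp + fp),
    }


# Hypothetical counts, back-calculated for illustration only.
stats = confusion_stats(tp=162, fp=64, fn=39, tn=5229)
for name, value in stats.items():
    print(f"{name}: {100 * value:.2f}%")
```

With so few 1-links relative to 0-links, ACC is dominated by the true negatives, which is why TPR and PPV are reported alongside it.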

B.6 Relation to performance measures

[Figure; axis label: connection probabilities.]

B.7 Criterion of MAP

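For the reconstruction problem, we choose the criterion that maximizes the a posteriori probability of the two hypotheses:

$$D = \begin{cases} D_1, & P(H_1 \mid p) > P(H_0 \mid p) \\ D_0, & \text{otherwise} \end{cases}$$

By Bayes' rule, for an observation $c$:

$$P(H_i \mid c) = \frac{P(c \mid H_i)\, P(H_i)}{P(c)}, \qquad L(c) = \frac{P(c \mid H_1)}{P(c \mid H_0)}$$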
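A minimal sketch of the MAP decision for a single observation: by Bayes' rule each posterior is proportional to likelihood times prior, so the common normalizing factor can be dropped. The function name `map_decision` and the numerical likelihoods and prior below are hypothetical and serve only as an illustration.

```python
def map_decision(lik_h0, lik_h1, prior_h1):
    """Return 1 (accept H1) if the posterior of H1 exceeds that of H0, else 0."""
    post_h0 = lik_h0 * (1.0 - prior_h1)  # proportional to P(H0 | c)
    post_h1 = lik_h1 * prior_h1          # proportional to P(H1 | c)
    return 1 if post_h1 > post_h0 else 0


# Hypothetical values: 1-links are rare (prior 0.04).
print(map_decision(lik_h0=0.05, lik_h1=3.0, prior_h1=0.04))  # likelihood ratio 60
print(map_decision(lik_h0=0.5, lik_h1=1.0, prior_h1=0.04))   # likelihood ratio 2
```

In a sparse network the prior strongly favours H0, so a likelihood ratio only slightly above 1 is not enough to accept H1.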

A.1 Probabilistic model of structured networks

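A: adjacency matrix; C: connection matrix.

For node k, define the attribute vector

$$w_k = [w_{k1}, w_{k2}, \ldots, w_{km}]^T, \qquad k = 1, 2, \ldots, n$$

$$C_{ij} = \Pr(A_{ij} = 1) = f(w_i, w_j)$$

[The explicit form of f is not recoverable from the extracted slide.]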

A.2 Estimate model parameters (MLE)

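Log-likelihood:

$$L = \ln \Pr(A \mid w) = \sum_{i,j} \left[ A_{ij} \ln p_{ij} + (1 - A_{ij}) \ln (1 - p_{ij}) \right]$$

Iterated gradient-based optimization:
1) start from initial values of the parameters
2) update all parameters simultaneously
3) cease iterating when the stopping conditions are met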
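The three steps (start from initial values, update simultaneously, stop on a condition) can be sketched as plain gradient ascent on the Bernoulli log-likelihood, the sum over node pairs of A_ij ln p_ij + (1 - A_ij) ln(1 - p_ij). The model function used by the authors is not recoverable from the slides, so the form p_ij = sigmoid(w_i . w_j) below is purely an assumption for illustration. The step size, tolerance, attribute dimension and toy network (two disconnected triangles) are also hypothetical.

```python
import math
import random


def conn_prob(w, i, j):
    # ASSUMPTION: p_ij = sigmoid(w_i . w_j); the slides' model function is unknown.
    z = sum(a * b for a, b in zip(w[i], w[j]))
    p = 1.0 / (1.0 + math.exp(-z))
    return min(max(p, 1e-12), 1.0 - 1e-12)  # keep the logarithms finite


def log_likelihood(A, w):
    n = len(A)
    return sum(
        A[i][j] * math.log(conn_prob(w, i, j))
        + (1 - A[i][j]) * math.log(1.0 - conn_prob(w, i, j))
        for i in range(n) for j in range(i + 1, n)
    )


def fit(A, m=2, step=0.05, max_iter=500, tol=1e-6, seed=0):
    rng = random.Random(seed)
    n = len(A)
    # 1) start from small random initial attributes
    w = [[rng.uniform(-0.1, 0.1) for _ in range(m)] for _ in range(n)]
    history = [log_likelihood(A, w)]
    for _ in range(max_iter):
        # gradient of L with respect to w_i: sum over j != i of (A_ij - p_ij) * w_j
        grad = [[0.0] * m for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i != j:
                    resid = A[i][j] - conn_prob(w, i, j)
                    for k in range(m):
                        grad[i][k] += resid * w[j][k]
        # 2) update all attributes simultaneously from the same gradient
        w = [[w[i][k] + step * grad[i][k] for k in range(m)] for i in range(n)]
        history.append(log_likelihood(A, w))
        # 3) stop once the log-likelihood no longer changes noticeably
        if abs(history[-1] - history[-2]) < tol:
            break
    return w, history


# Toy undirected network: two triangles with no links between them.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
w_fit, history = fit(A)
print(history[0], history[-1])
print(conn_prob(w_fit, 0, 1), conn_prob(w_fit, 0, 3))
```

The fitted `conn_prob` values play the role of the connection probabilities p(i,j) used in the reconstruction procedure.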

B.8 Example network

B.9 Density function of connection probabilities

[Figure: density functions vs. connection probability (p), p from 0 to 1; legend includes 1/r f0(p).]

B.10 MAP detector minimizes average error

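$$P(D_0 \mid H_1) = \int_{R_0} f_1(p)\,dp$$

$$P(D_1 \mid H_0) = \int_{R_1} f_0(p)\,dp = 1 - \int_{R_0} f_0(p)\,dp$$

With the weight $1/r$ used in the plotted curves, the average error is

$$M = P(D_0 \mid H_1) + \frac{1}{r} P(D_1 \mid H_0), \qquad \min_{R_0} M = \min_{R_0} \left[ \frac{1}{r} + \int_{R_0} \left( f_1(p) - \frac{1}{r} f_0(p) \right) dp \right]$$

$$D = \begin{cases} D_1, & f_1(p) - \frac{1}{r} f_0(p) > 0 \\ D_0, & f_1(p) - \frac{1}{r} f_0(p) \le 0 \end{cases}$$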

Density function is usually jagged and difficult to work with. Distribution function is preferred. Consider the minimum average error (cost).
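Working with distribution functions, the threshold can be chosen by minimizing F1(pt) + 1/r (1 - F0(pt)) over candidate values of pt, where F1(pt) is the fraction of 1-links at or below the threshold (misses) and 1 - F0(pt) is the fraction of 0-links above it (false positives). The sketch below uses empirical distribution functions. The slides do not define r in the extracted text, so taking r as the ratio of the number of 1-links to the number of 0-links is an assumption, and the sample probabilities are hypothetical.

```python
def empirical_cdf(samples, x):
    """Fraction of samples that are <= x."""
    return sum(1 for s in samples if s <= x) / len(samples)


def average_cost(p0, p1, p_t):
    """F1(p_t) + (1/r) * (1 - F0(p_t)), with r ASSUMED to be len(p1) / len(p0)."""
    r = len(p1) / len(p0)
    return empirical_cdf(p1, p_t) + (1.0 / r) * (1.0 - empirical_cdf(p0, p_t))


def best_threshold(p0, p1):
    """Candidate threshold with the smallest average cost (ties: smallest value)."""
    candidates = sorted(set(p0) | set(p1) | {0.0})
    return min(candidates, key=lambda p_t: average_cost(p0, p1, p_t))


# Hypothetical connection probabilities of known 0-links and known 1-links.
p0 = [0.02, 0.05, 0.05, 0.08, 0.10, 0.15, 0.30, 0.55]
p1 = [0.25, 0.60, 0.70, 0.90]

p_t = best_threshold(p0, p1)
print(p_t, average_cost(p0, p1, p_t))
```

Under this assumed r, the cost multiplied by the number of 1-links equals the total number of misclassified links, so minimizing it minimizes the error count on the known links.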

B.11 Distribution of connection probabilities

[Figure: distribution functions vs. connection probability (p), p from 0 to 1; curves: 1/r (1-F0(p)) and F1(p)+1/r (1-F0(p)).]

B.12 Generalizability of algorithm

[Figure: two panels over connection probability from 0 to 1, showing F0m(p) and F1m(p).]

Do the unknown links approximately follow the same distribution?

Possible reasons for the unfavorable burst at the tail; a source of model error.

B.13 Robustness of algorithm

[Figure: two panels over connection probability from 0 to 1. One panel shows 1/r (1-F0(p)) and 1/r (1-F0m(p)), the other F1(p) and F1m(p), each at 5%, 10%, 15% and 20%.]

Is the algorithm sensitive to the number of unknown links?

B.14 Comparison of operation points

[Figure: curves over connection probability from 0.1 to 0.9; legend: 1/r (1-F0(p)), F1m(p), 1/r (1-F1m(p), F1(p)+1/r (1-F0(p)), F1m(p)+1/r (1-F0 (legend truncated in the source).]

B.15 Reconstruction results

USAir network, 10% missed

P        N         ACC (%)   TP/P (%)   TN/N (%)   TP/(TP+FP) (%)
201      5293      98.13     80.60      98.79      71.68
222      5272      98.13     80.63      98.86      74.90
192      5302      98.11     75.52      98.92      71.78
224      5270      98.25     80.80      98.99      77.35
235      5259      98.13     75.32      99.14      79.73
217      5277      98.38     78.34      99.20      80.19
204      5290      98.31     77.45      99.11      77.07
192      5302      98.25     71.88      99.21      76.67
231      5263      98.16     77.06      99.09      78.76
217      5277      97.93     71.89      99.00      74.64
213.5    5280.5    98.18     76.95      99.03      76.28     (mean)

C.1 A variant problem of link reconstruction

Observed network -> 2 types of signals: 0 and 1.

Some 0s were originally 1s but have been set to 0. Their positions are unknown; their number may be known or unknown.

C.2 Procedures for the variant problem

Available information
-> fitted probabilistic model P (N x N)
-> connection probability p(i,j) of each 0-link (i,j)
-> (a) number (M) unknown: determine a threshold of connection probability pt, then set (i,j) to 1 if p(i,j) > pt, and to 0 otherwise
   (b) number (M) known: scoring, i.e., rank the connection probabilities of the candidate links (all 0-links), then set the M links with the highest scores to 1
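Case (b), where the number M of removed links is known, reduces to ranking the candidate 0-links by score and keeping the top M. A minimal sketch, with a hypothetical function name `top_m_links` and hypothetical scores:

```python
def top_m_links(scores, m):
    """Return the m candidate links with the highest scores, best first."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:m]


# Hypothetical connection probabilities of four candidate 0-links.
scores = {(0, 1): 0.82, (0, 2): 0.10, (1, 3): 0.47, (2, 3): 0.65}

restored = top_m_links(scores, m=2)
print(restored)
```

Unlike case (a), no threshold has to be estimated; only the ordering of the scores matters.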

C.3 Algorithm based on common neighbor

$$p_{ij} = p_{ji} = n_{ij} / n_{\max}$$

where $n_{ij}$ is the number of common neighbors of nodes i and j.

[Figure: two panels of distribution functions vs. connection probability (p), p from 0.1 to 0.7; curves: F1, 1/r (1-F0), and F1 + 1/r (1-F0).]
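The common-neighbor score, the number of neighbors shared by i and j divided by the largest such count, can be sketched as follows. The slide does not say over which pairs the maximum is taken; taking it over the candidate (non-adjacent) pairs is an assumption here, and the small graph is hypothetical.

```python
from itertools import combinations


def common_neighbor_scores(adj):
    """Score every non-adjacent pair (i, j) as n_ij / n_max.

    adj maps each node to the set of its neighbors (undirected network).
    ASSUMPTION: n_max is the largest common-neighbor count among candidate pairs.
    """
    counts = {}
    for i, j in combinations(sorted(adj), 2):
        if j not in adj[i]:
            counts[(i, j)] = len(adj[i] & adj[j])
    n_max = max(counts.values(), default=0)
    if n_max == 0:
        return {pair: 0.0 for pair in counts}
    return {pair: c / n_max for pair, c in counts.items()}


# Hypothetical network with edges 0-1, 0-2, 1-2, 1-3, 2-3, 3-4.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4}, 4: {3}}

cn_scores = common_neighbor_scores(adj)
print(cn_scores)
```

These scores can be used in place of the model-based connection probabilities in either the thresholding or the ranking variant of the procedure.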

C.4 Comparison between two methods

[Figure: probability density functions (f1 and 1/r f0) and distribution functions (F1 and 1/r (1-F0)) for the common-neighbors and model-based methods.]

[Figure: two panels, common neighbors and model-based; red: 20%, blue: 5%.]

C.5 Generalizability and robustness of algorithms

[Figure: curves vs. number of predicted links (0 to 1000) for common neighbors, probabilistic model, and perfect algorithm.]

C.6 Reconstruction performance by ranking

[Figure: matrix sparsity plot, nz = 1740.]

D.1 Problem of link prediction

The procedure is identical to that of the variant link reconstruction problem.

[Figure: curves vs. number of links predicted (0 to 500) for common neighbor, model based, and perfect algorithm.]

Econophysics co-authorship network (N=506, m=519, nL=379)

[Figure: matrix sparsity plot, nz = 1038.]

D.2 Factors affecting prediction performance

Problem of generalizability: a) size of the training set, or time span of prediction; b) time-varying growth mechanism

[Figure: density functions f1, f0 and fn, and distribution functions F1, 1/r (1-F0) and Fn, over connection probability from 0 to 1.]

D.3 Effects of training set size

Assume the new links are known and examine the variant problem above: the training data set is not able to capture the underlying distribution faithfully, because either its size is too small or the growth rule is time dependent.

[Figure: two panels over connection probability from 0 to 1; curves: F1, Fn and F0 in one, F1, Fn and 1/r (1-F0) in the other.]

Conclusions

The problem of network reconstruction is thoroughly studied. Under a more general framework, the problem can be reformulated as a hypothesis testing problem, which gives deeper insight into the problem and enables us to relate the reconstruction performance of various methods to quantities at a more fundamental level.

THANK YOU
