SocNL: Bayesian Label Propagation with Confidence
Yuto Yamaguchi† Christos Faloutsos‡ Hiroyuki Kitagawa†
†U. of Tsukuba ‡CMU
Node Classification
15/01/29 Yuto Yamaguchi - AAAI 2015
Find: correct labels of unlabeled nodes
Our focus – Classification confidence
Example input graph
Our intuition
– A is most probably conservative
– B may be conservative
⇒ It's good to have confidence in our predictions
e.g., A is conservative with confidence score 0.9; B is conservative with confidence score 0.55
Contributions
• Novel Algorithm – Simple, fast, and incorporates confidence
• Theoretical Analysis – Convergence guarantee & speed – Equivalence to LP and Bayesian inference
• Empirical Analysis – Higher accuracy than competitors – Three different real network datasets
PROPOSED METHOD
Smoothness assumption (widely adopted)
Connected nodes are likely to share a label
Our Idea
Smoothness assumption + confidence
IF a node has a lot of red/blue neighbors
THEN we can confidently say that it is red/blue
(Figure: confident vs. not confident neighborhoods)
More evidence, more confidence ⇒ Bayesian inference
Cases to consider
• Case 1: without unlabeled neighbors – Easy but unrealistic
• Case 2: with unlabeled neighbors – We want to handle this case
Case 1: No unlabeled neighbors
Prior knowledge + evidence ⇒ Result
DCM (Dirichlet compound multinomial) leads to a simple result:

f_ik ∝ n_ik + α_k

where f_ik: probability that node i has label k; n_ik: number of i's neighbors with label k; α_k: prior
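The Case 1 computation can be sketched directly (a minimal sketch assuming a symmetric prior α; the function name and signature are illustrative, not from the paper):

```python
import numpy as np

def case1_distribution(neighbor_labels, K, alpha=1.0):
    """Label distribution for a node whose neighbors are all labeled.

    neighbor_labels : list of label ids in [0, K)
    Under the DCM, f_ik is proportional to n_ik + alpha_k: neighbor
    counts act as evidence, the prior as pseudo-counts.
    """
    n = np.bincount(neighbor_labels, minlength=K).astype(float)
    f = n + alpha
    return f / f.sum()  # normalize to a probability distribution
```

More neighbors of one color push the distribution toward that color and drown out the prior, which is the confidence effect: 9 red / 1 blue neighbors give a sharper distribution than 1 red / 1 blue.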
Case 2: With unlabeled neighbors
Classification result for A affects B; classification result for B affects A
In this case we need to solve the recursive equation:

f_ik ∝ Σ_j A_ij f_jk + α_k

where A_ij: entry of the adjacency matrix
Yes, we can solve it
(Please see the paper for details)
• Simple: We just need to do a matrix inversion
• Fast: power iteration for sparse matrix inversion is fast (P_UU is sparse)
• Confidence: this equation comes from Bayesian inference
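Putting the pieces together, the iterative algorithm might look like the following sketch (hypothetical NumPy code, not the authors' implementation; it assumes the update f_ik ∝ Σ_j A_ij f_jk + α_k with labeled nodes clamped to their observed labels):

```python
import numpy as np

def socnl(A, labels, K, alpha=1.0, iters=100, tol=1e-8):
    """Sketch of SocNL-style label propagation with a Dirichlet prior.

    A      : (N, N) adjacency matrix
    labels : length-N int array; label id in [0, K) if labeled, -1 if unlabeled
    K      : number of labels
    alpha  : symmetric prior value (pseudo-counts added to neighbor evidence)
    Returns an (N, K) matrix of label distributions (rows sum to 1).
    """
    N = A.shape[0]
    F = np.full((N, K), 1.0 / K)             # start from uniform distributions
    labeled = labels >= 0
    F[labeled] = np.eye(K)[labels[labeled]]  # clamp labeled nodes to one-hot rows

    for _ in range(iters):
        M = A @ F + alpha                    # neighbor evidence plus prior pseudo-counts
        M /= M.sum(axis=1, keepdims=True)    # renormalize rows to distributions
        M[labeled] = F[labeled]              # keep labeled nodes fixed
        if np.abs(M - F).max() < tol:        # stop once the update stabilizes
            F = M
            break
        F = M
    return F
```

On a toy chain 0–1–2–3 with node 0 labeled red and node 3 labeled blue, the two middle nodes each lean toward their labeled end, with the prior pulling both distributions toward uniform.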
THEORETICAL RESULTS
Convergence guarantee & speed
Theorem 1: The iterative algorithm of SocNL always converges on arbitrary graphs if non-zero prior values are used
Theorem 2: SocNL converges faster with larger prior values
Theorem 3: The time complexity of each iteration of SocNL is O(K(N+M))
Equivalence
Theorem 4: SocNL is equivalent to standard LP if prior values are set to 0
Theorem 5: SocNL is equivalent to Bayesian inference over the DCM if unlabeled nodes are ignored
* DCM: Dirichlet compound multinomial
EMPIRICAL RESULTS
Experimental settings
○ Datasets
○ Competitors
• Label Propagation [ICML03]
• Myopic: SocNL ignoring unlabeled nodes
Results
Myopic: not good
SocNL shows higher overall accuracy than competitors
Upper is better
Myopic: not good; LP: not good
Summary
• Proposed SocNL – Simple, fast, and incorporates confidence
• Theoretical Analysis – Convergence (Theorems 1, 2, 3) – Equivalence (Theorems 4, 5)
• Empirical Analysis – Higher overall accuracy than competitors