Truth Discovery with Multiple Conflicting Information Providers on the Web
Xiaoxin Yin, Jiawei Han, Philip S. Yu
Industrial and Government Track short paper
Advisor: Dr. Koh Jia-Ling
Speaker: Che-Wei Liang
Date: 2007.11.20
Outline
• Introduction
• Problem Definitions
• Computational Model
  – Web Site Trustworthiness and Fact Confidence
  – Iterative Computation
• Empirical Study
• Conclusions
Introduction
• The World Wide Web
  – Has become a necessary part of our lives.
  – Ex: Amazon.com, ShopZilla.com.
• Is the World Wide Web always trustworthy?
  – There is no guarantee that information on the web is correct.
Introduction
• Example 1: Authors of books
  – Web sites list conflicting authors for the same book: some listings are incomplete, others are incorrect.
Introduction
• Ranking web pages
  – Existing methods rank pages by authority derived from hyperlinks.
  – Ex: Authority-Hub analysis, PageRank, and more general link-based analysis.
• Does the authority or popularity of a web site lead to accurate information?
Introduction
• Veracity problem
  – Discover the true fact about each object.
Problem Definitions
• Definition 1: Confidence of facts.
  – The probability that a fact f is correct, denoted by s(f).
• Definition 2: Trustworthiness of web sites.
  – The expected confidence of the facts provided by a web site w, denoted by t(w).
Problem Definitions
• Facts may conflict with or support each other.
  – Ex: “Jennifer Widom” and “J. Widom” are different facts that support each other.
• Concept of implication
  – imp(f1 → f2): f1’s influence on f2’s confidence.
Basic heuristic
• Basic heuristics
  1. Usually there is only one true fact for a property of an object.
  2. This true fact appears to be the same or similar on different web sites.
Basic heuristic (cont.)
• Basic heuristics
  3. The false facts on different web sites are less likely to be the same or similar.
  4. In a certain domain, a web site that provides mostly true facts for many objects will likely provide true facts for other objects.
Web Site Trustworthiness and Fact Confidence
• Trustworthiness t(w): the average confidence of the facts provided by w,
  $$t(w) = \frac{\sum_{f \in F(w)} s(f)}{|F(w)|}$$
  where F(w) is the set of facts provided by w.
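A minimal sketch of the trustworthiness definition, assuming t(w) is the plain average of s(f) over F(w). The function name and the sample confidences are hypothetical and only serve as illustration.

```python
from statistics import mean


def trustworthiness(fact_confidences):
    """Average confidence s(f) over the facts F(w) provided by one web site."""
    return mean(fact_confidences)


# Hypothetical confidences s(f) for three facts provided by a single site.
site_facts = [0.9, 0.8, 0.7]
t_w = trustworthiness(site_facts)
print(round(t_w, 3))
```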
Web Site Trustworthiness and Fact Confidence
• Estimating the confidence of a fact is more difficult.
Web Site Trustworthiness and Fact Confidence
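• Simple case
  – f1 is the only fact about object o1.
  – Assume w1 and w2 are independent.
• Confidence s(f): f is wrong only if every web site providing it is wrong,
  $$s(f) = 1 - \prod_{w \in W(f)} \bigl(1 - t(w)\bigr)$$
  where W(f) is the set of web sites providing f.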
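The simple case can be sketched as follows, assuming independent web sites so that a fact is wrong only when every site in W(f) is wrong. The function name and the two trustworthiness values are hypothetical.

```python
from math import prod


def confidence(site_trust):
    """s(f) = 1 - product of (1 - t(w)) over the web sites W(f) providing f."""
    return 1.0 - prod(1.0 - t for t in site_trust)


# Hypothetical t(w1) and t(w2) for the two sites that both provide f1.
s_f1 = confidence([0.9, 0.8])
print(round(s_f1, 4))
```

With two fairly trustworthy sites agreeing, the combined confidence exceeds either site's individual trustworthiness.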
Web Site Trustworthiness and Fact Confidence
• Trustworthiness score of a web site
  $$\tau(w) = -\ln\bigl(1 - t(w)\bigr)$$
• τ(w) lies between 0 and +∞ and better characterizes how accurate w is.
  – Ex: t(w1) = 0.9, t(w2) = 0.99
    t(w2) = 1.1 × t(w1), but τ(w2) = 2 × τ(w1)
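The example values can be checked numerically, assuming the score is τ(w) = −ln(1 − t(w)), which is consistent with the stated range and with the factor of 2 in the example. The function name is hypothetical.

```python
import math


def trust_score(t):
    """tau(w) = -ln(1 - t(w)); maps trustworthiness in [0, 1) to [0, +inf)."""
    return -math.log(1.0 - t)


tau1 = trust_score(0.9)   # t(w1) = 0.9
tau2 = trust_score(0.99)  # t(w2) = 0.99

ratio_t = 0.99 / 0.9      # ratio of the raw trustworthiness values
ratio_tau = tau2 / tau1   # ratio of the trustworthiness scores
print(round(ratio_t, 2), round(ratio_tau, 2))
```

The score stretches the region near t(w) = 1, so a site with one tenth of the error rate gets twice the score rather than a 10% higher one.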
Web Site Trustworthiness and Fact Confidence
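• Confidence score of a fact
  $$\sigma(f) = -\ln\bigl(1 - s(f)\bigr)$$
  – Property: the confidence score of f is the sum of the trustworthiness scores of the web sites providing it,
    $$\sigma(f) = \sum_{w \in W(f)} \tau(w)$$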
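A quick numerical check of the property, assuming σ(f) = −ln(1 − s(f)), τ(w) = −ln(1 − t(w)), and independent sites. The trustworthiness values are hypothetical.

```python
import math

# Hypothetical t(w) for the web sites providing a fact f.
site_trust = [0.9, 0.8]

# Confidence under independence, then its log-scale score.
s_f = 1.0 - math.prod(1.0 - t for t in site_trust)
sigma_from_s = -math.log(1.0 - s_f)

# Sum of the per-site trustworthiness scores tau(w).
sigma_from_tau = sum(-math.log(1.0 - t) for t in site_trust)

print(sigma_from_s, sigma_from_tau)
```

Taking logarithms turns the product over sites into a sum, which is why the two values agree up to floating-point error.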
Web Site Trustworthiness and Fact Confidence
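• Adjusted confidence score of a fact f, which adds the influence of the other facts f' about the same object:
  $$\sigma^*(f) = \sigma(f) + \rho \sum_{o(f') = o(f)} \sigma(f') \cdot imp(f' \to f)$$
  where o(f) is the object that f is about and ρ, between 0 and 1, controls the influence of related facts.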
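A sketch of the adjustment, assuming the adjusted score adds ρ times the implication-weighted scores of the other facts about the same object. The fact "John Smith", all σ values, and all imp values below are hypothetical; only ρ = 0.5 matches the setting reported in the empirical study.

```python
def adjusted_score(fact, sigma, imp, rho=0.5):
    """sigma*(f) = sigma(f) + rho * sum of sigma(f') * imp(f' -> f).

    `sigma` must hold only facts about one object; a missing imp entry counts as 0.
    """
    related = sum(
        sigma[other] * imp.get((other, fact), 0.0)
        for other in sigma
        if other != fact
    )
    return sigma[fact] + rho * related


# Hypothetical confidence scores for three claimed authors of one book.
sigma = {"Jennifer Widom": 3.0, "J. Widom": 2.0, "John Smith": 1.0}

# Hypothetical implications: similar names support each other, others conflict.
imp = {
    ("J. Widom", "Jennifer Widom"): 0.6,
    ("Jennifer Widom", "J. Widom"): 0.8,
    ("John Smith", "Jennifer Widom"): -1.0,
    ("Jennifer Widom", "John Smith"): -1.0,
    ("John Smith", "J. Widom"): -1.0,
    ("J. Widom", "John Smith"): -1.0,
}

adjusted = {f: adjusted_score(f, sigma, imp) for f in sigma}
print(adjusted)
```

In this toy data the conflicting fact ends up with a negative adjusted score, which is the situation the negative-confidence slide addresses.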
Web Site Trustworthiness and Fact Confidence
• Compute the confidence of f from σ*(f) in the same way as from σ(f):
  $$s^*(f) = 1 - e^{-\sigma^*(f)}$$
• This assumes that different web sites are independent, which is often incorrect.
  – Add a dampening factor γ, 0 < γ < 1:
    $$s^*(f) = 1 - e^{-\gamma \cdot \sigma^*(f)}$$
Web Site Trustworthiness and Fact Confidence
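• Negative-confidence problem
  – A fact f may conflict with facts provided by trustworthy web sites, giving σ*(f) < 0 and s*(f) < 0, which is unreasonable for a probability.
• Use the logistic function instead:
  $$s(f) = \frac{1}{1 + e^{-\gamma \cdot \sigma^*(f)}}$$
  – If γ · σ*(f) is well above 0, s(f) is very close to s*(f).
  – If γ · σ*(f) is well below 0, s(f) is close to zero but still positive.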
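The two behaviours can be compared side by side, assuming the dampened confidence is 1 − e^(−γ·σ*(f)) and the replacement is the logistic function 1 / (1 + e^(−γ·σ*(f))). The function names and the sample σ* values are hypothetical; γ = 0.3 is the setting reported in the empirical study.

```python
import math


def dampened_confidence(sigma_star, gamma=0.3):
    """s*(f) = 1 - exp(-gamma * sigma*(f)); goes negative when sigma*(f) < 0."""
    return 1.0 - math.exp(-gamma * sigma_star)


def logistic_confidence(sigma_star, gamma=0.3):
    """s(f) = 1 / (1 + exp(-gamma * sigma*(f))); always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-gamma * sigma_star))


# Hypothetical adjusted scores: strongly negative, mildly negative, strongly positive.
for sigma_star in (-20.0, -1.5, 20.0):
    print(sigma_star, dampened_confidence(sigma_star), logistic_confidence(sigma_star))
```

For strongly positive scores the two functions nearly coincide, while for negative scores only the logistic form stays a valid probability.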
Iterative Computation
• TRUTHFINDER: an iterative method
  – Initially, TruthFinder has little information about the web sites and the facts.
  – In each iteration, it improves its knowledge about trustworthiness and confidence.
  – It stops when the computation reaches a stable state.
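The iteration can be sketched as below. This is an illustrative reading of the slides, not the authors' implementation: the initial trustworthiness `t0`, the tolerance `tol`, the iteration cap, and the use of the largest change in t(w) as the "stable state" test are all assumptions, as are the function name and the toy data. Every site is assumed to provide at least one fact.

```python
import math


def truthfinder(claims, fact_object, imp, rho=0.5, gamma=0.3,
                t0=0.9, tol=1e-6, max_iter=100):
    """claims: {site: set of facts}; fact_object: {fact: object}; imp: {(f1, f2): value}."""
    trust = {w: t0 for w in claims}  # assumed uniform starting trustworthiness
    facts = sorted(fact_object)
    conf = {}
    for _ in range(max_iter):
        # tau(w) = -ln(1 - t(w)); clamp t(w) so the logarithm stays finite.
        tau = {w: -math.log(1.0 - min(t, 1.0 - 1e-12)) for w, t in trust.items()}
        # sigma(f) = sum of tau(w) over the sites providing f.
        sigma = {f: sum(tau[w] for w, fs in claims.items() if f in fs) for f in facts}
        conf = {}
        for f in facts:
            # Influence of the other facts about the same object.
            related = sum(sigma[g] * imp.get((g, f), 0.0) for g in facts
                          if g != f and fact_object[g] == fact_object[f])
            sigma_star = sigma[f] + rho * related
            # Logistic confidence with dampening factor gamma.
            conf[f] = 1.0 / (1.0 + math.exp(-gamma * sigma_star))
        # t(w) = average confidence of the facts provided by w.
        new_trust = {w: sum(conf[f] for f in fs) / len(fs) for w, fs in claims.items()}
        change = max(abs(new_trust[w] - trust[w]) for w in claims)
        trust = new_trust
        if change < tol:  # assumed stopping rule for the "stable state"
            break
    return trust, conf


# Hypothetical data: two sites agree on fact a1, one site claims conflicting fact a2.
claims = {"w1": {"a1"}, "w2": {"a1"}, "w3": {"a2"}}
fact_object = {"a1": "book", "a2": "book"}
imp = {("a1", "a2"): -1.0, ("a2", "a1"): -1.0}
trust, conf = truthfinder(claims, fact_object, imp)
print(trust, conf)
```

Each pass recomputes fact confidence from site trustworthiness and then site trustworthiness from fact confidence, so the two estimates refine each other until they stop changing.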
Empirical Study
• Compared with VOTING
  – VOTING chooses the fact that is provided by the most web sites.
• Setup: Intel PC with a 1.66 GHz dual-core processor, 1 GB memory, Windows XP Professional.
• Parameters: ρ = 0.5 and γ = 0.3.
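The VOTING baseline can be sketched as a per-object majority count. The function name, the data layout, and the toy data are hypothetical, and ties are broken arbitrarily here because the slides do not say how they are handled.

```python
from collections import Counter


def voting(claims, fact_object):
    """For each object, pick the fact provided by the most web sites."""
    votes = Counter(f for facts in claims.values() for f in facts)
    best = {}
    for fact, count in votes.items():
        obj = fact_object[fact]
        # Keep the fact with the highest count seen so far for this object.
        if obj not in best or count > votes[best[obj]]:
            best[obj] = fact
    return best


# Hypothetical data: two sites provide a1, one site provides a2, same object.
claims = {"w1": {"a1"}, "w2": {"a1"}, "w3": {"a2"}}
fact_object = {"a1": "book", "a2": "book"}
winners = voting(claims, fact_object)
print(winners)
```

Unlike TruthFinder, this baseline weighs every web site equally and ignores implications between similar facts.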
Conclusions
• Introduced and formulated the Veracity problem
  – Resolving conflicting facts from multiple web sites.
  – Finding the true facts among them.
• Proposed TRUTHFINDER
  – Uses web site trustworthiness and fact confidence to find trustworthy web sites and true facts.
• Experiments show that it achieves high accuracy.