March 2011
Local and Global Algorithms for
Disambiguation to Wikipedia
Lev Ratinov1, Dan Roth1, Doug Downey2, Mike Anderson3
1 University of Illinois at Urbana-Champaign; 2 Northwestern University; 3 Rexonomy
Information overload
Organizing knowledge
It's a version of Chicago, the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".
Chicago was used by default for Mac menus through Mac OS 7.6, and OS 8 was released mid-1997.
Chicago VIII was one of the early-'70s Chicago albums to catch my ear, along with Chicago II.
Cross-document co-reference resolution
Reference resolution (disambiguation to Wikipedia)
The “reference” collection has structure
[Figure: network of Wikipedia pages connected by relation edges such as Used_In, Is_a, Succeeded, Released.]
Analysis of Information Networks
Here we use Wikipedia as the knowledge resource, but other resources can be used.
Talk outline
High-level algorithmic approach: bipartite graph matching with local and global inference.
Local inference: experiments & results.
Global inference: experiments & results.
Results and conclusions.
Demo
Problem formulation: a matching/ranking problem
[Figure: bipartite matching between mentions in text documents (news, blogs, …) and Wikipedia articles.]
Local approach
Γ is a solution to the problem: a set of pairs (m, t), where m is a mention in the document and t is the matched Wikipedia title.
Local approach (continued)
Each pair (m, t) receives a local score φ(m, t) for matching the mention to the title.
Local + Global : using the Wikipedia structure
A "global" term evaluates how coherent the structure of the solution is.
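Written out, the combined objective pairs the local scores with the global term. This is a reconstruction in the slides' notation, where φ is the local mention-to-title score and Ψ scores the coherence of the whole solution Γ:

```latex
\Gamma^* \;=\; \arg\max_{\Gamma} \left[ \sum_{(m_i, t_i) \in \Gamma} \phi(m_i, t_i) \;+\; \Psi(\Gamma) \right]
```

The tractable variation on the next slide approximates Ψ(Γ) by pairwise scores ψ(t_i, t_j') computed against a surrogate solution Γ'.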
Can be reduced to an NP-hard problem
A tractable variation
1. Invent a surrogate solution Γ': disambiguate each mention independently.
2. Evaluate the structure based on pairwise coherence scores Ψ(t_i, t_j).
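A minimal sketch of this two-stage inference, assuming the local scores φ and pairwise coherence scores ψ are given as dictionaries (the names `phi`, `psi`, and `disambiguate` are illustrative, not the paper's API):

```python
def disambiguate(mentions, candidates, phi, psi):
    """Two-stage (local + global) inference sketch.

    Stage 1: build the surrogate solution Gamma' by picking each
    mention's best title using the local score phi alone.
    Stage 2: rescore each mention's candidates by local score plus
    pairwise coherence psi against the other mentions' surrogate titles.
    """
    # Stage 1: independent, local-only disambiguation (Gamma')
    surrogate = {m: max(candidates[m], key=lambda t: phi[(m, t)])
                 for m in mentions}
    # Stage 2: local score + coherence with the surrogate titles
    solution = {}
    for m in mentions:
        def score(t):
            local = phi[(m, t)]
            coherence = sum(psi.get((t, surrogate[o]), 0.0)
                            for o in mentions if o != m)
            return local + coherence
        solution[m] = max(candidates[m], key=score)
    return solution
```

Missing ψ entries default to 0 here; a symmetric ψ would need both key orders (or a canonical ordering) in the dictionary.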
Talk outline (next: local inference, experiments & results)
I. Baseline: P(Title | Surface Form)
P(Title | "Chicago")
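As a sketch, this baseline can be estimated by counting how often each surface form links to each title in Wikipedia anchor text. The input format and names below are assumptions for illustration, not the paper's code:

```python
from collections import Counter, defaultdict

def build_baseline(anchor_pairs):
    """Estimate P(title | surface form) from (surface, title) pairs,
    e.g. harvested from Wikipedia hyperlink anchor texts (sketch)."""
    counts = defaultdict(Counter)
    for surface, title in anchor_pairs:
        counts[surface][title] += 1

    def p(title, surface):
        # Relative frequency of this title among all links with this anchor
        total = sum(counts[surface].values())
        return counts[surface][title] / total if total else 0.0

    return p
```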
II. Context(Title)
Context(Charcoal) += "a font called __ is used to"
III. Text(Title)
Just the text of the page (one per title).
Putting it all together
City vs. font: (0.99 vs. 0.0001, 0.01 vs. 0.2, 0.03 vs. 0.01). Band vs. font: (0.001 vs. 0.0001, 0.001 vs. 0.2, 0.02 vs. 0.01).
Training a ranking SVM: consider all title pairs; train a ranker on the pairs (learn to prefer the correct solution); inference is a knockout tournament. Key: the ranker abstracts over the text, learning which scores are important.

Title          Score_Baseline   Score_Context   Score_Text
Chicago_city   0.99             0.01            0.03
Chicago_font   0.0001           0.2             0.01
Chicago_band   0.001            0.001           0.02
Example: font or city?
It's a version of Chicago, the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".
[Figure: the mention's context compared against Text(Chicago_city)/Context(Chicago_city) and Text(Chicago_font)/Context(Chicago_font).]
Lexical matching
Compare the mention's context with Text(title) and Context(title) using cosine similarity with TF-IDF weighting.
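A minimal sketch of TF-IDF weighting and cosine similarity over token lists, using the standard formulas rather than the paper's exact weighting scheme:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute TF-IDF weight maps for a list of token lists (sketch)."""
    n = len(docs)
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    idf = {w: math.log(n / df[w]) for w in df}
    return [{w: tf * idf[w] for w, tf in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse weight maps."""
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```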
Ranking – font vs. city
Score vectors for the two candidates:
Chicago_city: (0.5, 0.2, 0.1, 0.8)
Chicago_font: (0.3, 0.2, 0.3, 0.5)
Train a ranking SVM
Candidate vectors: (0.5, 0.2, 0.1, 0.8) and (0.3, 0.2, 0.3, 0.5).
Training example: the difference vector with its label, [(0.2, 0, -0.2, 0.3), -1].
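The pairwise training idea can be sketched with a simple perceptron over difference vectors. The talk uses a ranking SVM; the perceptron below stands in for it under the same objective, learn w with w·(correct - incorrect) > 0 (function names are illustrative):

```python
def train_ranker(pairs, epochs=100, lr=0.1):
    """Learn a weight vector w so that w . (correct - incorrect) > 0
    for every (correct_features, incorrect_features) pair.
    A pairwise perceptron stands in for the ranking SVM here."""
    dim = len(pairs[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for good, bad in pairs:
            diff = [g - b for g, b in zip(good, bad)]
            # Perceptron update on violated (or zero-margin) pairs
            if sum(wi * di for wi, di in zip(w, diff)) <= 0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w

def knockout(w, candidates):
    """Inference: the highest-scoring candidate wins the tournament
    (with a linear scorer this reduces to a single argmax)."""
    return max(candidates, key=lambda x: sum(wi * xi for wi, xi in zip(w, x)))
```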
Scaling issues: one of our key contributions
Scaling issues
These representations are big and must be loaded into memory from disk.
Improving performance
Rather than computing TF-IDF-weighted cosine similarity, we want to train a classifier on the fly; because of the aggressive feature pruning, we choose PrTFIDF.
Performance (local only): ranking accuracy
Dataset          Baseline (solvable)   +Local TF-IDF (solvable)   +Local PrTFIDF (solvable)
ACE              94.05                 95.67                      96.21
MSN News         81.91                 84.04                      85.10
AQUAINT          93.19                 94.38                      95.57
Wikipedia Test   85.88                 92.76                      93.59
Talk outline (next: global inference, experiments & results)
Co-occurrence(Title1,Title2)
The city senses of Boston and Chicago appear together often.
Co-occurrence(Title1,Title2)
Rock music and albums appear together often.
Global ranking
How do we approximate the "global semantic context" of the document? (What is Γ'?)
- Use only unambiguous mentions for Γ'.
- Use the top baseline disambiguation for NER surface forms.
- Use the top baseline disambiguation for all surface forms.
How do we define relatedness between two titles? (What is Ψ?)
Ψ: pairwise relatedness between two titles:
- Normalized Google Distance (NGD)
- Pointwise Mutual Information (PMI)
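Both measures can be computed from Wikipedia's link structure. A sketch of Normalized Google Distance over in-link sets follows; the set-based input format is an assumption, and the `relatedness` wrapper mirrors the common "1 - distance" usage rather than a formula stated in the talk:

```python
import math

def ngd(inlinks_a, inlinks_b, n_titles):
    """Normalized Google Distance over Wikipedia in-link sets (sketch).
    inlinks_a / inlinks_b: sets of article ids linking to each title;
    n_titles: total number of Wikipedia articles."""
    a, b = len(inlinks_a), len(inlinks_b)
    ab = len(inlinks_a & inlinks_b)
    if ab == 0:
        return float("inf")     # no shared in-links: maximally distant
    return ((math.log(max(a, b)) - math.log(ab)) /
            (math.log(n_titles) - math.log(min(a, b))))

def relatedness(inlinks_a, inlinks_b, n_titles):
    """Turn the distance into a similarity in [0, 1]."""
    d = ngd(inlinks_a, inlinks_b, n_titles)
    return 0.0 if d == float("inf") else max(0.0, 1.0 - d)
```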
Which Γ' is best? (ranker accuracy, solvable mentions)

Dataset          Baseline   +Global, Unambiguous   +Global, NER   +Global, AllMentions
ACE              94.05      94.56                  96.21          96.75
MSN News         81.91      84.46                  84.04          88.51
AQUAINT          93.19      95.40                  94.04          95.91
Wikipedia Test   85.88      89.67                  89.59          89.79
Results: ranker accuracy (solvable mentions)

Dataset          Baseline   Baseline+Lexical   Baseline+Global, AllMentions
ACE              94.05      96.21              96.75
MSN News         81.91      85.10              88.51
AQUAINT          93.19      95.57              95.91
Wikipedia Test   85.88      93.59              89.79
Results: Local + Global
Dataset          Baseline   Baseline+Lexical   Baseline+Lexical+Global
ACE              94.05      96.21              97.83
MSN News         81.91      85.10              87.02
AQUAINT          93.19      95.57              94.38
Wikipedia Test   85.88      93.59              94.18
Talk outline (next: conclusions and demo)
Conclusions:
- We addressed a very large-scale knowledge acquisition and extraction problem.
- State-of-the-art algorithmic tools that exploit both the content and the structure of the network.
- Formulated a framework for local & global reference resolution and disambiguation into knowledge networks.
- Proposed local and global algorithms with state-of-the-art performance; addressed scaling, a major issue; identified key remaining challenges (next slide).
We want to know what we don’t know
This is not dealt with well in the literature: "As Peter Thompson, a 16-year-old hunter, said ..", "Dorothy Byrne, a state coordinator for the Florida Green Party…".
We train a separate SVM classifier to identify such cases. The features are:
- All the baseline, lexical, and semantic scores of the top candidate.
- The score assigned to the top candidate by the ranker.
- The ranker's "confidence" in the top candidate with respect to the second-best disambiguation.
- The Good-Turing probability of an out-of-Wikipedia occurrence for the mention.
Limited success; future research.
Comparison to the previous state of the art (all mentions, including out-of-Wikipedia)

Dataset          Baseline   Milne & Witten   Our system (GLOW)
ACE              69.52      72.76            77.25
MSN News         72.83      68.49            74.88
AQUAINT          82.64      83.61            83.94
Wikipedia Test   81.77      80.32            90.54
Demo