March 2011
Local and Global Algorithms for
Disambiguation to Wikipedia
Lev Ratinov1, Dan Roth1, Doug Downey2, Mike Anderson3
1 University of Illinois at Urbana-Champaign; 2 Northwestern University; 3 Rexonomy
Information overload
Organizing knowledge
It's a version of Chicago, the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".
Chicago was used by default for Mac menus through Mac OS 7.6, and OS 8 was released mid-1997.
Chicago VIII was one of the early-'70s Chicago albums to catch my ear, along with Chicago II.
Cross-document co-reference resolution
Reference resolution (disambiguation to Wikipedia)
The “reference” collection has structure
[Figure: network of Wikipedia pages connected by relation edges such as Used_In, Is_a, Succeeded, Released.]
Analysis of Information Networks
Here we use Wikipedia as the knowledge resource, but other resources can be used.
Talk outline
High-level algorithmic approach: bipartite graph matching with local and global inference.
Local inference: experiments & results.
Global inference: experiments & results.
Results and conclusions.
Demo
Problem formulation: a matching/ranking problem
[Figure: bipartite matching between mentions in text documents (news, blogs, …) and Wikipedia articles.]
Local approach
Γ is a solution to the problem: a set of pairs (m, t), where m is a mention in the document and t is the matched Wikipedia title.
Local approach (continued)
Each pair (m, t) receives a local score φ(m, t) for matching the mention to the title.
Local + Global : using the Wikipedia structure
A "global" term evaluates how coherent the structure of the solution is.
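Written out, the combined objective pairs the local scores with the global term. This is a reconstruction in the slides' notation, where φ is the local mention-to-title score and Ψ scores the coherence of the whole solution Γ:

```latex
\Gamma^* \;=\; \arg\max_{\Gamma} \left[ \sum_{(m_i, t_i) \in \Gamma} \phi(m_i, t_i) \;+\; \Psi(\Gamma) \right]
```

The tractable variation on the next slide approximates Ψ(Γ) by pairwise scores ψ(t_i, t_j') computed against a surrogate solution Γ'.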
Can be reduced to an NP-hard problem
A tractable variation
1. Invent a surrogate solution Γ': disambiguate each mention independently.
2. Evaluate the structure based on pairwise coherence scores Ψ(t_i, t_j).
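A minimal sketch of this two-stage inference, assuming the local scores φ and pairwise coherence scores ψ are given as dictionaries (the names `phi`, `psi`, and `disambiguate` are illustrative, not the paper's API):

```python
def disambiguate(mentions, candidates, phi, psi):
    """Two-stage (local + global) inference sketch.

    Stage 1: build the surrogate solution Gamma' by picking each
    mention's best title using the local score phi alone.
    Stage 2: rescore each mention's candidates by local score plus
    pairwise coherence psi against the other mentions' surrogate titles.
    """
    # Stage 1: independent, local-only disambiguation (Gamma')
    surrogate = {m: max(candidates[m], key=lambda t: phi[(m, t)])
                 for m in mentions}
    # Stage 2: local score + coherence with the surrogate titles
    solution = {}
    for m in mentions:
        def score(t):
            local = phi[(m, t)]
            coherence = sum(psi.get((t, surrogate[o]), 0.0)
                            for o in mentions if o != m)
            return local + coherence
        solution[m] = max(candidates[m], key=score)
    return solution
```

Missing ψ entries default to 0 here; a symmetric ψ would need both key orders (or a canonical ordering) in the dictionary.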
Talk outline (next: local inference, experiments & results)
I. Baseline: P(Title | Surface Form)
P(Title | "Chicago")
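As a sketch, this baseline can be estimated by counting how often each surface form links to each title in Wikipedia anchor text. The input format and names below are assumptions for illustration, not the paper's code:

```python
from collections import Counter, defaultdict

def build_baseline(anchor_pairs):
    """Estimate P(title | surface form) from (surface, title) pairs,
    e.g. harvested from Wikipedia hyperlink anchor texts (sketch)."""
    counts = defaultdict(Counter)
    for surface, title in anchor_pairs:
        counts[surface][title] += 1

    def p(title, surface):
        # Relative frequency of this title among all links with this anchor
        total = sum(counts[surface].values())
        return counts[surface][title] / total if total else 0.0

    return p
```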
II. Context(Title)
Context(Charcoal) += "a font called __ is used to"
III. Text(Title)
Just the text of the page (one per title).
Putting it all together
City vs. font: (0.99 vs. 0.0001, 0.01 vs. 0.2, 0.03 vs. 0.01). Band vs. font: (0.001 vs. 0.0001, 0.001 vs. 0.2, 0.02 vs. 0.01).
Training a ranking SVM: consider all title pairs; train a ranker on the pairs (learn to prefer the correct solution); inference is a knockout tournament. Key: the ranker abstracts over the text, learning which scores are important.

Title          Score_Baseline   Score_Context   Score_Text
Chicago_city   0.99             0.01            0.03
Chicago_font   0.0001           0.2             0.01
Chicago_band   0.001            0.001           0.02
Example: font or city?
It's a version of Chicago, the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".
[Figure: the mention's context compared against Text(Chicago_city)/Context(Chicago_city) and Text(Chicago_font)/Context(Chicago_font).]
Lexical matching
Compare the mention's context with Text(title) and Context(title) using cosine similarity with TF-IDF weighting.
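A minimal sketch of TF-IDF weighting and cosine similarity over token lists, using the standard formulas rather than the paper's exact weighting scheme:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute TF-IDF weight maps for a list of token lists (sketch)."""
    n = len(docs)
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    idf = {w: math.log(n / df[w]) for w in df}
    return [{w: tf * idf[w] for w, tf in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse weight maps."""
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```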
Ranking – font vs. city
Score vectors for the two candidates:
Chicago_city: (0.5, 0.2, 0.1, 0.8)
Chicago_font: (0.3, 0.2, 0.3, 0.5)
Train a ranking SVM
Candidate vectors: (0.5, 0.2, 0.1, 0.8) and (0.3, 0.2, 0.3, 0.5).
Training example: the difference vector with its label, [(0.2, 0, -0.2, 0.3), -1].
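The pairwise training idea can be sketched with a simple perceptron over difference vectors. The talk uses a ranking SVM; the perceptron below stands in for it under the same objective, learn w with w·(correct - incorrect) > 0 (function names are illustrative):

```python
def train_ranker(pairs, epochs=100, lr=0.1):
    """Learn a weight vector w so that w . (correct - incorrect) > 0
    for every (correct_features, incorrect_features) pair.
    A pairwise perceptron stands in for the ranking SVM here."""
    dim = len(pairs[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for good, bad in pairs:
            diff = [g - b for g, b in zip(good, bad)]
            # Perceptron update on violated (or zero-margin) pairs
            if sum(wi * di for wi, di in zip(w, diff)) <= 0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w

def knockout(w, candidates):
    """Inference: the highest-scoring candidate wins the tournament
    (with a linear scorer this reduces to a single argmax)."""
    return max(candidates, key=lambda x: sum(wi * xi for wi, xi in zip(w, x)))
```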
Scaling issues: one of our key contributions
Scaling issues
These representations are big and must be loaded into memory from disk.
Improving performance
Rather than computing TF-IDF-weighted cosine similarity, we want to train a classifier on the fly; because of the aggressive feature pruning, we choose PrTFIDF.
Performance (local only): ranking accuracy
Dataset          Baseline (solvable)   +Local TF-IDF (solvable)   +Local PrTFIDF (solvable)
ACE              94.05                 95.67                      96.21
MSN News         81.91                 84.04                      85.10
AQUAINT          93.19                 94.38                      95.57
Wikipedia Test   85.88                 92.76                      93.59
Talk outline (next: global inference, experiments & results)
Co-occurrence(Title1,Title2)
The city senses of Boston and Chicago appear together often.
Co-occurrence(Title1,Title2)
Rock music and albums appear together often.
Global ranking
How do we approximate the "global semantic context" of the document? (What is Γ'?)
- Use only unambiguous mentions for Γ'.
- Use the top baseline disambiguation for NER surface forms.
- Use the top baseline disambiguation for all surface forms.
How do we define relatedness between two titles? (What is Ψ?)
Ψ: pairwise relatedness between two titles:
- Normalized Google Distance (NGD)
- Pointwise Mutual Information (PMI)
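Both measures can be computed from Wikipedia's link structure. A sketch of Normalized Google Distance over in-link sets follows; the set-based input format is an assumption, and the `relatedness` wrapper mirrors the common "1 - distance" usage rather than a formula stated in the talk:

```python
import math

def ngd(inlinks_a, inlinks_b, n_titles):
    """Normalized Google Distance over Wikipedia in-link sets (sketch).
    inlinks_a / inlinks_b: sets of article ids linking to each title;
    n_titles: total number of Wikipedia articles."""
    a, b = len(inlinks_a), len(inlinks_b)
    ab = len(inlinks_a & inlinks_b)
    if ab == 0:
        return float("inf")     # no shared in-links: maximally distant
    return ((math.log(max(a, b)) - math.log(ab)) /
            (math.log(n_titles) - math.log(min(a, b))))

def relatedness(inlinks_a, inlinks_b, n_titles):
    """Turn the distance into a similarity in [0, 1]."""
    d = ngd(inlinks_a, inlinks_b, n_titles)
    return 0.0 if d == float("inf") else max(0.0, 1.0 - d)
```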
Which Γ' is best? (ranker accuracy, solvable mentions)

Dataset          Baseline   +Global, Unambiguous   +Global, NER   +Global, AllMentions
ACE              94.05      94.56                  96.21          96.75
MSN News         81.91      84.46                  84.04          88.51
AQUAINT          93.19      95.40                  94.04          95.91
Wikipedia Test   85.88      89.67                  89.59          89.79
Results: ranker accuracy (solvable mentions)

Dataset          Baseline   Baseline+Lexical   Baseline+Global, AllMentions
ACE              94.05      96.21              96.75
MSN News         81.91      85.10              88.51
AQUAINT          93.19      95.57              95.91
Wikipedia Test   85.88      93.59              89.79
Results: Local + Global
Dataset          Baseline   Baseline+Lexical   Baseline+Lexical+Global
ACE              94.05      96.21              97.83
MSN News         81.91      85.10              87.02
AQUAINT          93.19      95.57              94.38
Wikipedia Test   85.88      93.59              94.18
Talk outline (next: conclusions and demo)
Conclusions:
- We addressed a very large-scale knowledge acquisition and extraction problem.
- State-of-the-art algorithmic tools that exploit both the content and the structure of the network.
- Formulated a framework for local & global reference resolution and disambiguation into knowledge networks.
- Proposed local and global algorithms with state-of-the-art performance; addressed scaling, a major issue; identified key remaining challenges (next slide).
We want to know what we don’t know
This is not dealt with well in the literature: "As Peter Thompson, a 16-year-old hunter, said ..", "Dorothy Byrne, a state coordinator for the Florida Green Party…".
We train a separate SVM classifier to identify such cases. The features are:
- All the baseline, lexical, and semantic scores of the top candidate.
- The score assigned to the top candidate by the ranker.
- The ranker's "confidence" in the top candidate with respect to the second-best disambiguation.
- The Good-Turing probability of an out-of-Wikipedia occurrence for the mention.
Limited success; future research.
Comparison to the previous state of the art (all mentions, including out-of-Wikipedia)

Dataset          Baseline   Milne & Witten   Our system (GLOW)
ACE              69.52      72.76            77.25
MSN News         72.83      68.49            74.88
AQUAINT          82.64      83.61            83.94
Wikipedia Test   81.77      80.32            90.54
Demo