Page 1 March 2011 Local and Global Algorithms for Disambiguation to Wikipedia Lev Ratinov 1 , Dan Roth 1 , Doug Downey 2 , Mike Anderson 3 1 University of Illinois at Urbana-Champaign 2 Northwestern University 3 Rexonomy


Page 1

March 2011

Local and Global Algorithms for Disambiguation to Wikipedia

Lev Ratinov 1, Dan Roth 1, Doug Downey 2, Mike Anderson 3

1 University of Illinois at Urbana-Champaign, 2 Northwestern University, 3 Rexonomy

Page 2

Information overload

Page 3

Organizing knowledge

It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”.

Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.

Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

Page 4

Cross-document co-reference resolution

It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”.

Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.

Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

Page 5

Reference resolution: (disambiguation to Wikipedia)

It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”.

Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.

Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

Page 6

The “reference” collection has structure

It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”.

Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.

Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

[Figure: edges labeled Used_In, Is_a, Is_a, Succeeded, Released]

Page 7

Analysis of Information Networks

It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”.

Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.

Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

Page 8

Here: Wikipedia as a knowledge resource ... but we can use other resources

[Figure: edges labeled Used_In, Is_a, Is_a, Succeeded, Released]

Page 9

Talk outline

High-level algorithmic approach. Bi-partite graph matching with global and local inference.

Local Inference. Experiments & Results

Global Inference. Experiments & Results

Results, Conclusions

Demo

Page 10

Problem formulation - matching/ranking problem


Text Document(s)—News, Blogs,…

Wikipedia Articles

Page 11

Local approach

Γ is a solution to the problem: a set of pairs (m, t)

m: a mention in the document

t: the matched Wikipedia title

Text Document(s)—News, Blogs,…

Wikipedia Articles

Page 12

Local approach

Γ is a solution to the problem: a set of pairs (m, t)

m: a mention in the document

t: the matched Wikipedia title

Local score of matching the mention to the title

Text Document(s)—News, Blogs,…

Wikipedia Articles
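
The local approach can be sketched in a few lines of Python. This is a minimal illustration, assuming the local objective is a sum of per-mention scores, so each mention can be resolved on its own. The function name `local_disambiguation`, the `phi` callable, and the toy scores are hypothetical and are not taken from the talk.

```python
from typing import Callable, Dict, List


def local_disambiguation(
    mentions: List[str],
    candidates: Dict[str, List[str]],
    phi: Callable[[str, str], float],
) -> Dict[str, str]:
    """Return a solution Gamma as a mapping mention -> best-scoring title."""
    gamma = {}
    for m in mentions:
        titles = candidates.get(m, [])
        if titles:  # mentions without candidates stay unlinked
            gamma[m] = max(titles, key=lambda t: phi(m, t))
    return gamma


# Toy local scores, made up for illustration only.
TOY_PHI = {
    ("Chicago", "Chicago_city"): 0.3,
    ("Chicago", "Chicago_font"): 0.8,
    ("Macintosh", "Macintosh"): 0.9,
}

gamma = local_disambiguation(
    ["Chicago", "Macintosh"],
    {"Chicago": ["Chicago_city", "Chicago_font"], "Macintosh": ["Macintosh"]},
    lambda m, t: TOY_PHI.get((m, t), 0.0),
)
print(gamma)
```

Because no term couples two mentions, the cost is linear in the number of (mention, candidate) pairs.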

Page 13

Local + Global : using the Wikipedia structure

A “global” term, evaluating how good the structure of the solution is

Text Document(s)—News, Blogs,…

Wikipedia Articles

Page 14

The global problem is NP-hard in general

Text Document(s)—News, Blogs,…

Wikipedia Articles

Page 15

A tractable variation

1. Invent a surrogate solution Γ’: disambiguate each mention independently.

2. Evaluate the structure based on pairwise coherence scores Ψ(ti, tj)

Text Document(s)—News, Blogs,…

Wikipedia Articles
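
The two steps of the tractable variation can be sketched as follows. This is a rough illustration under stated assumptions: the local score `phi` and the coherence score `psi` are simply added with equal weight (the talk does not say how they are combined), and the function name and toy numbers are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def rerank_with_coherence(
    mentions: List[str],
    candidates: Dict[str, List[str]],
    phi: Callable[[str, str], float],
    psi: Callable[[str, str], float],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    # Step 1: surrogate solution, each mention disambiguated independently.
    surrogate = {m: max(candidates[m], key=lambda t: phi(m, t)) for m in mentions}

    # Step 2: rescore every candidate by its pairwise coherence with the
    # surrogate titles chosen for the other mentions.
    final = {}
    for m in mentions:
        context = [t for other, t in surrogate.items() if other != m]
        final[m] = max(
            candidates[m],
            key=lambda t: phi(m, t) + sum(psi(t, c) for c in context),
        )
    return surrogate, final


# Toy scores, made up for illustration only.
PHI = {
    ("Chicago", "Chicago_city"): 0.5,
    ("Chicago", "Chicago_band"): 0.6,
    ("Boston", "Boston_city"): 0.9,
    ("Boston", "Boston_band"): 0.2,
}
PSI = {
    frozenset({"Chicago_city", "Boston_city"}): 0.5,
    frozenset({"Chicago_band", "Boston_band"}): 0.5,
}

surrogate, final = rerank_with_coherence(
    ["Chicago", "Boston"],
    {"Chicago": ["Chicago_city", "Chicago_band"],
     "Boston": ["Boston_city", "Boston_band"]},
    lambda m, t: PHI[(m, t)],
    lambda a, b: PSI.get(frozenset({a, b}), 0.0),
)
print(surrogate, final)
```

In this toy run the surrogate picks the band sense for "Chicago", and the coherence with the city sense of "Boston" flips it to the city. The work stays quadratic in the number of mentions instead of searching over all joint assignments.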

Page 16

Talk outline

High-level algorithmic approach. Bi-partite graph matching with global and local inference.

Local Inference. Experiments & Results

Global Inference. Experiments & Results

Results, Conclusions

Demo

Page 17

I. Baseline : P(Title|Surface Form)

P(Title | “Chicago”)
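
One plausible way to estimate this baseline is to count how often each surface form is used as link text for each title. The sketch below assumes that estimation method, which the slide does not spell out; `build_baseline` and the toy link list are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def build_baseline(
    anchor_links: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, float]]:
    """Estimate P(title | surface form) from (surface form, title) link pairs."""
    counts = defaultdict(Counter)
    for surface, title in anchor_links:
        counts[surface][title] += 1
    return {
        surface: {t: c / sum(ctr.values()) for t, c in ctr.items()}
        for surface, ctr in counts.items()
    }


# Toy link data, made up for illustration only.
links = [
    ("Chicago", "Chicago_city"),
    ("Chicago", "Chicago_city"),
    ("Chicago", "Chicago_city"),
    ("Chicago", "Chicago_band"),
]
p_title = build_baseline(links)
print(p_title["Chicago"])
```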

Page 18

II. Context(Title)


Context(Charcoal)+=“a font called __ is used to”

Page 19

III. Text(Title)


Just the text of the page (one per title)

Page 20

Putting it all together

City vs. Font: (0.99-0.0001, 0.01-0.2, 0.03-0.01)

Band vs. Font: (0.001-0.0001, 0.001-0.2, 0.02-0.01)

Training ranking SVM:
• Consider all title pairs.
• Train a ranker on the pairs (learn to prefer the correct solution).
• Inference = knockout tournament.
• Key: abstracts over the text; learns which scores are important.

Candidate      ScoreBaseline  ScoreContext  ScoreText
Chicago_city   0.99           0.01          0.03
Chicago_font   0.0001         0.2           0.01
Chicago_band   0.001          0.001         0.02
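
The knockout-tournament inference can be sketched with the score table above. The sketch assumes a linear pairwise ranker with weight vector `w`, and both weight vectors used below are invented for illustration rather than learned values from the talk.

```python
from typing import Dict, Sequence, Tuple

# (ScoreBaseline, ScoreContext, ScoreText) from the table above.
SCORES: Dict[str, Tuple[float, float, float]] = {
    "Chicago_city": (0.99, 0.01, 0.03),
    "Chicago_font": (0.0001, 0.2, 0.01),
    "Chicago_band": (0.001, 0.001, 0.02),
}


def prefers(w: Sequence[float], a: Sequence[float], b: Sequence[float]) -> bool:
    """True if the linear ranker scores candidate a above candidate b."""
    return sum(wi * (ai - bi) for wi, ai, bi in zip(w, a, b)) > 0


def knockout(candidates: Dict[str, Sequence[float]], w: Sequence[float]) -> str:
    """Compare candidates pairwise; the winner of each round meets the next one."""
    names = list(candidates)
    champion = names[0]
    for challenger in names[1:]:
        if prefers(w, candidates[challenger], candidates[champion]):
            champion = challenger
    return champion


# Hypothetical weights that emphasize the context score.
print(knockout(SCORES, (0.1, 5.0, 1.0)))
# Hypothetical weights that use the baseline score only.
print(knockout(SCORES, (1.0, 0.0, 0.0)))
```

With a ranker that trusts only the baseline, the city sense wins; once the context score carries enough weight, the font sense wins. That is the sense in which the ranker learns which scores are important.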

Page 21

Example: font or city?

It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”.

Text(Chicago_city), Context(Chicago_city)

Text(Chicago_font), Context(Chicago_font)

Page 22

Lexical matching

It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”.

Text(Chicago_city), Context(Chicago_city)

Text(Chicago_font), Context(Chicago_font)

Cosine similarity, TF-IDF weighting
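
A minimal sketch of TF-IDF weighted cosine similarity between a mention's context and the text of each candidate title follows. The tiny "pages", the whitespace tokenization, and the unsmoothed IDF formula are assumptions made for illustration, not the talk's actual setup.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vector(tokens: List[str], doc_freq: Counter, n_docs: int) -> Dict[str, float]:
    """Term frequency times log inverse document frequency; unseen words are dropped."""
    tf = Counter(tokens)
    return {w: c * math.log(n_docs / doc_freq[w]) for w, c in tf.items() if w in doc_freq}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy stand-ins for Text(Chicago_city) and Text(Chicago_font).
pages = {
    "Chicago_city": "city in illinois on lake michigan".split(),
    "Chicago_font": "sans serif font for macintosh menu".split(),
}
doc_freq = Counter(w for tokens in pages.values() for w in set(tokens))
n_docs = len(pages)

query = tfidf_vector("classic macintosh menu font".split(), doc_freq, n_docs)
sim_city = cosine(query, tfidf_vector(pages["Chicago_city"], doc_freq, n_docs))
sim_font = cosine(query, tfidf_vector(pages["Chicago_font"], doc_freq, n_docs))
print(sim_city, sim_font)
```

The same function can score the mention context against Context(Title) by swapping in a different token list per title.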

Page 23

Ranking – font vs. city

It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”.

Text(Chicago_city), Context(Chicago_city)

Text(Chicago_font), Context(Chicago_font)

0.5 0.2 0.1 0.8

0.3 0.2 0.3 0.5

Page 24

Train a ranking SVM

It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”.

Text(Chicago_city), Context(Chicago_city)

Text(Chicago_font), Context(Chicago_font)

(0.5, 0.2, 0.1, 0.8)

(0.3, 0.2, 0.3, 0.5)

[(0.2, 0, -0.2, 0.3), -1]
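
The training example on this slide is the element-wise difference of the two score vectors, labeled -1 because the first candidate (the city) is the wrong one for this mention. A small sketch that reproduces it is below; the helper name `difference_example` and the rounding step are assumptions added for readability.

```python
from typing import Sequence, Tuple


def difference_example(
    first: Sequence[float], second: Sequence[float], first_is_correct: bool
) -> Tuple[Tuple[float, ...], int]:
    """Turn a pair of candidate score vectors into one binary training example."""
    # Rounding only hides floating-point noise such as 0.30000000000000004.
    diff = tuple(round(a - b, 10) for a, b in zip(first, second))
    label = 1 if first_is_correct else -1
    return diff, label


city = (0.5, 0.2, 0.1, 0.8)
font = (0.3, 0.2, 0.3, 0.5)

# The font is the correct title here, so the city-minus-font example is negative.
example = difference_example(city, font, first_is_correct=False)
print(example)
```

A linear classifier trained on such difference vectors scores a pair by `w · (a - b)`, which is what makes pairwise comparison at inference time possible.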

Page 25

Scaling issues – one of our key contributions

It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”.

Text(Chicago_city), Context(Chicago_city)

Text(Chicago_font), Context(Chicago_font)

Page 26

Scaling issues

It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”.

Text(Chicago_city), Context(Chicago_city)

Text(Chicago_font), Context(Chicago_font)

This stuff is big, and is loaded into memory from the disk

Page 27

Improving performance

It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”.

Text(Chicago_city), Context(Chicago_city)

Text(Chicago_font), Context(Chicago_font)

Rather than computing TF-IDF-weighted cosine similarity, we want to train a classifier on the fly. But due to the aggressive feature pruning, we choose PrTFIDF.

Page 28

Performance (local only): ranking accuracy

Dataset         Baseline (solvable)  +Local TFIDF (solvable)  +Local PrTFIDF (solvable)
ACE             94.05                95.67                    96.21
MSN News        81.91                84.04                    85.10
AQUAINT         93.19                94.38                    95.57
Wikipedia Test  85.88                92.76                    93.59

Page 29

Talk outline

High-level algorithmic approach. Bi-partite graph matching with global and local inference.

Local Inference. Experiments & Results

Global Inference. Experiments & Results

Results, Conclusions

Demo

Page 30

Co-occurrence(Title1,Title2)

The city senses of Boston and Chicago appear together often.

Page 31

Co-occurrence(Title1,Title2)

Rock music and albums appear together often.

Page 32

Global ranking

How to approximate the “global semantic context” in the document? (What is Γ’?)
• Use only non-ambiguous mentions for Γ’.
• Use the top baseline disambiguation for NER surface forms.
• Use the top baseline disambiguation for all the surface forms.

How to define relatedness between two titles? (What is Ψ?)

Page 33

Ψ: pairwise relatedness between two titles:

Normalized Google Distance

Pointwise Mutual Information
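
Both measures can be computed from the sets of pages that link to each title. The sketch below uses the commonly cited link-based form of Normalized Google Distance and the textbook (logarithmic) definition of PMI. Which link sets and which exact variants the talk uses cannot be recovered from the slide, so the formulas, function names, and toy sets here are assumptions.

```python
import math
from typing import Set


def ngd(links_a: Set[int], links_b: Set[int], n_pages: int) -> float:
    """Normalized Google Distance over link sets; lower means more related.

    Assumes n_pages is larger than both link sets.
    """
    common = len(links_a & links_b)
    if common == 0:
        return float("inf")
    larger = max(len(links_a), len(links_b))
    smaller = min(len(links_a), len(links_b))
    return (math.log(larger) - math.log(common)) / (
        math.log(n_pages) - math.log(smaller)
    )


def pmi(links_a: Set[int], links_b: Set[int], n_pages: int) -> float:
    """Pointwise mutual information over link sets; higher means more related."""
    common = len(links_a & links_b)
    if common == 0:
        return float("-inf")
    p_ab = common / n_pages
    p_a = len(links_a) / n_pages
    p_b = len(links_b) / n_pages
    return math.log(p_ab / (p_a * p_b))


# Toy link sets (page ids), made up for illustration only.
chicago_city = {1, 2, 3, 4}
boston_city = {3, 4, 5, 6}
print(ngd(chicago_city, boston_city, n_pages=16))
print(pmi(chicago_city, boston_city, n_pages=16))
```

NGD is a distance while PMI is a similarity, so one of them has to be negated or inverted before both can feed the same coherence score Ψ.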

Page 34

What is the best Γ’? (ranker accuracy, solvable mentions)

Dataset         Baseline  Baseline+Lexical  Baseline+Global Unambiguous  Baseline+Global NER  Baseline+Global, All Mentions
ACE             94.05     -                 94.56                        96.21                96.75
MSN News        81.91     -                 84.46                        84.04                88.51
AQUAINT         93.19     -                 95.40                        94.04                95.91
Wikipedia Test  85.88     -                 89.67                        89.59                89.79

Page 35

Results – ranker accuracy (solvable mentions)

Dataset         Baseline  Baseline+Lexical  Baseline+Global Unambiguous  Baseline+Global NER  Baseline+Global, All Mentions
ACE             94.05     96.21             -                            -                    96.75
MSN News        81.91     85.10             -                            -                    88.51
AQUAINT         93.19     95.57             -                            -                    95.91
Wikipedia Test  85.88     93.59             -                            -                    89.79

Page 36

Results: Local + Global

Dataset         Baseline  Baseline+Lexical  Baseline+Lexical+Global
ACE             94.05     96.21             97.83
MSN News        81.91     85.10             87.02
AQUAINT         93.19     95.57             94.38
Wikipedia Test  85.88     93.59             94.18

Page 37

Talk outline

High-level algorithmic approach. Bi-partite graph matching with global and local inference.

Local Inference. Experiments & Results

Global Inference. Experiments & Results

Results, Conclusions

Demo

Page 38

Conclusions:

Dealing with a very large-scale knowledge acquisition and extraction problem

State-of-the-art algorithmic tools that exploit the content & structure of the network.

Formulated a framework for Local & Global reference resolution and disambiguation into knowledge networks

Proposed local and global algorithms: state-of-the-art performance.

Addressed the scaling issue: a major issue.

Identified key remaining challenges (next slide).

Page 39

We want to know what we don’t know

Not dealt with well in the literature:
“As Peter Thompson, a 16-year-old hunter, said ..”
“Dorothy Byrne, a state coordinator for the Florida Green Party…”

We train a separate SVM classifier to identify such cases. The features are:
• All the baseline, lexical and semantic scores of the top candidate.
• Score assigned to the top candidate by the ranker.
• The “confidence” of the ranker on the top candidate with respect to the second-best disambiguation.
• Good-Turing probability of out-of-Wikipedia occurrence for the mention.

Limited success; future research.

Page 40

Comparison to the previous state of the art (all mentions, including OOW)

Dataset         Baseline  Milne&Witten  Our System-GLOW
ACE             69.52     72.76         77.25
MSN News        72.83     68.49         74.88
AQUAINT         82.64     83.61         83.94
Wikipedia Test  81.77     80.32         90.54

Page 41

Demo
