Page 1 March 2011 Local and Global Algorithms for Disambiguation to Wikipedia Lev Ratinov 1, Dan...

March 2011

Local and Global Algorithms for Disambiguation to Wikipedia

Lev Ratinov (1), Dan Roth (1), Doug Downey (2), Mike Anderson (3)

(1) University of Illinois at Urbana-Champaign, (2) Northwestern University, (3) Rexonomy

Information overload

Organizing knowledge

It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”.

Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.

Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

Cross-document co-reference resolution

Reference resolution: (disambiguation to Wikipedia)

The “reference” collection has structure

[Figure: graph with edges labeled Used_In, Is_a, Is_a, Succeeded, Released]

Analysis of Information Networks

Here: Wikipedia as a knowledge resource, but we can use other resources

[Figure: graph with edges labeled Used_In, Is_a, Is_a, Succeeded, Released]

Talk outline

High-level algorithmic approach. Bi-partite graph matching with global and local inference.

Local Inference. Experiments & Results

Global Inference. Experiments & Results

Results, Conclusions

Problem formulation - matching/ranking problem

Text Document(s)—News, Blogs,…

Wikipedia Articles

Local approach

Γ is a solution to the problem: a set of pairs (m, t)

m: a mention in the document
t: the matched Wikipedia title


Local score of matching the mention to the title

Local + Global : using the Wikipedia structure

A “global” term – evaluating how good the structure of the solution is

The resulting optimization problem is NP-hard


A tractable variation

1. Invent a surrogate solution Γ’:
   • disambiguate each mention independently.
2. Evaluate the structure based on pair-wise coherence scores Ψ(ti,tj)
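The three formulations on the preceding slides can be written out as below. This is a sketch only: the slide equations did not survive, so the symbol φ for the local score, the mention count N, and the use of Ψ(Γ) for the whole-solution term are assumptions. Γ, Γ’ and the pairwise Ψ(ti,tj) are as named on the slides.

```latex
% Local approach: each mention m_i is matched to a title t_i independently.
\Gamma^{*}_{\mathrm{local}} = \arg\max_{\Gamma} \sum_{i=1}^{N} \phi(m_i, t_i)

% Local + global: add a term that scores the structure of the whole solution.
\Gamma^{*} = \arg\max_{\Gamma} \Big[ \sum_{i=1}^{N} \phi(m_i, t_i) + \Psi(\Gamma) \Big]

% Tractable variation: score each title against a fixed surrogate solution \Gamma'.
\Gamma^{*} \approx \arg\max_{\Gamma} \sum_{i=1}^{N}
    \Big[ \phi(m_i, t_i) + \sum_{t_j \in \Gamma'} \Psi(t_i, t_j) \Big]
```

Because Γ’ is fixed in the last form, the sum decomposes over mentions and each title can be chosen on its own.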


Talk outline

I. Baseline : P(Title|Surface Form)

P(Title|“Chicago”)
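A minimal sketch of how such a baseline could be estimated, assuming it is a relative frequency over (surface form, linked title) pairs. The `ANCHORS` data and the function name `build_baseline` are hypothetical and only illustrate the idea.

```python
from collections import Counter, defaultdict

# Hypothetical (surface form, linked title) pairs; not real Wikipedia counts.
ANCHORS = [
    ("Chicago", "Chicago_city"),
    ("Chicago", "Chicago_city"),
    ("Chicago", "Chicago_city"),
    ("Chicago", "Chicago_band"),
    ("Chicago", "Chicago_font"),
    ("Windy City", "Chicago_city"),
]


def build_baseline(anchors):
    """Estimate P(title | surface form) by relative frequency."""
    counts = defaultdict(Counter)
    for surface, title in anchors:
        counts[surface][title] += 1
    return {
        surface: {t: c / sum(titles.values()) for t, c in titles.items()}
        for surface, titles in counts.items()
    }


baseline = build_baseline(ANCHORS)
# The baseline prediction is simply the most probable title for the surface form.
best = max(baseline["Chicago"], key=baseline["Chicago"].get)
print(best, baseline["Chicago"][best])
```

With this toy data the city sense wins regardless of context, which is the weakness the context and text features are meant to address.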

II. Context(Title)

Context(Charcoal)+=“a font called __ is used to”

III. Text(Title)

Just the text of the page (one per title)

Putting it all together

City vs. Font: (0.99-0.0001, 0.01-0.2, 0.03-0.01)
Band vs. Font: (0.001-0.0001, 0.001-0.2, 0.02-0.01)

Training a ranking SVM:
• Consider all title pairs.
• Train a ranker on the pairs (learn to prefer the correct solution).
• Inference = knockout tournament.
• Key: abstracts over the text; learns which scores are important.

Title          ScoreBaseline   ScoreContext   ScoreText
Chicago_city   0.99            0.01           0.03
Chicago_font   0.0001          0.2            0.01
Chicago_band   0.001           0.001          0.02
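One simple reading of "inference = knockout tournament" is sketched below: the current champion meets each challenger in turn, and a pairwise ranker decides who advances. The feature values are the scores listed above. The `WEIGHTS` vector is a hypothetical stand-in for a trained model, and the sequential champion/challenger scheme is an assumption, not necessarily the authors' bracket.

```python
# Feature order: (ScoreBaseline, ScoreContext, ScoreText), copied from the slide.
CANDIDATES = {
    "Chicago_city": (0.99, 0.01, 0.03),
    "Chicago_font": (0.0001, 0.2, 0.01),
    "Chicago_band": (0.001, 0.001, 0.02),
}

# Hypothetical weights standing in for a trained ranking model.
WEIGHTS = (0.1, 5.0, 1.0)


def prefers_first(a, b, weights):
    """True if the linear ranker scores the difference vector a - b above zero."""
    return sum(w * (x - y) for w, x, y in zip(weights, a, b)) > 0


def knockout(candidates, weights=WEIGHTS):
    """Compare the current champion with each challenger; the winner advances."""
    titles = list(candidates)
    champion = titles[0]
    for challenger in titles[1:]:
        if not prefers_first(candidates[champion], candidates[challenger], weights):
            champion = challenger
    return champion


print(knockout(CANDIDATES))
```

With weights that trust only the baseline score, for example `(1.0, 0.0, 0.0)`, the same tournament returns the city sense. That illustrates the slide's point that the ranker learns which scores matter.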

Example: font or city?

Text(Chicago_city), Context(Chicago_city)

Text(Chicago_font), Context(Chicago_font)

Lexical matching

Cosine similarity, TF-IDF weighting
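A minimal sketch of TF-IDF weighted cosine similarity between a mention's context and each candidate's page text. The toy `pages`, the whitespace tokenization, and the raw-count tf with log(N/df) idf are assumptions made for illustration. The slides do not specify the exact weighting.

```python
import math
from collections import Counter


def tfidf_vector(tokens, doc_freq, n_docs):
    """Map a token list to a sparse {term: tf * idf} vector."""
    tf = Counter(tokens)
    return {
        term: count * math.log(n_docs / doc_freq[term])
        for term, count in tf.items()
        if term in doc_freq  # terms never seen in any page are dropped
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy stand-ins for Text(Title); not real Wikipedia text.
pages = {
    "Chicago_city": "city in illinois on lake michigan".split(),
    "Chicago_font": "font used for macintosh menu text".split(),
    "Chicago_band": "rock band that released many albums".split(),
}
doc_freq = Counter(term for tokens in pages.values() for term in set(tokens))
n_docs = len(pages)

mention_context = "the standard classic macintosh menu font".split()
query = tfidf_vector(mention_context, doc_freq, n_docs)
scores = {
    title: cosine(query, tfidf_vector(tokens, doc_freq, n_docs))
    for title, tokens in pages.items()
}
best = max(scores, key=scores.get)
print(best)
```

Each such similarity becomes one feature of the candidate, alongside the baseline probability.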

Ranking – font vs. city

0.5 0.2 0.1 0.8

0.3 0.2 0.3 0.5

Train a ranking SVM

(0.5, 0.2, 0.1, 0.8) - (0.3, 0.2, 0.3, 0.5) = (0.2, 0, -0.2, 0.3)

Training example: [(0.2, 0, -0.2, 0.3), -1]
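The training example above can be reproduced with a small helper that turns a pair of candidate feature vectors into labeled difference vectors. This is a sketch: the function names are hypothetical, and it assumes the second vector on the slide belongs to the correct title, which is what the label -1 suggests.

```python
def difference(a, b):
    """Element-wise a - b, rounded to hide floating-point noise."""
    return [round(x - y, 10) for x, y in zip(a, b)]


def pairwise_examples(correct, incorrect):
    """Build both orderings of a pair as (difference vector, label) examples."""
    return [
        (difference(correct, incorrect), +1),  # correct minus incorrect
        (difference(incorrect, correct), -1),  # incorrect minus correct
    ]


first = (0.5, 0.2, 0.1, 0.8)   # assumed to be the incorrect title's features
second = (0.3, 0.2, 0.3, 0.5)  # assumed to be the correct title's features

examples = pairwise_examples(correct=second, incorrect=first)
for vector, label in examples:
    print(vector, label)
```

A linear classifier trained on such difference vectors acts as a ranker: the sign of its score on a - b says which of the two candidates it prefers.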

Scaling issues – one of our key contributions

Scaling issues

This stuff is big, and is loaded into the memory from the disk

Improving performance

Rather than computing TF-IDF weighted cosine similarity, we want to train a classifier on the fly. But due to the aggressive feature pruning, we choose PrTFIDF

Performance (local only): ranking accuracy

Dataset          Baseline (solvable)   +Local TFIDF (solvable)   +Local PrTFIDF (solvable)
ACE              94.05                 95.67                     96.21
MSN News         81.91                 84.04                     85.10
AQUAINT          93.19                 94.38                     95.57
Wikipedia Test   85.88                 92.76                     93.59

Talk outline

Co-occurrence(Title1,Title2)

The city senses of Boston and Chicago appear together often.

Co-occurrence(Title1,Title2)

Rock music and albums appear together often

Global ranking

How to approximate the “global semantic context” in the document? (What is Γ’?)
• Use only non-ambiguous mentions for Γ’.
• Use the top baseline disambiguation for NER surface forms.
• Use the top baseline disambiguation for all the surface forms.
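The first option for Γ’ (non-ambiguous mentions only) can be sketched as follows. All candidate lists and scores here are hypothetical, and the way local and pairwise scores are combined (a plain sum) is an assumption for illustration.

```python
# Hypothetical candidates and scores; for illustration only.
CANDIDATES = {
    "Chicago": ["Chicago_city", "Chicago_band"],
    "Chicago VIII": ["Chicago_VIII"],  # unambiguous mention
}
LOCAL = {  # local score of matching a mention to a title
    ("Chicago", "Chicago_city"): 0.6,
    ("Chicago", "Chicago_band"): 0.3,
    ("Chicago VIII", "Chicago_VIII"): 1.0,
}
RELATED = {  # symmetric pairwise relatedness between titles
    frozenset(("Chicago_band", "Chicago_VIII")): 0.9,
    frozenset(("Chicago_city", "Chicago_VIII")): 0.1,
}


def relatedness(t1, t2):
    """Look up the pairwise relatedness; unknown pairs score 0."""
    return RELATED.get(frozenset((t1, t2)), 0.0)


def surrogate_unambiguous(candidates):
    """Surrogate solution: titles of mentions that have exactly one candidate."""
    return {titles[0] for titles in candidates.values() if len(titles) == 1}


def disambiguate(candidates, surrogate):
    """Pick, per mention, the title maximizing local score plus coherence with the surrogate."""
    solution = {}
    for mention, titles in candidates.items():
        solution[mention] = max(
            titles,
            key=lambda t: LOCAL[(mention, t)]
            + sum(relatedness(t, other) for other in surrogate if other != t),
        )
    return solution


print(disambiguate(CANDIDATES, surrogate_unambiguous(CANDIDATES)))
```

With an empty surrogate the local score alone decides and "Chicago" resolves to the city. With the album in the surrogate, the band sense wins.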

How to define relatedness between two titles? (What is Ψ?)

Ψ : Pair-wise relatedness between 2 titles:

Normalized Google Distance

Pointwise Mutual Information
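A minimal sketch of both relatedness measures, assuming each title is represented by the set of pages that link to it and that `n_titles` is the total number of pages. The link sets below are synthetic, and the log form of PMI is the textbook definition. The slides do not say which exact variant was used.

```python
import math


def ngd(links_a, links_b, n_titles):
    """Normalized Google Distance over link sets (smaller means more related)."""
    shared = len(links_a & links_b)
    if shared == 0:
        return float("inf")
    big = max(len(links_a), len(links_b))
    small = min(len(links_a), len(links_b))
    return (math.log(big) - math.log(shared)) / (math.log(n_titles) - math.log(small))


def pmi(links_a, links_b, n_titles):
    """Pointwise mutual information over link sets (larger means more related)."""
    shared = len(links_a & links_b)
    if shared == 0:
        return float("-inf")
    p_ab = shared / n_titles
    p_a = len(links_a) / n_titles
    p_b = len(links_b) / n_titles
    return math.log(p_ab / (p_a * p_b))


N_TITLES = 1000
# Synthetic incoming-link sets, identified by integer page ids.
chicago_city = set(range(0, 100))
boston = set(range(50, 150))                       # shares 50 links with the city
chicago_band = set(range(0, 2)) | set(range(500, 518))  # shares 2 links with the city

print(ngd(chicago_city, boston, N_TITLES), ngd(chicago_city, chicago_band, N_TITLES))
print(pmi(chicago_city, boston, N_TITLES), pmi(chicago_city, chicago_band, N_TITLES))
```

Note the opposite orientations: NGD is a distance, so it would typically be inverted (for example `1 - ngd`) before being used as a relatedness score.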

What is the best Γ’? (ranker accuracy, solvable mentions)

Dataset Baseline Baseline+Lexical

Baseline+GlobalUnambiguous

Baseline+GlobalNER

Baseline+Global, AllMentions

ACE 94.05 94.56 96.21 96.75

MSN News 81.91 84.46 84.04 88.51

AQUAINT 93.19 95.40 94.04 95.91

Wikipedia Test 85.88 89.67 89.59 89.79

Results – ranker accuracy (solvable mentions)

Baseline+GlobalUnambiguous

Baseline+GlobalNER

Baseline+Global, AllMentions

ACE 94.05 96.21 96.75

MSN News 81.91 85.10 88.51

AQUAINT 93.19 95.57 95.91

Results: Local + Global

Baseline+Lexical+Global

ACE 94.05 96.21 97.83

MSN News 81.91 85.10 87.02

AQUAINT 93.19 95.57 94.38

Talk outline

Conclusions:

Dealing with a very large scale knowledge acquisition and extraction problem

State-of-the-art algorithmic tools that exploit the content & structure of the network.

Formulated a framework for Local & Global reference resolution and disambiguation into knowledge networks

Proposed local and global algorithms: state-of-the-art performance.
Addressed the scaling issue: a major issue.
Identified key remaining challenges (next slide).

We want to know what we don’t know

Not dealt with well in the literature:
“As Peter Thompson, a 16-year-old hunter, said ..”
“Dorothy Byrne, a state coordinator for the Florida Green Party…”

We train a separate SVM classifier to identify such cases. The features are:
• All the baseline, lexical and semantic scores of the top candidate.
• Score assigned to the top candidate by the ranker.
• The “confidence” of the ranker on the top candidate with respect to the second-best disambiguation.
• Good-Turing probability of out-of-Wikipedia occurrence for the mention.

Limited success; future research.

Comparison to the previous state of the art (all mentions, including OOW)

Dataset    Baseline   Milne & Witten   Our System (GLOW)
ACE        69.52      72.76            77.25
MSN News   72.83      68.49            74.88
AQUAINT    82.64      83.61            83.94
