
Page 1: Entity Linking

Entity Linking

10th Russian Summer School in Information Retrieval (RuSSIR 2016) | Saratov, Russia, 2016

Krisztian Balog | University of Stavanger | @krisztianbalog

Page 2: Entity Linking

From Named Entity Recognition to Entity Linking

Page 3: Entity Linking

Named entity recognition (NER)
- Also known as entity identification, entity extraction, and entity chunking
- Task: identifying named entities in text and labeling them with one of the possible entity types
  - Person (PER), organization (ORG), location (LOC), miscellaneous (MISC); sometimes also temporal expressions (TIMEX) and certain types of numerical expressions (NUMEX)

<LOC>Silicon Valley</LOC> venture capitalist <PER>Michael Moritz</PER> said that today's billion-dollar "unicorn" startups can learn from <ORG>Apple</ORG> founder <PER>Steve Jobs</PER>

Page 4: Entity Linking

Named entity disambiguation
- Also called named entity normalization and named entity resolution
- Task: assign ambiguous entity names to canonical entities from some catalog
- It is usually assumed that entities have already been recognized in the input text (i.e., it has been processed by a NER system)

Page 5: Entity Linking

Wikification
- Named entity disambiguation using Wikipedia as the catalog of entities
- Also annotates concepts, not only entities

Page 6: Entity Linking

Entity linking
- Task: recognizing entity mentions in text and linking them to the corresponding entries in a knowledge base (KB)
  - Limited to recognizing entities for which a target entry exists in the reference KB; each KB entry is a candidate
  - It is assumed that the document provides sufficient context for disambiguating entities
- Knowledge base (working definition):
  - A catalog of entities, each with one or more names (surface forms), links to other entities, and, optionally, a textual description
  - Wikipedia, DBpedia, Freebase, YAGO, etc.

Page 7: Entity Linking

Overview of entity annotation tasks

Task                            | Recognition           | Assignment
Named entity recognition        | entities              | entity type
Named entity disambiguation     | entities              | entity ID / NIL
Wikification                    | entities and concepts | entity ID / NIL
Entity linking                  | entities              | entity ID

Page 8: Entity Linking

Entity linking in action

Page 9: Entity Linking

Entity linking in action

Page 10: Entity Linking

Anatomy of an entity linking system

document → Mention detection → Candidate selection → Disambiguation → entity annotations

- Mention detection: identification of text snippets that can potentially be linked to entities
- Candidate selection: generating a set of candidate entities for each mention
- Disambiguation: selecting a single entity (or none) for each mention, based on the context

Page 11: Entity Linking

Mention detection

document → [Mention detection] → Candidate selection → Disambiguation → entity annotations

Page 12: Entity Linking

Mention detection
- Goal: detect all “linkable” phrases
- Challenges:
  - Recall oriented: do not miss any entity that should be linked
  - Find entity name variants, e.g., "jlo" is a name variant of [Jennifer Lopez]
  - Filter out inappropriate ones, e.g., "new york" matches >2k different entities

Page 13: Entity Linking

Common approach
1. Build a dictionary of entity surface forms
   - Entities with all their name variants
2. Check all document n-grams against the dictionary
   - The value of n is typically set between 6 and 8
3. Filter out undesired entities
   - Can be done here or later in the pipeline

(A minimal code sketch of the dictionary lookup follows below.)
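The lookup step can be made concrete with a small sketch (not from the original slides); the whitespace tokenizer, the toy dictionary, and the maximum n-gram length of 8 are illustrative assumptions.

```python
from typing import Dict, List, Set

def detect_mentions(text: str,
                    surface_forms: Dict[str, Set[str]],
                    max_n: int = 8) -> List[dict]:
    """Match all document n-grams (n <= max_n) against a surface form dictionary.

    surface_forms maps a lowercased surface form to its candidate entities,
    e.g. {"times square": {"Times_Square", "Times_Square_(Hong_Kong)"}}.
    """
    tokens = text.split()  # naive whitespace tokenization, for illustration only
    mentions = []
    for start in range(len(tokens)):
        for n in range(1, max_n + 1):
            end = start + n
            if end > len(tokens):
                break
            ngram = " ".join(tokens[start:end]).lower()
            if ngram in surface_forms:
                mentions.append({"mention": ngram,
                                 "start": start,
                                 "end": end,
                                 "candidates": surface_forms[ngram]})
    return mentions

# Example usage with a toy dictionary
toy_dict = {"empire state building": {"Empire_State_Building"},
            "times square": {"Times_Square", "Times_Square_(Hong_Kong)"}}
print(detect_mentions("Home to the Empire State Building and Times Square", toy_dict))
```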

Page 14: Entity Linking

Example

"Home to the Empire State Building, Times Square, Statue of Liberty and other iconic sites, New York City is a fast-paced, globally influential center of art, culture, fashion and finance."

Surface form (s)      | Entities (E_s)
Times Square          | Times_Square, Times_Square_(Hong_Kong), Times_Square_(IRT_42nd_Street_Shuttle), ...
Empire State Building | Empire_State_Building
Empire State          | Empire_State_(band), Empire_State_Building, Empire_State_Film_Festival
Empire                | British_Empire, Empire_(magazine), First_French_Empire, Galactic_Empire_(Star_Wars), Holy_Roman_Empire, Roman_Empire, ...
...                   | ...

Page 15: Entity Linking

Surface form dictionary construction from Wikipedia
- Page title
  - Canonical (most common) name of the entity

Page 16: Entity Linking

Surface form dictionary construction from Wikipedia
- Page title
- Redirect pages
  - Alternative names that are frequently used to refer to an entity

Page 17: Entity Linking

Surface form dictionary construction from Wikipedia
- Page title
- Redirect pages
- Disambiguation pages
  - List of entities that share the same name

Page 18: Entity Linking

Surface form dictionary construction from Wikipedia
- Page title
- Redirect pages
- Disambiguation pages
- Anchor texts
  - Of links pointing to the entity's Wikipedia page

Page 19: Entity Linking

Surface form dictionary construction from Wikipedia
- Page title
- Redirect pages
- Disambiguation pages
- Anchor texts
- Bold texts from the first paragraph
  - These generally denote other name variants of the entity

(A sketch of assembling such a dictionary follows below.)
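A minimal sketch of how these sources could be merged into one dictionary (not from the slides); the input lists of titles, redirects, disambiguation entries, anchors, and bold texts are assumed to have been extracted from a Wikipedia dump beforehand.

```python
from collections import defaultdict

def build_surface_form_dict(titles, redirects, disambiguations, anchors, bold_texts):
    """Aggregate surface forms from several Wikipedia-derived sources.

    Each argument is an iterable of (entity, surface_form) pairs:
    page titles, redirect titles, disambiguation entries, anchor texts,
    and bold texts from the first paragraph.
    """
    sf_dict = defaultdict(set)
    for source in (titles, redirects, disambiguations, anchors, bold_texts):
        for entity, surface_form in source:
            sf_dict[surface_form.lower()].add(entity)
    return sf_dict

# Example usage with toy data
sf = build_surface_form_dict(
    titles=[("Empire_State_Building", "Empire State Building")],
    redirects=[("Empire_State_Building", "Empire State (building)")],
    disambiguations=[("Empire_State_(band)", "Empire State")],
    anchors=[("Empire_State_Building", "Empire State")],
    bold_texts=[("Empire_State_Building", "the Empire State")],
)
print(sf["empire state"])  # {'Empire_State_(band)', 'Empire_State_Building'}
```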

Page 20: Entity Linking

Surface form dictionary construction from other sources
- Anchor texts from external web pages pointing to Wikipedia articles
- Problem of synonym discovery
  - Expanding acronyms
  - Leveraging search results or query-click logs from a web search engine
  - ...

Page 21: Entity Linking

Filtering mentions
- Filter out mentions that are unlikely to be linked to any entity
- Keyphraseness:

  P(\mathrm{keyphrase} \mid m) = \frac{|D_{\mathrm{link}}(m)|}{|D(m)|}

  where |D_{\mathrm{link}}(m)| is the number of Wikipedia articles where m appears as a link and |D(m)| is the number of Wikipedia articles that contain m
- Link probability:

  P(\mathrm{link} \mid m) = \frac{\mathrm{link}(m)}{\mathrm{freq}(m)}

  where link(m) is the number of times mention m appears as a link and freq(m) is the total number of times mention m occurs in Wikipedia (as a link or not)

(A minimal sketch of link-probability-based filtering follows below.)
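A minimal sketch of filtering by link probability, one of the two statistics above (illustrative only; the count dictionaries and the 0.02 threshold are assumptions, not values from the slides).

```python
def link_probability(mention: str, link_count: dict, freq_count: dict) -> float:
    """P(link|m) = link(m) / freq(m): how often the phrase occurs as a link in Wikipedia."""
    freq = freq_count.get(mention, 0)
    return link_count.get(mention, 0) / freq if freq else 0.0

def filter_mentions(mentions, link_count, freq_count, threshold=0.02):
    """Keep only mention strings whose link probability exceeds a threshold."""
    return [m for m in mentions
            if link_probability(m, link_count, freq_count) > threshold]

# Toy example: "new york" is frequently linked, "the city" almost never (made-up counts)
link_count = {"new york": 5000, "the city": 10}
freq_count = {"new york": 20000, "the city": 90000}
print(filter_mentions(["new york", "the city"], link_count, freq_count))  # ['new york']
```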

Page 22: Entity Linking

Overlapping entity mentions
- Dealing with them in this phase
  - E.g., by dropping a mention if it is subsumed by another mention
- Keeping them and postponing the decision to a later stage (candidate selection or disambiguation)

Page 23: Entity Linking

Candidate selection

document → Mention detection → [Candidate selection] → Disambiguation → entity annotations

Page 24: Entity Linking

Candidate selection
- Goal: narrow down the space of disambiguation possibilities
- Balances between precision and recall (effectiveness vs. efficiency)
- Often approached as a ranking problem
  - Keeping only candidates above a score/rank threshold for downstream processing

Page 25: Entity Linking

Commonness
- Rank candidate entities based on their overall popularity, i.e., "most common sense":

  P(e \mid m) = \frac{n(m, e)}{\sum_{e'} n(m, e')}

  where n(m, e) is the number of times entity e is the link destination of mention m, and the denominator is the total number of times mention m appears as a link

(A minimal code sketch follows below.)
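Commonness can be computed directly from Wikipedia anchor statistics; a minimal sketch below, where anchor_counts (mapping each mention to per-entity link counts) is an assumed pre-extracted input and the numbers are made up.

```python
def commonness(anchor_counts: dict) -> dict:
    """P(e|m) = n(m, e) / sum over e' of n(m, e') for every mention m.

    anchor_counts maps mention -> {entity: number of times the anchor points to that entity}.
    """
    scores = {}
    for mention, entity_counts in anchor_counts.items():
        total = sum(entity_counts.values())
        scores[mention] = {e: n / total for e, n in entity_counts.items()}
    return scores

# Toy example with made-up counts
print(commonness({"times square": {"Times_Square": 940,
                                   "Times_Square_(film)": 17,
                                   "Times_Square_(Hong_Kong)": 11}}))
```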

Page 26: Entity Linking

Example

"Home to the Empire State Building, Times Square, Statue of Liberty and other iconic sites, New York City is a fast-paced, globally influential center of art, culture, fashion and finance."

Entity (e)                             | Commonness (P(e|m))
Times_Square                           | 0.940
Times_Square_(film)                    | 0.017
Times_Square_(Hong_Kong)               | 0.011
Times_Square_(IRT_42nd_Street_Shuttle) | 0.006
...                                    | ...

Page 27: Entity Linking

Commonness
- Can be pre-computed and stored in the entity surface form dictionary
- Follows a power law with a long tail of extremely unlikely senses; entities at the tail end of the distribution can be safely discarded
  - E.g., 0.001 is a sensible threshold

Page 28: Entity Linking

Example

"Bulgaria's best World Cup performance was in the 1994 World Cup where they beat Germany, to reach the semi-finals, losing to Italy, and finishing in fourth ..."

Entity                         | Commonness
Germany                        | 0.9417
Germany_national_football_team | 0.0139
Nazi_Germany                   | 0.0081
German_Empire                  | 0.0065
...                            | ...

Entity                       | Commonness
FIFA_World_Cup               | 0.2358
FIS_Alpine_Ski_World_Cup     | 0.0682
2009_FINA_Swimming_World_Cup | 0.0633
World_Cup_(men's_golf)       | 0.0622
...                          | ...

Entity                       | Commonness
1998_FIFA_World_Cup          | 0.9556
1998_IAAF_World_Cup          | 0.0296
1998_Alpine_Skiing_World_Cup | 0.0059
...                          | ...

Page 29: Entity Linking

Disambiguation

document → Mention detection → Candidate selection → [Disambiguation] → entity annotations

Page 30: Entity Linking

Disambiguation
- Baseline approach: most common sense
- Consider additional types of evidence
  - Prior importance of entities and mentions
  - Contextual similarity between the text surrounding the mention and the candidate entity
  - Coherence among all entity linking decisions in the document
- Combine these signals
  - Using supervised learning or graph-based approaches
- Optionally perform pruning
  - Reject low-confidence or semantically meaningless annotations

Page 31: Entity Linking

Prior importance features
- Context-independent features
  - Neither the text nor other mentions in the document are taken into account
- Keyphraseness
- Link probability
- Commonness

Page 32: Entity Linking

Prior importance features
- Link prior
  - Popularity of the entity measured in terms of incoming links

  P_{\mathrm{link}}(e) = \frac{\mathrm{link}(e)}{\sum_{e'} \mathrm{link}(e')}

- Page views
  - Popularity of the entity measured in terms of traffic volume

  P_{\mathrm{pageviews}}(e) = \frac{\mathrm{pageviews}(e)}{\sum_{e'} \mathrm{pageviews}(e')}

Page 33: Entity Linking

Contextual features
- Compare the surrounding context of a mention with the (textual) representation of the given candidate entity
- Context of a mention
  - Window of text (sentence, paragraph) around the mention
  - Entire document
- Entity's representation
  - Wikipedia entity page, first description paragraph, terms with highest TF-IDF score, etc.
  - Entity's description in the knowledge base

Page 34: Entity Linking

Contextual similarity
- Commonly: bag-of-words representation
- Cosine similarity:

  \mathrm{sim}_{\cos}(m, e) = \frac{\vec{d}_m \cdot \vec{d}_e}{\|\vec{d}_m\| \, \|\vec{d}_e\|}

- Many other options for measuring similarity
  - Dot product, KL divergence, Jaccard similarity
- Representation does not have to be limited to bag-of-words
  - Concept vectors (named entities, Wikipedia categories, anchor text, keyphrases, etc.)

(A minimal code sketch follows below.)
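As a concrete illustration (not from the slides), a minimal bag-of-words cosine similarity between a mention's context and an entity's textual representation; whitespace tokenization and raw term frequencies are simplifying assumptions.

```python
import math
from collections import Counter

def cosine_similarity(context_text: str, entity_text: str) -> float:
    """Cosine between raw term-frequency vectors of the two texts."""
    d_m = Counter(context_text.lower().split())
    d_e = Counter(entity_text.lower().split())
    dot = sum(d_m[t] * d_e[t] for t in d_m.keys() & d_e.keys())
    norm = math.sqrt(sum(v * v for v in d_m.values())) * \
           math.sqrt(sum(v * v for v in d_e.values()))
    return dot / norm if norm else 0.0
```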

Page 35: Entity Linking

Entity-relatedness features
- It can reasonably be assumed that a document focuses on one or at most a few topics
- Therefore, entities mentioned in a document should be topically related to each other
- Capturing topical coherence by developing some measure of relatedness between (linked) entities
- Defined for pairs of entities

Page 36: Entity Linking

Wikipedia Link-based Measure (WLM)
- Often referred to simply as relatedness
- A close relationship is assumed between two entities if there is a large overlap between the entities linking to them:

  \mathrm{WLM}(e, e') = 1 - \frac{\log(\max(|L_e|, |L_{e'}|)) - \log(|L_e \cap L_{e'}|)}{\log(|E|) - \log(\min(|L_e|, |L_{e'}|))}

  where L_e is the set of entities that link to e and |E| is the total number of entities

(A minimal code sketch follows below.)
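The WLM formula translates directly into code; a minimal sketch, where inlinks (mapping each entity to the set of entities linking to it) and total_entities (|E|) are assumed to be precomputed from a Wikipedia dump.

```python
import math

def wlm(e1: str, e2: str, inlinks: dict, total_entities: int) -> float:
    """Wikipedia Link-based Measure between two entities (higher = more related)."""
    l1, l2 = inlinks.get(e1, set()), inlinks.get(e2, set())
    overlap = len(l1 & l2)
    if not l1 or not l2 or overlap == 0:
        return 0.0
    score = 1 - (math.log(max(len(l1), len(l2))) - math.log(overlap)) / \
                (math.log(total_entities) - math.log(min(len(l1), len(l2))))
    return max(score, 0.0)  # clip negative values to zero, a common convention
```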

Page 37: Entity Linking

Wikipedia Link-based Measure (WLM)

[Screenshot of an excerpt from Milne and Witten (2008a), which introduces the Wikipedia Link-based Measure (WLM): relatedness is derived from Wikipedia's hyperlink structure rather than its textual content, making it cheaper to compute than ESA while remaining closely tied to the manually defined semantics of the resource. The excerpt includes Figure 1: obtaining a semantic relatedness measure between Automobile and Global Warming from the articles' incoming and outgoing links.]

Image taken from Milne and Witten (2008a). An Effective, Low-Cost Measure of Semantic Relatedness Obtained from Wikipedia Links. In AAAI WikiAI Workshop.

Page 38: Entity Linking

Asymmetric relatedness features
- A relatedness function does not have to be symmetric
  - E.g., the relatedness of the UNITED STATES given NEIL ARMSTRONG is intuitively larger than the relatedness of NEIL ARMSTRONG given the UNITED STATES
- Conditional probability:

  P(e' \mid e) = \frac{|L_{e'} \cap L_e|}{|L_e|}

Page 39: Entity Linking

Entity-relatedness features
- Numerous ways to define relatedness
  - Consider not only incoming, but also outgoing links, or the union of incoming and outgoing links
  - Jaccard similarity, Pointwise Mutual Information (PMI), the Chi-square statistic, etc.
- Having a single relatedness function is preferred, to keep the disambiguation process simple
  - Various relatedness measures can effectively be combined into a single score using a machine learning approach [Ceccarelli et al., 2013]

Page 40: Entity Linking

Overview of features

Page 41: Entity Linking

Disambiguation approaches
- Consider local compatibility (including prior evidence) and coherence with the other entity linking decisions
- Task: find an assignment \Gamma : M_d \to E \cup \{\emptyset\}
- Objective function:

  \Gamma^* = \arg\max_{\Gamma} \sum_{(m,e) \in \Gamma} \phi(m, e) + \psi(\Gamma)

  where \phi(m, e) is the local compatibility between the mention and the assigned entity, and \psi(\Gamma) is the coherence function for all entity annotations in the document
- This optimization problem is NP-hard!
  - Need to resort to approximation algorithms and heuristics

Page 42: Entity Linking

Disambiguation strategies
- Individually, one mention at a time
  - Rank candidates for each mention, take the top-ranked one (or NIL):

    \gamma(m) = \arg\max_{e \in E_m} \mathrm{score}(m, e)

  - Interdependence between entity linking decisions may be incorporated in a pairwise fashion
- Collectively, considering all mentions in the document jointly

Page 43: Entity Linking

Disambiguation approaches

Approach                         | Context         | Entity interdependence
Most common sense                | none            | none
Individual local disambiguation  | text            | none
Individual global disambiguation | text & entities | pairwise
Collective disambiguation        | text & entities | collective

Page 44: Entity Linking

Individual local disambiguation
- Early entity linking approaches
- Local compatibility score can be written as a linear combination of features (both context-independent and context-dependent):

  \phi(e, m) = \sum_i \lambda_i f_i(e, m)

- Learn the "optimal" combination of features from training data using machine learning

(A minimal scoring sketch follows below.)
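A minimal sketch of individual local disambiguation as a linear feature combination (illustrative; the feature functions, weights, and NIL threshold are placeholders standing in for whatever a learner would produce).

```python
def local_score(entity, mention, context, features, weights):
    """phi(e, m) = sum_i lambda_i * f_i(e, m); features and weights are parallel lists."""
    return sum(w * f(entity, mention, context) for f, w in zip(features, weights))

def disambiguate_locally(mention, candidates, context, features, weights, nil_threshold=0.0):
    """Pick the best-scoring candidate for a single mention, or NIL if none scores high enough."""
    scored = [(local_score(e, mention, context, features, weights), e) for e in candidates]
    best_score, best_entity = max(scored)
    return best_entity if best_score > nil_threshold else None  # None stands for NIL
```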

Page 45: Entity Linking

Individual global disambiguation
- Consider what other entities are mentioned in the document
- True global optimization would be NP-hard
- A good approximation can be computed efficiently by considering pairwise interdependencies for each mention independently
  - Pairwise entity relatedness scores need to be aggregated into a single number (how coherent the given candidate entity is with the rest of the entities in the document)

Page 46: Entity Linking

Approach #1: GLOW [Ratinov et al., 2011]
- First perform local disambiguation, producing a set of entity predictions E*
- The coherence of entity e with all other linked entities in the document:

  g_j(e, E^*) = \bigoplus_{e' \in E^* \setminus \{e\}} f_j(e, e')

  where \bigoplus is an aggregator function (max or avg) and f_j is a particular relatedness function
- Use the predictions in a second, global disambiguation round:

  \mathrm{score}(m, e) = \sum_i \lambda_i f_i(e, m) + \sum_j \lambda_j g_j(e, E^*)

(A compressed sketch of the two rounds follows below.)
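A compressed sketch of the two-round idea behind GLOW (not the authors' implementation): a first local round produces E*, then each candidate is re-scored with an aggregated coherence term; local_scores, relatedness, and the weight lam are assumed inputs.

```python
def glow_rescore(mentions, candidates, local_scores, relatedness, lam=1.0, agg=max):
    """Two-round global disambiguation sketch.

    mentions:     list of mention identifiers
    candidates:   dict mention -> list of candidate entities
    local_scores: dict (mention, entity) -> local compatibility score
    relatedness:  function (e, e') -> relatedness score (e.g., WLM)
    agg:          aggregator over relatedness values (max or a mean)
    """
    # Round 1: local disambiguation gives the prediction set E*
    e_star = {max(candidates[m], key=lambda e: local_scores[(m, e)]) for m in mentions}

    # Round 2: add an aggregated coherence term g(e, E*) to each candidate's score
    assignment = {}
    for m in mentions:
        def global_score(e):
            others = [relatedness(e, e2) for e2 in e_star if e2 != e]
            coherence = agg(others) if others else 0.0
            return local_scores[(m, e)] + lam * coherence
        assignment[m] = max(candidates[m], key=global_score)
    return assignment
```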

Page 47: Entity Linking

Approach #2: TAGME [Ferragina & Scaiella, 2010]
- Combine the two most important features (commonness and relatedness) using a voting scheme
- The score of a candidate entity for a particular mention:

  \mathrm{score}(m, e) = \sum_{m' \in M_d \setminus \{m\}} \mathrm{vote}(m', e)

- The vote function estimates the agreement between e and all candidate entities of all other mentions in the document

Page 48: Entity Linking

TAGME (voting mechanism)
- Average relatedness between each possible disambiguation, weighted by its commonness score:

  \mathrm{vote}(m', e) = \frac{\sum_{e' \in E_{m'}} \mathrm{WLM}(e, e') \, P(e' \mid m')}{|E_{m'}|}

[Diagram: a candidate entity e of mention m collects votes from the candidate entities e' of every other mention m', each vote weighted by P(e'|m') and WLM(e, e').]

Page 49: Entity Linking

TAGME (final score)
- Final decision uses a simple but robust heuristic
  - The top-\epsilon entities with the highest score are considered for a given mention, and the one with the highest commonness score is selected:

  \gamma(m) = \arg\max_{e \in E_m} \{ P(e \mid m) : e \in \mathrm{top}_{\epsilon}[\mathrm{score}(m, e)] \}

  where \mathrm{score}(m, e) = \sum_{m' \in M_d \setminus \{m\}} \mathrm{vote}(m', e)

(A minimal sketch of the voting and selection steps follows below.)
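A minimal sketch of TAGME-style voting and final selection (simplified from the description above; the epsilon parameter, here used as a fraction of the ranked list, is an illustrative assumption).

```python
def tagme_disambiguate(mentions, candidates, commonness, relatedness, epsilon=0.3):
    """TAGME-style disambiguation sketch.

    candidates:  dict mention -> list of candidate entities
    commonness:  dict (mention, entity) -> P(e|m)
    relatedness: function (e, e') -> WLM(e, e')
    """
    def vote(m_other, e):
        # Average relatedness to e over m_other's candidates, weighted by their commonness
        cands = candidates[m_other]
        return sum(relatedness(e, e2) * commonness[(m_other, e2)] for e2 in cands) / len(cands)

    def score(m, e):
        return sum(vote(m2, e) for m2 in mentions if m2 != m)

    assignment = {}
    for m in mentions:
        ranked = sorted(candidates[m], key=lambda e: score(m, e), reverse=True)
        top = ranked[:max(1, int(len(ranked) * epsilon))]  # keep the top-scoring fraction
        assignment[m] = max(top, key=lambda e: commonness[(m, e)])
    return assignment
```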

Page 50: Entity Linking

Collective disambiguation
- Graph-based representation
- Mention-entity edges capture the local compatibility between the mention and the entity
  - Measured using a combination of context-independent and context-dependent features
- Entity-entity edges represent the semantic relatedness between a pair of entities
  - A common choice is relatedness (WLM)
- Use these relations jointly to identify a single referent entity (or none) for each of the mentions

Page 51: Entity Linking

Example

[Screenshot of an excerpt from Han et al. (2011), showing Figure 2: the Referent Graph of an example document with mentions Bulls, Jordan, and Space Jam and candidate entities Chicago Bulls, Bull, Michael Jordan, Michael I. Jordan, Michael B. Jordan, and Space Jam. Mention-entity edges are weighted by local compatibility and entity-entity edges by semantic relatedness, e.g., WLM(Michael Jordan, Chicago Bulls) = 0.82 and WLM(Michael Jordan, Space Jam) = 0.66.]

Page 52: Entity Linking

AIDA [Hoffart et al., 2011]
- Problem formulation: find a dense subgraph that contains all mention nodes and exactly one mention-entity edge for each mention
- Greedy algorithm iteratively removes edges

Page 53: Entity Linking

Algorithm
- Start with the full graph
- Iteratively remove the entity node with the lowest weighted degree (along with all its incident edges), provided that each mention node remains connected to at least one entity
  - The weighted degree of an entity node is the sum of the weights of its incident edges
- The graph with the highest density is kept as the solution
  - The density of the graph is measured as the minimum weighted degree among its entity nodes

(A minimal sketch of this greedy procedure follows below.)
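A minimal sketch of the greedy pruning described above (not the AIDA implementation); the graph is given as weighted mention-entity and entity-entity edges, and the densest graph seen during the iterations is returned.

```python
def greedy_dense_subgraph(mention_edges, entity_edges):
    """Iteratively remove the entity with the lowest weighted degree, keeping at least
    one candidate entity per mention; return the entity set of the densest graph seen.

    mention_edges: dict mention -> {entity: weight}
    entity_edges:  dict (entity, entity) -> weight (undirected)
    """
    entities = {e for cands in mention_edges.values() for e in cands}

    def weighted_degree(e, active):
        deg = sum(w for cands in mention_edges.values() for e2, w in cands.items() if e2 == e)
        deg += sum(w for (a, b), w in entity_edges.items()
                   if (a == e and b in active) or (b == e and a in active))
        return deg

    def density(active):
        return min(weighted_degree(e, active) for e in active)

    active = set(entities)
    best, best_density = set(active), density(active)
    while True:
        # Entities that can be removed without leaving some mention with no candidate
        removable = [e for e in active
                     if all(set(cands) & (active - {e})
                            for cands in mention_edges.values() if e in cands)]
        if not removable:
            break
        victim = min(removable, key=lambda e: weighted_degree(e, active))
        active.remove(victim)
        d = density(active)
        if d > best_density:
            best, best_density = set(active), d
    return best
```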

Page 54: Entity Linking

Example iteration #1

[Same Referent Graph screenshot as on the previous Example slide, annotated with the weighted degrees of the entity nodes (0.95, 0.86, 1.56, 0.12, 0.03, 0.01). The entity node with the lowest weighted degree is removed. Density of the graph at this iteration: 0.03.]

Page 55: Entity Linking

Example iteration #2

[Same screenshot, with another entity node (the one with the lowest weighted degree) removed. Density of the graph at this iteration: 0.12.]

Page 56: Entity Linking

Example iteration #3

[Same screenshot, with further entity nodes removed. Density of the graph at this iteration: 0.86.]

Page 57: Entity Linking

Pre- and post-processing
- Pre-processing phase: remove entities that are "too distant" from the mention nodes
- At the end of the iterations, the solution graph may still contain mentions that are connected to more than one entity; this is dealt with in post-processing
  - If the graph is sufficiently small, it is feasible to exhaustively consider all possible mention-entity pairs
  - Otherwise, a faster local (hill-climbing) search algorithm may be used

Page 58: Entity Linking

Random walk with restarts [Han et al., 2011]
- "A random walk with restart is a stochastic process to traverse a graph, resulting in a probability distribution over the vertices corresponding to the likelihood those vertices are visited" [Guo et al., 2014]
- Starting vector holding the initial evidence:

  v(m) = \frac{\mathrm{tfidf}(m)}{\sum_{m' \in M_d} \mathrm{tfidf}(m')}

Page 59: Entity Linking

Random walk process
- T is the evidence propagation matrix
- Evidence propagation ratio:

  T(m \to e) = \frac{w(m \to e)}{\sum_{e' \in E_m} w(m \to e')}

  T(e \to e') = \frac{w(e \to e')}{\sum_{e'' \in E_d} w(e \to e'')}

- Evidence propagation:

  r^{0} = v
  r^{t+1} = (1 - \alpha) \, r^{t} \, T + \alpha \, v

  where r^{t} is the vector holding the probability that the nodes are visited, v is the initial evidence, and \alpha is the restart probability (0.1)

(A minimal sketch of this iteration follows below.)
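A minimal sketch of the restart iteration using NumPy (illustrative; the transition matrix T and the initial evidence vector v are assumed to be built as described above, with the rows of T normalized).

```python
import numpy as np

def random_walk_with_restart(T: np.ndarray, v: np.ndarray,
                             alpha: float = 0.1, tol: float = 1e-8,
                             max_iter: int = 1000) -> np.ndarray:
    """Iterate r_{t+1} = (1 - alpha) * r_t @ T + alpha * v until convergence.

    T: row-stochastic evidence propagation matrix over the graph nodes
    v: initial evidence vector (sums to 1)
    alpha: restart probability
    """
    r = v.copy()
    for _ in range(max_iter):
        r_next = (1 - alpha) * r @ T + alpha * v
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r
```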

Page 60: Entity Linking

Final decision
- Once the random walk process has converged to a stationary distribution, the referent entity for each mention is selected as:

  \gamma(m) = \arg\max_{e \in E_m} \phi(m, e) \times r(e)

  where r(e) is the converged visiting probability of e and the local compatibility is \phi(e, m) = \sum_i \lambda_i f_i(e, m)

Page 61: Entity Linking

Pruning
- Discarding meaningless or low-confidence annotations produced by the disambiguation phase
- Simplest solution: use a confidence threshold
- More advanced solutions
  - Machine-learned classifier to retain only entities that are "relevant enough" (a human editor would annotate them) [Milne & Witten, 2008]
  - Optimization problem: decide, for each mention, whether switching the top-ranked disambiguation to NIL would improve the objective function [Ratinov et al., 2011]

Page 62: Entity Linking

Entity linking systems

Page 63: Entity Linking

Publicly available entity linking systems

System            | Reference KB | Online demo | Web API | Source code
AIDA              | YAGO2        | Yes         | Yes     | Yes (Java)
DBpedia Spotlight | DBpedia      | Yes         | Yes     | Yes (Java)
Illinois Wikifier | Wikipedia    | No          | No      | Yes (Java)
TAGME             | Wikipedia    | Yes         | Yes     | Yes (Java)
Wikipedia Miner   | Wikipedia    | No          | No      | Yes (Java)

AIDA: http://www.mpi-inf.mpg.de/yago-naga/aida/
DBpedia Spotlight: http://spotlight.dbpedia.org/
Illinois Wikifier: http://cogcomp.cs.illinois.edu/page/download_view/Wikifier
TAGME: https://tagme.d4science.org/tagme/
Wikipedia Miner: https://github.com/dnmilne/wikipediaminer

Page 64: Entity Linking

Publicly available entity linking systems
- AIDA [Hoffart et al., 2011]
  - Collective disambiguation using a graph-based approach (see the Disambiguation section)
- DBpedia Spotlight [Mendes et al., 2011]
  - Straightforward local disambiguation approach
    - Vector space representations of entities are compared against the paragraphs of their mentions using cosine similarity
    - Introduces Inverse Candidate Frequency (ICF) instead of using IDF
  - Annotates with DBpedia entities, which can be restricted to certain types or even to a custom entity set defined by a SPARQL query

Page 65: Entity Linking

Publicly available entity linking systems
- Illinois Wikifier (a.k.a. GLOW) [Ratinov et al., 2011]
  - Individual global disambiguation (see the Disambiguation section)
- TAGME [Ferragina & Scaiella, 2010]
  - One of the most popular entity linking systems
  - Designed specifically for efficient annotation of short texts, but works well on long texts too
  - Individual global disambiguation (see the Disambiguation section)
- Wikipedia Miner [Milne & Witten, 2008]
  - Seminal entity linking system that was the first to ...
    - combine commonness and relatedness for (local) disambiguation
    - have an open-source implementation and be offered as a web service

Page 66: Entity Linking

Evaluation

Page 67: Entity Linking

Evaluation (end-to-end)
- Comparing the system-generated annotations against a human-annotated gold standard
- Evaluation criteria
  - Perfect match: both the linked entity and the mention offsets must match
  - Relaxed match: the linked entity must match; it is sufficient if the mention overlaps with the gold standard

Page 68: Entity Linking

Evaluation with relaxed match

[Two illustrative examples (figure), each comparing the ground truth mention span with the system annotation span.]

Page 69: Entity Linking

Evaluation metrics
- Set-based metrics:
  - Precision: fraction of the annotations generated by the system that are correctly linked
  - Recall: fraction of the annotations that should be made that are correctly linked by the system
  - F-measure: harmonic mean of precision and recall
- Metrics are computed over a collection of documents
  - Micro-averaged: aggregated across mentions
  - Macro-averaged: aggregated across documents

Page 70: Entity Linking

Evaluation metrics
- Micro-averaged:

  P_{\mathrm{mic}} = \frac{|A_D \cap \hat{A}_D|}{|\hat{A}_D|}
  R_{\mathrm{mic}} = \frac{|A_D \cap \hat{A}_D|}{|A_D|}

- Macro-averaged:

  P_{\mathrm{mac}} = \frac{1}{|D|} \sum_{d \in D} \frac{|A_d \cap \hat{A}_d|}{|\hat{A}_d|}
  R_{\mathrm{mac}} = \frac{1}{|D|} \sum_{d \in D} \frac{|A_d \cap \hat{A}_d|}{|A_d|}

- F1 score:

  F_1 = \frac{2 \times P \times R}{P + R}

  where A_D (per document: A_d) denotes the ground truth annotations and \hat{A}_D (\hat{A}_d) the annotations generated by the entity linking system

(A minimal sketch of these metrics follows below.)
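A minimal sketch of the micro- and macro-averaged metrics (illustrative; annotations are represented as sets of (mention, entity) pairs per document).

```python
def micro_macro_prf(gold: dict, system: dict):
    """gold, system: dict doc_id -> set of (mention, entity) annotations."""
    docs = gold.keys() | system.keys()
    n_docs = max(len(docs), 1)

    def safe_div(a, b):
        return a / b if b else 0.0

    def f1(p, r):
        return safe_div(2 * p * r, p + r)

    tp = sum(len(gold.get(d, set()) & system.get(d, set())) for d in docs)
    p_mic = safe_div(tp, sum(len(system.get(d, set())) for d in docs))
    r_mic = safe_div(tp, sum(len(gold.get(d, set())) for d in docs))

    p_mac = sum(safe_div(len(gold.get(d, set()) & system.get(d, set())),
                         len(system.get(d, set()))) for d in docs) / n_docs
    r_mac = sum(safe_div(len(gold.get(d, set()) & system.get(d, set())),
                         len(gold.get(d, set()))) for d in docs) / n_docs

    return {"P_mic": p_mic, "R_mic": r_mic, "F1_mic": f1(p_mic, r_mic),
            "P_mac": p_mac, "R_mac": r_mac, "F1_mac": f1(p_mac, r_mac)}
```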

Page 71: Entity Linking

Test collections by individual researchers

Collection                        | Reference KB | Documents | #Docs | Annotations         | #Mentions | All? | NIL
MSNBC [Cucerzan, 2007]            | Wikipedia    | News      | 20    | entities            | 755       | Yes  | Yes
AQUAINT [Milne & Witten, 2008]    | Wikipedia    | News      | 50    | entities & concepts | 727       | No   | No
IITB [Kulkarni et al., 2009]      | Wikipedia    | Web pages | 107   | entities            | 17,200    | Yes  | Yes
ACE2004 [Ratinov et al., 2011]    | Wikipedia    | News      | 57    | entities            | 306       | Yes  | Yes
CoNLL-YAGO [Hoffart et al., 2011] | YAGO2        | News      | 1,393 | entities            | 34,956    | Yes  | Yes

Page 72: Entity Linking

Evaluation campaigns
- INEX Link-the-Wiki
- TAC Entity Linking
- Entity Recognition and Disambiguation (ERD) Challenge

Page 73: Entity Linking

Component-based evaluation
- The pipeline architecture makes the evaluation of entity linking systems especially challenging
- The main focus is on the disambiguation component, but its performance is largely influenced by the preceding steps
- Fair comparison between two approaches can only be made if they share all other elements of the pipeline

Page 74: Entity Linking

Meta evaluations
- [Hachey et al., 2009]
  - Reimplements three state-of-the-art systems
- DEXTER [Ceccarelli et al., 2013]
  - Reimplements TAGME, Wikipedia Miner, and collective EL [Han et al., 2011]
- BAT-Framework [Cornolti et al., 2013]
  - Compares publicly available entity annotation systems (AIDA, Illinois Wikifier, TAGME, Wikipedia Miner, DBpedia Spotlight)
  - On a number of test collections (news, web pages, and tweets)
- GERBIL [Usbeck et al., 2015]
  - Extends the BAT-Framework by being able to link to any knowledge base
  - Benchmarks any annotation service through a REST interface
  - Provides persistent URLs for experimental settings (for reproducibility)

Page 75: Entity Linking

Resources

Page 76: Entity Linking

A Cross-Lingual Dictionary for English Wikipedia Concepts
- Collecting strings (mentions) that link to Wikipedia articles on the Web
- https://research.googleblog.com/2012/05/from-words-to-concepts-and-back.html
- Each entry lists a string (mention), a Wikipedia article, and the number of times the mention is linked to that article

Page 77: Entity Linking

Freebase Annotations of the ClueWeb Corpora
- ClueWeb annotated with Freebase entities (by Google)
  - http://lemurproject.org/clueweb09/FACC1/
  - http://lemurproject.org/clueweb12/FACC1/
- Each annotation record contains: the name of the document that was annotated, the entity mention, the beginning and end byte offsets, the entity ID in Freebase, the confidence given both the mention and the context, and the confidence given just the context (ignoring the mention)

Page 78: Entity Linking

Entity linking in queries

Page 79: Entity Linking

Entity linking in queries
- Challenges
  - Search queries are short
  - Limited context
  - Lack of proper grammar, spelling
  - Multiple interpretations
  - Needs to be fast

Page 80: Entity Linking

Example: "the governator"

Ambiguous between a person and a movie interpretation.

Page 81: Entity Linking

Example: "the governator movie"

Page 82: Entity Linking

Example: "new york pizza manhattan"

Page 83: Entity Linking

Example: "new york pizza manhattan"

Page 84: Entity Linking

ERD’14 challenge
- Task: finding query interpretations
- Input: keyword query
- Output: sets of sets of entities
- Reference KB: Freebase
- Annotations are to be performed by a web service within a given time limit

Page 85: Entity Linking

Evaluation

P = \frac{|I \cap \hat{I}|}{|\hat{I}|}
R = \frac{|I \cap \hat{I}|}{|I|}
F = \frac{2 \cdot P \cdot R}{P + R}

where I is the set of ground truth interpretations and \hat{I} the set of interpretations returned by the system

Example query: "new york pizza manhattan", with interpretation sets such as {New York-style pizza, Manhattan}, {New York City, Manhattan}, and {New York-style pizza} appearing on the ground truth and system annotation sides (figure). A minimal sketch of this evaluation follows below.
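A minimal sketch of the interpretation-set evaluation (illustrative): interpretations are represented as frozensets of entities so that whole interpretations can be compared for exact matches.

```python
def erd_prf(ground_truth, system):
    """ground_truth, system: collections of interpretations, each a frozenset of entities."""
    I, I_hat = set(ground_truth), set(system)
    matched = len(I & I_hat)
    p = matched / len(I_hat) if I_hat else 0.0
    r = matched / len(I) if I else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Toy example loosely following the "new york pizza manhattan" illustration
gold = [frozenset({"New York-style pizza", "Manhattan"}),
        frozenset({"New York City", "Manhattan"})]
sys_out = [frozenset({"New York-style pizza"}),
           frozenset({"New York City", "Manhattan"})]
print(erd_prf(gold, sys_out))  # exactly one interpretation matches
```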

Page 86: Entity Linking

ERD’14 results

(A single interpretation is returned.)

Rank | Team            | F1     | Latency
1    | SMAPH Team      | 0.7076 | 0.49
2    | NTUNLP          | 0.6797 | 1.04
3    | Seznam Research | 0.6693 | 3.91

http://web-ngram.research.microsoft.com/erd2014/LeaderBoard.aspx

Page 87: Entity Linking

Hands-on part: http://bit.ly/russir2016-elx

Page 88: Entity Linking
Page 89: Entity Linking

Questions?

Contact | @krisztianbalog | krisztianbalog.com