Entity Linking
10th Russian Summer School in Information Retrieval (RuSSIR 2016) | Saratov, Russia, 2016
Krisztian Balog, University of Stavanger | @krisztianbalog
From Named Entity Recognition to Entity Linking
Named entity recognition (NER)
- Also known as entity identification, entity extraction, and entity chunking
- Task: identifying named entities in text and labeling them with one of the possible entity types
  - Person (PER), organization (ORG), location (LOC), miscellaneous (MISC). Sometimes also temporal expressions (TIMEX) and certain types of numerical expressions (NUMEX)
<LOC>Silicon Valley</LOC> venture capitalist <PER>Michael Moritz</PER> said that today's billion-dollar "unicorn" startups can learn from <ORG>Apple</ORG> founder <PER>Steve Jobs</PER>
Named entity disambiguation
- Also called named entity normalization and named entity resolution
- Task: assign ambiguous entity names to canonical entities from some catalog
- It is usually assumed that entities have already been recognized in the input text (i.e., it has been processed by a NER system)
Wikification
- Named entity disambiguation using Wikipedia as the catalog of entities
- Also annotating concepts, not only entities
Entity linking
- Task: recognizing entity mentions in text and linking them to the corresponding entries in a knowledge base (KB)
- Limited to recognizing entities for which a target entry exists in the reference KB; each KB entry is a candidate
- It is assumed that the document provides sufficient context for disambiguating entities
- Knowledge base (working definition):
  - A catalog of entities, each with one or more names (surface forms), links to other entities, and, optionally, a textual description
  - Wikipedia, DBpedia, Freebase, YAGO, etc.
Overview of entity annotation tasks

Task                          Recognition             Assignment
Named entity recognition      entities                entity type
Named entity disambiguation   entities                entity ID / NIL
Wikification                  entities and concepts   entity ID / NIL
Entity linking                entities                entity ID
Entity linking in action
Anatomy of an entity linking system
document → Mention detection → Candidate selection → Disambiguation → entity annotations

- Mention detection: identification of text snippets that can potentially be linked to entities
- Candidate selection: generating a set of candidate entities for each mention
- Disambiguation: selecting a single entity (or none) for each mention, based on the context
Mention detection
Mention detection
- Goal: detect all "linkable" phrases
- Challenges:
  - Recall oriented: do not miss any entity that should be linked
  - Find entity name variants
    - E.g., "jlo" is a name variant of [Jennifer Lopez]
  - Filter out inappropriate ones
    - E.g., "new york" matches >2k different entities
Common approach
1. Build a dictionary of entity surface forms
   - Entities with all name variants
2. Check all document n-grams against the dictionary
   - The value of n is typically set between 6 and 8
3. Filter out undesired entities
   - Can be done here or later in the pipeline
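The dictionary-lookup steps above can be sketched as follows; the surface-form dictionary and document are toy examples (a real system would use a Wikipedia-derived dictionary and handle tokenization and casing more carefully):

```python
# Illustrative sketch of dictionary-based mention detection.

def ngrams(tokens, max_n=8):
    """Yield all n-grams (as strings) up to length max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def detect_mentions(text, surface_forms):
    """Return the document n-grams that match a known entity surface form."""
    tokens = text.lower().split()
    return [g for g in ngrams(tokens) if g in surface_forms]

surface_forms = {"empire state building", "times square", "statue of liberty"}
doc = "Home to the Empire State Building, Times Square and other iconic sites"
print(detect_mentions(doc.replace(",", ""), surface_forms))
# ['times square', 'empire state building']
```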
Example

"Home to the Empire State Building, Times Square, Statue of Liberty and other iconic sites, New York City is a fast-paced, globally influential center of art, culture, fashion and finance."

Surface form (s)        Entities (E_s)
Empire                  British_Empire, Empire_(magazine), First_French_Empire, Galactic_Empire_(Star_Wars), Holy_Roman_Empire, Roman_Empire, ...
Empire State            Empire_State_(band), Empire_State_Building, Empire_State_Film_Festival
Empire State Building   Empire_State_Building
Times Square            Times_Square, Times_Square_(Hong_Kong), Times_Square_(IRT_42nd_Street_Shuttle), ...
...                     ...
Surface form dictionary construction from Wikipedia
- Page title
  - Canonical (most common) name of the entity
- Redirect pages
  - Alternative names that are frequently used to refer to an entity
- Disambiguation pages
  - List of entities that share the same name
- Anchor texts
  - Of links pointing to the entity's Wikipedia page
- Bold texts from first paragraph
  - Generally denote other name variants of the entity
Surface form dictionary construction from other sources
- Anchor texts from external web pages pointing to Wikipedia articles
- Problem of synonym discovery
  - Expanding acronyms
  - Leveraging search results or query-click logs from a web search engine
  - ...
Filtering mentions
- Filter out mentions that are unlikely to be linked to any entity
- Keyphraseness:

  P(keyphrase|m) = |D_link(m)| / |D(m)|

  where |D_link(m)| is the number of Wikipedia articles in which m appears as a link, and |D(m)| is the number of Wikipedia articles that contain m

- Link probability:

  P(link|m) = link(m) / freq(m)

  where link(m) is the number of times mention m appears as a link, and freq(m) is the total number of times mention m occurs in Wikipedia (as a link or not)
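Both statistics are simple ratios over corpus counts. A minimal sketch, with made-up counts (not taken from a real Wikipedia dump):

```python
# Toy illustration of the two mention-filtering statistics.

def keyphraseness(n_docs_as_link, n_docs_containing):
    """P(keyphrase|m) = |D_link(m)| / |D(m)|"""
    return n_docs_as_link / n_docs_containing

def link_probability(n_times_as_link, n_times_total):
    """P(link|m) = link(m) / freq(m)"""
    return n_times_as_link / n_times_total

# Hypothetical: "times square" occurs in 2,000 articles, is a link in 1,200
print(keyphraseness(1200, 2000))  # 0.6
```

Mentions whose keyphraseness or link probability falls below some threshold would then be dropped before candidate selection.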
Overlapping entity mentions
- Dealing with them in this phase
  - E.g., by dropping a mention if it is subsumed by another mention
- Keeping them and postponing the decision to a later stage (candidate selection or disambiguation)
Candidate selection
Candidate selection
- Goal: narrow down the space of disambiguation possibilities
- Balances between precision and recall (effectiveness vs. efficiency)
- Often approached as a ranking problem
  - Keeping only candidates above a score/rank threshold for downstream processing
Commonness
- Rank candidate entities based on their overall popularity, i.e., the "most common sense":

  P(e|m) = n(m, e) / Σ_{e'} n(m, e')

  where n(m, e) is the number of times entity e is the link destination of mention m, and Σ_{e'} n(m, e') is the total number of times mention m appears as a link
Example

"Home to the Empire State Building, Times Square, Statue of Liberty and other iconic sites, New York City is a fast-paced, globally influential center of art, culture, fashion and finance."

Entity (e)                               Commonness (P(e|m))
Times_Square                             0.940
Times_Square_(film)                      0.017
Times_Square_(Hong_Kong)                 0.011
Times_Square_(IRT_42nd_Street_Shuttle)   0.006
...                                      ...
Commonness
- Can be pre-computed and stored in the entity surface form dictionary
- Follows a power law with a long tail of extremely unlikely senses; entities at the tail end of the distribution can be safely discarded
  - E.g., 0.001 is a sensible threshold
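Commonness and tail pruning can be sketched as below; the anchor-text link counts are illustrative, not from a real dump:

```python
# Sketch: computing commonness P(e|m) from per-mention link-destination
# counts and pruning the long tail of unlikely senses.

def commonness(link_counts):
    """P(e|m): normalize per-mention link-destination counts."""
    total = sum(link_counts.values())
    return {e: n / total for e, n in link_counts.items()}

def prune(p, threshold=0.001):
    """Discard candidate senses in the extremely unlikely tail."""
    return {e: v for e, v in p.items() if v >= threshold}

counts = {"Times_Square": 940, "Times_Square_(film)": 17,
          "Times_Square_(Hong_Kong)": 11}
print(round(prune(commonness(counts))["Times_Square"], 3))  # 0.971
```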
Example

"Bulgaria's best World Cup performance was in the 1994 World Cup where they beat Germany, to reach the semi-finals, losing to Italy, and finishing in fourth ..."

Germany:
Entity                           Commonness
Germany                          0.9417
Germany_national_football_team   0.0139
Nazi_Germany                     0.0081
German_Empire                    0.0065
...

World Cup:
Entity                           Commonness
FIFA_World_Cup                   0.2358
FIS_Alpine_Ski_World_Cup         0.0682
2009_FINA_Swimming_World_Cup     0.0633
World_Cup_(men's_golf)           0.0622
...

1998 World Cup:
Entity                           Commonness
1998_FIFA_World_Cup              0.9556
1998_IAAF_World_Cup              0.0296
1998_Alpine_Skiing_World_Cup     0.0059
...
Disambiguation
Disambiguation
- Baseline approach: most common sense
- Consider additional types of evidence:
  - Prior importance of entities and mentions
  - Contextual similarity between the text surrounding the mention and the candidate entity
  - Coherence among all entity linking decisions in the document
- Combine these signals
  - Using supervised learning or graph-based approaches
- Optionally perform pruning
  - Reject low-confidence or semantically meaningless annotations
Prior importance features
- Context-independent features
  - Neither the text nor other mentions in the document are taken into account
- Keyphraseness
- Link probability
- Commonness
Prior importance features
- Link prior
  - Popularity of the entity measured in terms of incoming links:

    P_link(e) = link(e) / Σ_{e'} link(e')

- Page views
  - Popularity of the entity measured in terms of traffic volume:

    P_pageviews(e) = pageviews(e) / Σ_{e'} pageviews(e')
Contextual features
- Compare the surrounding context of a mention with the (textual) representation of the given candidate entity
- Context of a mention
  - Window of text (sentence, paragraph) around the mention
  - Entire document
- Entity's representation
  - Wikipedia entity page, first description paragraph, terms with highest TF-IDF score, etc.
  - Entity's description in the knowledge base
Contextual similarity
- Commonly: bag-of-words representation with cosine similarity:

  sim_cos(m, e) = (d_m · d_e) / (‖d_m‖ ‖d_e‖)

- Many other options for measuring similarity
  - Dot product, KL divergence, Jaccard similarity
- Representation does not have to be limited to bag-of-words
  - Concept vectors (named entities, Wikipedia categories, anchor text, keyphrases, etc.)
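A minimal bag-of-words cosine similarity between a mention's context and an entity's textual representation; a real system would use TF-IDF weights rather than raw term counts, and the example texts below are illustrative:

```python
# Cosine similarity over term-count vectors: (d_m . d_e) / (||d_m|| ||d_e||)
import math
from collections import Counter

def cosine(text_a, text_b):
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

ctx = "center of art culture fashion and finance"
desc = "New York City is a global center of finance and culture"
print(round(cosine(ctx, desc), 2))  # 0.57
```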
Entity-relatedness features
- It can reasonably be assumed that a document focuses on one or at most a few topics
- Therefore, entities mentioned in a document should be topically related to each other
- Capturing topical coherence by developing some measure of relatedness between (linked) entities
- Defined for pairs of entities
Wikipedia Link-based Measure (WLM)
- Often referred to simply as relatedness
- A close relationship is assumed between two entities if there is a large overlap between the entities linking to them:

  WLM(e, e') = 1 − (log(max(|L_e|, |L_{e'}|)) − log(|L_e ∩ L_{e'}|)) / (log(|E|) − log(min(|L_e|, |L_{e'}|)))

  where L_e is the set of entities that link to e, and |E| is the total number of entities
Wikipedia Link-based Measure (WLM)
Corpus-based approaches obtain background knowledge by performing statistical analysis of large untagged document collections. The most successful and well known of these techniques is Latent Semantic Analysis (Landauer et al., 1998), which relies on the tendency for related words to appear in similar contexts. LSA offers the same vocabulary as the corpus upon which it is built. Unfortunately it can only provide accurate judgments when the corpus is very large, and consequently the pre-processing effort required is significant.
Strube and Ponzetto (2006) were the first to compute measures of semantic relatedness using Wikipedia. Their approach—WikiRelate—took familiar techniques that had previously been applied to WordNet and modified them to suit Wikipedia. Their most accurate approach is based on Leacock & Chodorow’s (1998) path-length measure, which takes into account the depth within WordNet at which the concepts are found. WikiRelate’s implementation does much the same for Wikipedia’s hierarchical category structure. While the results are similar in terms of accuracy to thesaurus based techniques, the collaborative nature of Wikipedia offers a much larger—and constantly evolving—vocabulary.
Gabrilovich and Markovitch (2007) achieve extremely accurate results with ESA, a technique that is somewhat reminiscent of the vector space model widely used in information retrieval. Instead of comparing vectors of term weights to evaluate the similarity between queries and documents, they compare weighted vectors of the Wikipedia articles related to each term. The name of the approach—Explicit Semantic Analysis—stems from the way these vectors are comprised of manually defined
concepts, as opposed to the mathematically derived contexts used by Latent Semantic Analysis. The result is a measure which approaches the accuracy of humans. Additionally, it provides relatedness measures for any length of text: unlike WikiRelate, there is no restriction that the input be matched to article titles.
Obtaining Semantic Relatedness from Wikipedia Links
We have developed a new approach for extracting semantic relatedness measures from Wikipedia, which we call the Wikipedia Link-based Measure (WLM). The central difference between this and other Wikipedia based approaches is the use of Wikipedia’s hyperlink structure to define relatedness. This theoretically offers a measure that is both cheaper and more accurate than ESA: cheaper, because Wikipedia’s extensive textual content can largely be ignored, and more accurate, because it is more closely tied to the manually defined semantics of the resource.
Wikipedia’s extensive network of cross-references, portals, categories and info-boxes provides a huge amount of explicitly defined semantics. Despite the name, Explicit Semantic Analysis takes advantage of only one property: the way in which Wikipedia’s text is segmented into individual topics. Its central component—the weight between a term and an article—is automatically derived rather than explicitly specified. In contrast, the central component of our approach is the link: a manually-defined connection between two manually disambiguated concepts. Wikipedia provides millions of these connections, as
[Figure 1: Obtaining a semantic relatedness measure between Automobile and Global Warming from Wikipedia links. The incoming and outgoing links of the two articles include entities such as Petrol Engine, Diesel Engine, Fossil Fuel, Carbon Dioxide, Air Pollution, Greenhouse Gas, Greenhouse Effect, Kyoto Protocol, Alternative Fuel, Transport, Vehicle, and Henry Ford.]
Image taken from Milne and Witten (2008a). An Effective, Low-Cost Measure of Semantic Relatedness Obtained from Wikipedia Links. In AAAI WikiAI Workshop.
Asymmetric relatedness features
- A relatedness function does not have to be symmetric
  - E.g., the relatedness of the UNITED STATES given NEIL ARMSTRONG is intuitively larger than the relatedness of NEIL ARMSTRONG given the UNITED STATES
- Conditional probability:

  P(e'|e) = |L_{e'} ∩ L_e| / |L_e|
Entity-relatedness features
- Numerous ways to define relatedness
  - Consider not only incoming, but also outgoing links, or the union of incoming and outgoing links
  - Jaccard similarity, Pointwise Mutual Information (PMI), the Chi-square statistic, etc.
- Having a single relatedness function is preferred, to keep the disambiguation process simple
  - Various relatedness measures can effectively be combined into a single score using a machine learning approach [Ceccarelli et al., 2013]
Overview of features
Disambiguation approaches
- Consider local compatibility (including prior evidence) and coherence with the other entity linking decisions
- Task: find an assignment Γ : M_d → E ∪ {∅} of mentions to entities
- Objective function:

  Γ* = argmax_Γ [ Σ_{(m,e) ∈ Γ} φ(m, e) + ψ(Γ) ]

  where φ(m, e) is the local compatibility between the mention and the assigned entity, and ψ(Γ) is the coherence function for all entity annotations in the document
- This optimization problem is NP-hard!
  - Need to resort to approximation algorithms and heuristics
Disambiguation strategies
- Individually, one mention at a time
  - Rank candidates for each mention, take the top-ranked one (or NIL):

    γ(m) = argmax_{e ∈ E_m} score(m, e)

  - Interdependence between entity linking decisions may be incorporated in a pairwise fashion
- Collectively, all mentions in the document jointly
Disambiguation approaches

Approach                           Context           Entity interdependence
Most common sense                  none              none
Individual local disambiguation    text              none
Individual global disambiguation   text & entities   pairwise
Collective disambiguation          text & entities   collective
Individual local disambiguation
- Early entity linking approaches
- The local compatibility score can be written as a linear combination of features:

  φ(e, m) = Σ_i λ_i f_i(e, m)

  where the features f_i can be both context-independent and context-dependent
- Learn the "optimal" combination of features from training data using machine learning
Individual global disambiguation
- Consider what other entities are mentioned in the document
- True global optimization would be NP-hard
- A good approximation can be computed efficiently by considering pairwise interdependencies for each mention independently
  - Pairwise entity relatedness scores need to be aggregated into a single number (how coherent the given candidate entity is with the rest of the entities in the document)
Approach #1: GLOW [Ratinov et al., 2011]
- First perform local disambiguation, yielding a set of predicted entities E*
- The coherence of entity e with all other linked entities in the document:

  g_j(e, E*) = ⊕_{e' ∈ E* \ {e}} f_j(e, e')

  where ⊕ is an aggregator function (max or avg) and f_j is a particular relatedness function
- Use the predictions in a second, global disambiguation round:

  score(m, e) = Σ_i λ_i f_i(e, m) + Σ_j λ_j g_j(e, E*)
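A hedged sketch of this two-round, GLOW-style scoring: the relatedness lookup and the weight lam are stand-in assumptions, and the local f_i features are collapsed into a single precomputed local score:

```python
# Second-round global score: local compatibility plus aggregated coherence
# with the first-round predictions E*.

def coherence(e, e_star, relatedness, agg=max):
    """g(e, E*): aggregate relatedness of e with the other predicted entities."""
    others = [relatedness(e, e2) for e2 in e_star if e2 != e]
    return agg(others) if others else 0.0

def global_score(local_score, e, e_star, relatedness, lam=0.5):
    """Combine local compatibility with global coherence (lam is assumed)."""
    return local_score + lam * coherence(e, e_star, relatedness)

# Toy symmetric relatedness lookup, e.g. precomputed WLM values
pairs = {frozenset({"Michael_Jordan", "Chicago_Bulls"}): 0.82}
rel = lambda a, b: pairs.get(frozenset({a, b}), 0.0)
print(round(global_score(0.5, "Michael_Jordan", {"Chicago_Bulls"}, rel), 2))  # 0.91
```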
Approach #2: TAGME [Ferragina & Scaiella, 2010]
- Combine the two most important features (commonness and relatedness) using a voting scheme
- The score of a candidate entity for a particular mention:

  score(m, e) = Σ_{m' ∈ M_d \ {m}} vote(m', e)

- The vote function estimates the agreement between e and all candidate entities of all other mentions in the document
TAGME (voting mechanism)
- Average relatedness between each possible disambiguation, weighted by its commonness score:

  vote(m', e) = ( Σ_{e' ∈ E_{m'}} WLM(e, e') · P(e'|m') ) / |E_{m'}|

[Figure: candidate entity e of mention m receives a vote from each other mention m'; each candidate e' of m' contributes WLM(e, e') weighted by P(e'|m').]
TAGME (final score)
- Final decision uses a simple but robust heuristic
  - The top-ε entities with the highest score are considered for a given mention, and the one with the highest commonness score is selected:

  γ(m) = argmax_{e ∈ E_m} { P(e|m) : e ∈ top_ε[score(m, e)] }

  where score(m, e) = Σ_{m' ∈ M_d \ {m}} vote(m', e)
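A toy sketch of the TAGME voting scheme and final heuristic as described above; candidates maps each mention to {entity: commonness}, rel(e, e') stands in for WLM relatedness, and all values are made up for illustration:

```python
# vote, score, and the top-eps + commonness tie-break heuristic.

def vote(m_other, e, candidates, rel):
    """Average relatedness to the senses of m_other, weighted by commonness."""
    senses = candidates[m_other]
    return sum(rel(e, e2) * p for e2, p in senses.items()) / len(senses)

def score(m, e, candidates, rel):
    """Total vote that e collects from all other mentions in the document."""
    return sum(vote(m2, e, candidates, rel) for m2 in candidates if m2 != m)

def disambiguate(m, candidates, rel, eps=2):
    """Among the top-eps entities by score, pick the most common sense."""
    top = sorted(candidates[m], key=lambda e: score(m, e, candidates, rel),
                 reverse=True)[:eps]
    return max(top, key=lambda e: candidates[m][e])

pairs = {frozenset({"Michael_Jordan", "Chicago_Bulls"}): 0.82}
rel = lambda a, b: pairs.get(frozenset({a, b}), 0.0)
candidates = {"jordan": {"Michael_Jordan": 0.7, "Michael_B._Jordan": 0.3},
              "bulls": {"Chicago_Bulls": 0.8, "Bull": 0.2}}
print(disambiguate("jordan", candidates, rel))  # Michael_Jordan
```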
Collective disambiguation
- Graph-based representation
- Mention-entity edges capture the local compatibility between the mention and the entity
  - Measured using a combination of context-independent and context-dependent features
- Entity-entity edges represent the semantic relatedness between a pair of entities
  - Common choice is relatedness (WLM)
- Use these relations jointly to identify a single referent entity (or none) for each of the mentions
Example
There have been several research efforts focused on computing the semantic relatedness between entities (Strube and Ponzetto [19]; Milne and Witten [9]). In this paper, we adopt the method proposed by Milne and Witten [9], which computes the semantic relatedness as:

SR(a, b) = 1 − (log(max(|A|, |B|)) − log(|A ∩ B|)) / (log(|W|) − log(min(|A|, |B|)))
where a and b are the two entities of interest, A and B are the sets of all entities that link to a and b in Wikipedia respectively, and W is the entire Wikipedia. We show an example of semantic relatedness between four selected entities in Table 1, where the semantic relatedness can reveal the semantic relations between Michael Jordan and Space Jam, and between Michael Jordan and Chicago Bulls.
                    Space Jam   Chicago Bulls
Michael Jordan      0.66        0.82
Michael B. Jordan   0.00        0.00

Table 1. The relatedness table of four selected entities
3.3 The Referent Graph

Now we have two relations: the Compatible relation between name mention and entity and the Semantic-Related relation between entities. In this way, the interdependence between the EL decisions in a document can be best represented as a graph, which we refer to as Referent Graph. Concretely, the Referent Graph is defined as follows:
A Referent Graph is a weighted graph G=(V, E), where the node set V contains all name mentions in a document and all the possible referent entities of these name mentions, with each node representing a name mention or an entity; each edge between a name mention and an entity represents a Compatible relation between them; each edge between two entities represents a Semantic-Related relation between them.
For illustration, Figure 2 shows the Referent Graph representation of the EL problem in Example 1.
[Figure 2. The Referent Graph of Example 1: mention nodes (Space Jam, Bulls, Jordan) connected to candidate entity nodes (Space Jam, Chicago Bulls, Bull, Michael Jordan, Michael I. Jordan, Michael B. Jordan) by weighted Compatible edges (0.13, 0.01, 0.20, 0.12, 0.03, 0.08), and entity-entity Semantic-Related edges weighted 0.66 (Michael Jordan–Space Jam) and 0.82 (Michael Jordan–Chicago Bulls), per Table 1.]
By representing both the local mention-to-entity compatibility and the global entity relation as edges, two types of dependencies are captured in Referent Graph:
1) Local Dependency between name mention and entity. In the Referent Graph, the local dependency between a name mention and an entity is encoded as the edge between the nodes corresponding to them, with an edge weight indicating the strength of this dependency. For example, in Figure 2 the dependency between (Bulls, Chicago Bulls) is represented as an edge between them. Notice that the dependency between name mention and entity is asymmetric: only the entity depends on the name mention, but the name mention does not depend on the entity.
2) Global Interdependence between EL decisions. By connecting candidate referent entities using the Semantic-Related relation, the interdependence between EL decisions is encoded as the graph structure of the Referent Graph. In this way, the referent graph allows us to deduce and use indirect and implicit dependencies, and can take the structural properties of the interdependence into consideration. For example, the name mention Bulls is related to the entity Chicago Bulls, which in turn is related successively to the entity Michael Jordan. An indirect relation between Bulls and Michael Jordan could be established and used in the EL when necessary. Such indirect relations cannot be identified using an approach based on pair-wise dependence modeling.
The Referent Graph Construction. Given a document, the construction of Referent Graph takes three steps: name mention identification, candidate entity selection and node connection. In this paper, we focus on the task of linking entities with Wikipedia, even though the proposed method can be applied to other knowledge bases. Thus we will only show how to construct the Referent Graph using Wikipedia:
1) Name Mention Identification. We first identify all name mentions in a document. Given a document, we gather all N-grams (up to 8 words) and match them to the anchor dictionary of Wikipedia (Medelyan et al. [16]). Not all matches are considered, because even stop words such as “is” and “an” may match to the anchor texts. We use Mihalcea and Csomai [2]’s keyphraseness feature to filter out the meaningless name mentions, and the retained name mentions will be represented in the graph.
2) Candidate Entity Selection. In this step, we select the candidate referent entities for each name mention detected in Step 1. We adopt the method described in Milne & Witten [14], where a name mention’s candidate referent entities are the destination Wikipedia articles of the anchor text which are the same as this name mention.
3) Node Connection. In this step, we add the dependency edges to the Referent Graph. For each name mention, we add a Compatible edge between it and each of its candidate referent entities using the method in Section 3.1. For each pair of entities in the Referent Graph, if there is a Semantic-Related relation between them, we add an edge between them using the method described in Section 3.2.
4. COLLECTIVE ENTITY LINKING

In this section, we propose a purely collective inference algorithm, which can jointly identify the referent entities of all name mentions in the same document. Given a Referent Graph representation, the goal of the collective EL is to use both the Compatible and the Semantic-Related relations simultaneously in EL decision making.
4.1 Problem Reformulation

As shown in Section 1, the referent entity of a name mention m should be both:
- Compatible with the name mention m;
AIDA [Hoffart et al., 2011]
- Problem formulation: find a dense subgraph that contains all mention nodes and exactly one mention-entity edge for each mention
- Greedy algorithm iteratively removes edges

Algorithm
- Start with the full graph
- Iteratively remove the entity node with the lowest weighted degree (along with all its incident edges), provided that each mention node remains connected to at least one entity
  - The weighted degree of an entity node is the sum of the weights of its incident edges
- The graph with the highest density is kept as the solution
  - The density of the graph is measured as the minimum weighted degree among its entity nodes
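The greedy procedure above can be sketched as follows. This is a simplified, hedged sketch of an AIDA-style heuristic, not the full algorithm: mentions, candidate entities, and edge weights are toy values, and edges maps frozenset({u, v}) to a weight for both mention-entity and entity-entity edges:

```python
# Greedy densest-subgraph heuristic: repeatedly drop the entity node with the
# lowest weighted degree while every mention keeps at least one candidate;
# return the intermediate graph with the highest density (min weighted degree).

def greedy_densest(cand, edges):
    mentions = set(cand)
    active = set().union(*cand.values())

    def degree(e):
        present = mentions | active
        return sum(w for pair, w in edges.items() if e in pair and pair <= present)

    best, best_density = set(active), min(degree(e) for e in active)
    while True:
        # entities whose removal leaves every mention with >= 1 candidate
        removable = [e for e in active
                     if all((c & active) - {e} for c in cand.values())]
        if not removable:
            break
        active.remove(min(removable, key=degree))
        density = min(degree(e) for e in active)
        if density > best_density:
            best, best_density = set(active), density
    return best

fs = frozenset
cand = {"jordan": {"Michael_Jordan", "Michael_B._Jordan"},
        "bulls": {"Chicago_Bulls", "Bull"}}
edges = {fs({"jordan", "Michael_Jordan"}): 0.20,
         fs({"jordan", "Michael_B._Jordan"}): 0.03,
         fs({"bulls", "Chicago_Bulls"}): 0.13,
         fs({"bulls", "Bull"}): 0.01,
         fs({"Michael_Jordan", "Chicago_Bulls"}): 0.82}
print(sorted(greedy_densest(cand, edges)))  # ['Chicago_Bulls', 'Michael_Jordan']
```

On this toy graph, Bull (degree 0.01) and then Michael_B._Jordan (0.03) are removed, leaving the densest graph with exactly one entity per mention.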
Example iteration #1
[Figure: the Referent Graph with the weighted degree annotated next to each entity node (values shown: 1.56, 0.95, 0.86, 0.12, 0.03, 0.01). Which entity should be removed? What is the density of the graph? (0.03)]
Example iteration #2
There has been several research which focused on computing the semantic relatedness between entities (Strube and Ponzetto [19]; Milne and Witten [9]). In this paper, we adopt the method proposed by Milne and Witten [9] to compute the semantic relatedness between entities, which computes the semantic relatedness as:
log(max( )) log( )( , ) 1
log( ) log(min( , ))
A B A BSR a b
W A B�
��
�ˈ
where a and b are the two entities of interest, A and B are the sets of all entities that link to a and b in Wikipedia respectively, and W is the entire Wikipedia. We show an example of semantic relatedness between four selected entities in Table 1, where the semantic relatedness can reveal the semantic relations between Michael Jordan and Space Jam, and between Michael Jordan and Chicago Bulls.
Space Jam Chicago Bulls Michael Jordan 0.66 0.82
Michael B. Jordan 0.00 0.00
Table 1. The relatedness table of four selected entities
3.3 The Referent Graph Now we have two relations: the Compatible relation between name mention and entity and the Semantic-Related relation between entities. In this way, the interdependence between the EL decisions in a document can be best represented as a graph, which we refer to as Referent Graph. Concretely, the Referent Graph is defined as follows:
A Referent Graph is a weighted graph G=(V, E), where the node set V contains all name mentions in a document and all the possible referent entities of these name mentions, with each node representing a name mention or an entity; each edge between a name mention and an entity represents a Compatible relation between them; each edge between two entities represents a Semantic-Related relation between them.
For illustration, Figure 2 shows the Referent Graph representation of the EL problem in Example 1.
Space Jam
Chicago Bulls
Bull
Michael Jordan
Michael I. Jordan
Michael B. Jordan
Space Jam
Bulls Jordan
Mention
Entity
0.66
0.82
0.13
0.01
0.20
0.12
0.03
0.08
Figure 2. The Referent Graph of Example 1
By representing both the local mention-to-entity compatibility and the global entity relation as edges, two types of dependencies are captured in Referent Graph:
1) Local Dependency between name mention and entity. In the Referent Graph, the local dependency between a name mention and an entity is encoded as the edge between their corresponding nodes, with an edge weight that indicates the strength of this dependency. For example, in Figure 2 the dependency between (Bulls, Chicago Bulls) is represented as an edge between them. Notice that the dependency between name mention and entity is asymmetric: only the entity depends on the name mention, not the other way around.
2) Global Interdependence between EL decisions. By connecting candidate referent entities using the Semantic-Related relation, the interdependence between EL decisions is encoded as the graph structure of the Referent Graph. In this way, the referent graph allows us to deduce and use indirect and implicit dependencies, and can take the structural properties of the interdependence into consideration. For example, the name mention Bulls is related to the entity Chicago Bulls, which in turn is related successively to the entity Michael Jordan. An indirect relation between Bulls and Michael Jordan could be established and used in the EL when necessary. Such indirect relations cannot be identified using an approach based on pair-wise dependence modeling.
The Referent Graph Construction. Given a document, the construction of Referent Graph takes three steps: name mention identification, candidate entity selection and node connection. In this paper, we focus on the task of linking entities with Wikipedia, even though the proposed method can be applied to other knowledge bases. Thus we will only show how to construct the Referent Graph using Wikipedia:
1) Name Mention Identification. We first identify all name mentions in a document. Given a document, we gather all N-grams (up to 8 words) and match them to the anchor dictionary of Wikipedia (Medelyan et al. [16]). Not all matches are considered, because even stop words such as “is” and “an” may match to the anchor texts. We use Mihalcea and Csomai [2]’s keyphraseness feature to filter out the meaningless name mentions, and the retained name mentions will be represented in the graph.
2) Candidate Entity Selection. In this step, we select the candidate referent entities for each name mention detected in Step 1. We adopt the method described in Milne & Witten [14], where a name mention's candidate referent entities are the destination Wikipedia articles of anchor texts that match this name mention.
3) Node Connection. In this step, we add the dependency edges to the Referent Graph. For each name mention, we add a Compatible edge between it and each of its candidate referent entities using the method in Section 3.1. For each pair of entities in the Referent Graph, if there is a Semantic-Related relation between them, we add an edge between them using the method described in Section 3.2.
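The three construction steps above can be sketched as follows. This is a simplified sketch: mention identification and filtering are assumed to have happened already, and the anchor dictionary, the compatibility weighting of Section 3.1, and the relatedness measure of Section 3.2 are passed in as black-box functions; all names are illustrative:

```python
from itertools import combinations

def build_referent_graph(mentions, candidates, compat, relatedness):
    """Sketch of Referent Graph construction.

    mentions: name mentions already identified and filtered.
    candidates: dict mapping each mention to its candidate entities.
    compat(m, e): mention-entity compatibility weight (a Compatible edge).
    relatedness(e1, e2): entity-entity weight (a Semantic-Related edge).
    """
    graph = {"nodes": set(), "edges": {}}
    entities = set()
    for m in mentions:
        graph["nodes"].add(("mention", m))
        for e in candidates[m]:
            graph["nodes"].add(("entity", e))
            entities.add(e)
            # Compatible edge between the mention and its candidate entity
            graph["edges"][(("mention", m), ("entity", e))] = compat(m, e)
    for e1, e2 in combinations(sorted(entities), 2):
        w = relatedness(e1, e2)
        if w > 0:  # Semantic-Related edge only if the entities are related
            graph["edges"][(("entity", e1), ("entity", e2))] = w
    return graph

# Toy instance mirroring Example 1 (weights are made up):
mentions = ["Bulls", "Jordan"]
candidates = {"Bulls": ["Chicago Bulls"],
              "Jordan": ["Michael Jordan", "Michael I. Jordan"]}
compat = lambda m, e: 0.5
relatedness = lambda a, b: 0.82 if {a, b} == {"Chicago Bulls", "Michael Jordan"} else 0.0
g = build_referent_graph(mentions, candidates, compat, relatedness)
print(len(g["nodes"]), len(g["edges"]))
```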
4. COLLECTIVE ENTITY LINKING In this section, we propose a purely collective inference algorithm, which can jointly identify the referent entities of all name mentions in the same document. Given a Referent Graph representation, the goal of the collective EL is to use both the Compatible and the Semantic-Related relations simultaneously in EL decision making.
4.1 Problem Reformulation As shown in Section 1, the referent entity of a name mention m should be both:
- Compatible with the name mention m;
[Slide overlay: "Example iteration #3" of the iterative graph pruning — weighted degrees of the candidate entities are shown (0.95, 0.86, 0.03, 1.56, 0.12), with the questions "Which entity should be removed?" and "What is the density of the graph?" (0.12).]
There have been several studies on computing the semantic relatedness between entities (Strube and Ponzetto [19]; Milne and Witten [9]). In this paper, we adopt the method proposed by Milne and Witten [9], which computes the semantic relatedness as:

SR(a, b) = 1 - \frac{\log(\max(|A|, |B|)) - \log(|A \cap B|)}{\log(|W|) - \log(\min(|A|, |B|))}
[Slide overlay: a later iteration of the same pruning example — updated weighted degrees shown, with the questions "Which entity should be removed?" and "What is the density of the graph?" (0.86).]
Pre- and post-processing
- Pre-processing phase: remove entities that are "too distant" from the mention nodes
- At the end of the iterations, the solution graph may still contain mentions that are connected to more than one entity; deal with this in post-processing
  - If the graph is sufficiently small, it is feasible to exhaustively consider all possible mention-entity pairs
  - Otherwise, a faster local (hill-climbing) search algorithm may be used
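A minimal hill-climbing sketch for the post-processing step. The slides leave the objective open, so this sketch uses one plausible choice (sum of mention-entity compatibilities plus pairwise relatedness of the chosen entities); all names and weights are illustrative:

```python
def objective(assignment, compat, relatedness):
    """Total score of an assignment: compatibility of each chosen
    mention-entity pair plus relatedness between the chosen entities."""
    score = sum(compat[m, e] for m, e in assignment.items())
    chosen = list(assignment.values())
    score += sum(relatedness.get((a, b), relatedness.get((b, a), 0.0))
                 for i, a in enumerate(chosen) for b in chosen[i + 1:])
    return score

def hill_climb(candidates, compat, relatedness):
    """Greedy local search: start from the locally best entity per mention,
    then switch one mention's entity whenever it improves the objective."""
    assignment = {m: max(es, key=lambda e: compat[m, e])
                  for m, es in candidates.items()}
    improved = True
    while improved:
        improved = False
        for m, es in candidates.items():
            for e in es:
                trial = {**assignment, m: e}
                if objective(trial, compat, relatedness) > objective(assignment, compat, relatedness):
                    assignment, improved = trial, True
    return assignment

candidates = {"Jordan": ["Michael Jordan", "Michael I. Jordan"],
              "Bulls": ["Chicago Bulls"]}
compat = {("Jordan", "Michael Jordan"): 0.5,
          ("Jordan", "Michael I. Jordan"): 0.6,
          ("Bulls", "Chicago Bulls"): 0.9}
relatedness = {("Michael Jordan", "Chicago Bulls"): 0.82}
result = hill_climb(candidates, compat, relatedness)
print(result)  # Jordan ends up linked to Michael Jordan, the collectively better choice
```

Note how the local optimum for "Jordan" alone (Michael I. Jordan, 0.6) is overridden once the relatedness to Chicago Bulls is taken into account, which is exactly the effect collective disambiguation is after.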
Random walk with restarts [Han et al., 2011]
- "A random walk with restart is a stochastic process to traverse a graph, resulting in a probability distribution over the vertices corresponding to the likelihood those vertices are visited" [Guo et al., 2014]
- Starting vector holding the initial evidence:

  v(m) = \frac{\mathrm{tfidf}(m)}{\sum_{m' \in M_d} \mathrm{tfidf}(m')}
Random walk process
- T is the evidence propagation matrix
- Evidence propagation ratio:

  T(m \to e) = \frac{w(m \to e)}{\sum_{e' \in E_m} w(m \to e')}

  T(e \to e') = \frac{w(e \to e')}{\sum_{e'' \in E_d} w(e \to e'')}

- Evidence propagation:

  r^0 = v

  r^{t+1} = (1 - \alpha) \times r^t \times T + \alpha \times v

  where r^t is the vector holding the probability that the nodes are visited, \alpha is the restart probability (0.1), and v is the initial evidence
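The propagation rule above is a standard power iteration; a self-contained sketch with a toy 3-node graph (the matrix and evidence vector are made up, not taken from the example document):

```python
import numpy as np

def random_walk_with_restart(T, v, alpha=0.1, tol=1e-8, max_iter=1000):
    """Iterate r_{t+1} = (1 - alpha) * r_t @ T + alpha * v until convergence.

    T: row-stochastic evidence propagation matrix (n x n).
    v: initial evidence vector (sums to 1).
    alpha: restart probability.
    """
    r = v.copy()
    for _ in range(max_iter):
        r_next = (1 - alpha) * (r @ T) + alpha * v
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

# Toy graph: node 0 propagates evidence to nodes 1 and 2, which send it back.
T = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
v = np.array([1.0, 0.0, 0.0])  # all initial evidence on node 0
r = random_walk_with_restart(T, v)
print(r.round(3))  # approximately [0.526 0.237 0.237]
```

Because T is row-stochastic and v sums to one, the visiting probabilities keep summing to one at every iteration, so r stays a proper distribution over the graph nodes.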
Final decision
- Once the random walk process has converged to a stationary distribution, the referent entity for each mention is selected as:

  \arg\max_{e \in E_m} \phi(m, e) \times r(e)

  where r(e) is the converged visiting probability of entity e, and the local compatibility is \phi(m, e) = \sum_i \lambda_i f_i(m, e)
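The decision rule is a simple arg max once the random walk has converged; in this sketch the compatibility values and the converged r are toy numbers supplied as dictionaries:

```python
def referent_entity(mention, candidates, phi, r):
    """Pick the candidate maximizing local compatibility times visiting probability."""
    return max(candidates[mention], key=lambda e: phi[(mention, e)] * r[e])

candidates = {"Jordan": ["Michael Jordan", "Michael I. Jordan"]}
phi = {("Jordan", "Michael Jordan"): 0.5,
       ("Jordan", "Michael I. Jordan"): 0.6}
r = {"Michael Jordan": 0.3, "Michael I. Jordan": 0.1}
print(referent_entity("Jordan", candidates, phi, r))  # Michael Jordan
```

Even though Michael I. Jordan has the higher local compatibility (0.6 vs. 0.5), the random walk evidence r tips the decision toward Michael Jordan (0.5 x 0.3 > 0.6 x 0.1).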
Pruning
- Discarding meaningless or low-confidence annotations produced by the disambiguation phase
- Simplest solution: use a confidence threshold
- More advanced solutions
  - Machine-learned classifier to retain only entities that are "relevant enough" (a human editor would annotate them) [Milne & Witten, 2008]
  - Optimization problem: decide, for each mention, whether switching the top-ranked disambiguation to NIL would improve the objective function [Ratinov et al., 2011]
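The simplest threshold-based pruning amounts to a one-line filter; the annotation records and the entity names below are illustrative, not from a real system:

```python
def prune(annotations, threshold=0.2):
    """Keep only annotations whose confidence clears the threshold."""
    return [a for a in annotations if a["confidence"] >= threshold]

anns = [
    {"mention": "Bulls", "entity": "Chicago Bulls", "confidence": 0.82},
    {"mention": "is", "entity": "Is (album)", "confidence": 0.03},
]
print(prune(anns))  # only the Chicago Bulls annotation survives
```

The threshold trades precision for recall: raising it removes spurious annotations of common words, at the cost of occasionally dropping genuine low-confidence links.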
Entity linking systems
Publicly available entity linking systems

System            | Reference KB | Online demo | Web API | Source code
AIDA              | YAGO2        | Yes         | Yes     | Yes (Java)
  http://www.mpi-inf.mpg.de/yago-naga/aida/
DBpedia Spotlight | DBpedia      | Yes         | Yes     | Yes (Java)
  http://spotlight.dbpedia.org/
Illinois Wikifier | Wikipedia    | No          | No      | Yes (Java)
  http://cogcomp.cs.illinois.edu/page/download_view/Wikifier
TAGME             | Wikipedia    | Yes         | Yes     | Yes (Java)
  https://tagme.d4science.org/tagme/
Wikipedia Miner   | Wikipedia    | No          | No      | Yes (Java)
  https://github.com/dnmilne/wikipediaminer
Publicly available entity linking systems
- AIDA [Hoffart et al., 2011]
  - Collective disambiguation using a graph-based approach (see under Disambiguation section)
- DBpedia Spotlight [Mendes et al., 2011]
  - Straightforward local disambiguation approach
    - Vector space representations of entities are compared against the paragraphs of their mentions using cosine similarity
    - Introduces Inverse Candidate Frequency (ICF) instead of using IDF
  - Annotates with DBpedia entities, which can be restricted to certain types or even to a custom entity set defined by a SPARQL query
Publicly available entity linking systems
- Illinois Wikifier (a.k.a. GLOW) [Ratinov et al., 2011]
  - Individual global disambiguation (see under Disambiguation section)
- TAGME [Ferragina & Scaiella, 2011]
  - One of the most popular entity linking systems
  - Designed specifically for efficient annotation of short texts, but works well on long texts too
  - Individual global disambiguation (see under Disambiguation section)
- Wikipedia Miner [Milne & Witten, 2008]
  - Seminal entity linking system that was the first to
    - combine commonness and relatedness for (local) disambiguation
    - have an open-source implementation and be offered as a web service
Evaluation
Evaluation (end-to-end)
- Comparing the system-generated annotations against a human-annotated gold standard
- Evaluation criteria
  - Perfect match: both the linked entity and the mention offsets must match
  - Relaxed match: the linked entity must match; it is sufficient if the mention overlaps with the gold standard
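The two criteria differ only in how mention spans are compared; a sketch using character offsets (the annotation records are illustrative):

```python
def perfect_match(gold, pred):
    """Entity and exact mention offsets must both agree."""
    return (pred["entity"] == gold["entity"]
            and (pred["start"], pred["end"]) == (gold["start"], gold["end"]))

def relaxed_match(gold, pred):
    """Entity must agree; the mention spans only need to overlap."""
    return (pred["entity"] == gold["entity"]
            and pred["start"] < gold["end"]
            and gold["start"] < pred["end"])

gold = {"entity": "Chicago Bulls", "start": 10, "end": 15}
pred = {"entity": "Chicago Bulls", "start": 6, "end": 15}
print(perfect_match(gold, pred), relaxed_match(gold, pred))  # False True
```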
Evaluation with relaxed match
[Slide: two side-by-side examples, each comparing a ground truth annotation with a system annotation.]
Evaluation metrics
- Set-based metrics
  - Precision: fraction of the annotations produced by the system that are correctly linked
  - Recall: fraction of the gold-standard annotations that the system correctly produced
  - F-measure: harmonic mean of precision and recall
- Metrics are computed over a collection of documents
  - Micro-averaged: aggregated across mentions
  - Macro-averaged: aggregated across documents
Evaluation metrics
- Micro-averaged:

  P_{mic} = \frac{|A_D \cap \hat{A}_D|}{|\hat{A}_D|}    R_{mic} = \frac{|A_D \cap \hat{A}_D|}{|A_D|}

- Macro-averaged:

  P_{mac} = \frac{1}{|D|} \sum_{d \in D} \frac{|A_d \cap \hat{A}_d|}{|\hat{A}_d|}    R_{mac} = \frac{1}{|D|} \sum_{d \in D} \frac{|A_d \cap \hat{A}_d|}{|A_d|}

- F1 score:

  F_1 = \frac{2 \times P \times R}{P + R}

  where A denotes the ground truth annotations and \hat{A} the annotations generated by the entity linking system
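The micro/macro distinction is easy to get wrong in code; a sketch with annotations represented as per-document sets (the toy data is made up):

```python
def prf(correct, n_pred, n_gold):
    """Precision, recall, F1 from raw counts."""
    p = correct / n_pred if n_pred else 0.0
    r = correct / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def micro_scores(gold, pred):
    """Aggregate counts across all mentions first, then compute P/R/F1."""
    correct = sum(len(gold[d] & pred[d]) for d in gold)
    return prf(correct,
               sum(len(s) for s in pred.values()),
               sum(len(s) for s in gold.values()))

def macro_scores(gold, pred):
    """Compute P/R per document, then average over documents."""
    per_doc = [prf(len(gold[d] & pred[d]), len(pred[d]), len(gold[d]))
               for d in gold]
    p = sum(x[0] for x in per_doc) / len(per_doc)
    r = sum(x[1] for x in per_doc) / len(per_doc)
    return p, r

gold = {"d1": {"a", "b"}, "d2": {"c"}}
pred = {"d1": {"a"}, "d2": {"c", "d"}}
print(micro_scores(gold, pred))  # approx (0.667, 0.667, 0.667)
print(macro_scores(gold, pred))  # (0.75, 0.75)
```

The toy data shows why the two can disagree: micro-averaging weights every mention equally, while macro-averaging gives the one-annotation document d2 the same weight as d1.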
Test collections by individual researchers

Collection                        | Reference KB | Documents | #Docs | Annotations         | #Mentions | All? | NIL?
MSNBC [Cucerzan, 2007]            | Wikipedia    | News      | 20    | entities            | 755       | Yes  | Yes
AQUAINT [Milne & Witten, 2008]    | Wikipedia    | News      | 50    | entities & concepts | 727       | No   | No
IITB [Kulkarni et al., 2009]      | Wikipedia    | Web pages | 107   | entities            | 17,200    | Yes  | Yes
ACE2004 [Ratinov et al., 2011]    | Wikipedia    | News      | 57    | entities            | 306       | Yes  | Yes
CoNLL-YAGO [Hoffart et al., 2011] | YAGO2        | News      | 1,393 | entities            | 34,956    | Yes  | Yes
Evaluation campaigns
- INEX Link-the-Wiki
- TAC Entity Linking
- Entity Recognition and Disambiguation (ERD) Challenge
Component-based evaluation
- The pipeline architecture makes the evaluation of entity linking systems especially challenging
- The main focus is on the disambiguation component, but its performance is largely influenced by the preceding steps
- Fair comparison between two approaches can only be made if they share all other elements of the pipeline
Meta evaluations
- [Hachey et al., 2009]
  - Reimplements three state-of-the-art systems
- DEXTER [Ceccarelli et al., 2013]
  - Reimplements TAGME, Wikipedia Miner, and collective EL [Han et al., 2011]
- BAT-Framework [Cornolti et al., 2013]
  - Compares publicly available entity annotation systems (AIDA, Illinois Wikifier, TAGME, Wikipedia Miner, DBpedia Spotlight)
  - A number of test collections (news, web pages, and tweets)
- GERBIL [Usbeck et al., 2015]
  - Extends the BAT-Framework by being able to link to any knowledge base
  - Benchmarks any annotation service through a REST interface
  - Provides persistent URLs for experimental settings (for reproducibility)
Resources
A Cross-Lingual Dictionary for English Wikipedia Concepts
- Collecting strings (mentions) that link to Wikipedia articles on the Web
- https://research.googleblog.com/2012/05/from-words-to-concepts-and-back.html
- Each entry: string (mention), Wikipedia article, number of times the mention is linked to that article
Freebase Annotations of the ClueWeb Corpora
- ClueWeb annotated with Freebase entities (by Google)
  - http://lemurproject.org/clueweb09/FACC1/
  - http://lemurproject.org/clueweb12/FACC1/
- Each annotation record: name of the document that was annotated, entity mention, beginning and end byte offsets, entity ID in Freebase, confidence given both the mention and the context, and confidence given just the context (ignoring the mention)
Entity linking in queries
Entity linking in queries
- Challenges
  - search queries are short
  - limited context
  - lack of proper grammar, spelling
  - multiple interpretations
  - needs to be fast
Example: the governator (possible interpretations: person, movie)
Example: the governator movie
Example: new york pizza manhattan
ERD’14 challenge
- Task: finding query interpretations
- Input: keyword query
- Output: sets of sets of entities
- Reference KB: Freebase
- Annotations are to be performed by a web service within a given time limit
Evaluation
- I: ground truth interpretations; \hat{I}: interpretations returned by the system

  P = \frac{|I \cap \hat{I}|}{|\hat{I}|}    R = \frac{|I \cap \hat{I}|}{|I|}    F = \frac{2 \cdot P \cdot R}{P + R}

- Example (query: new york pizza manhattan)
  - ground truth I: {New York-style pizza, Manhattan}, {New York City, Manhattan}
  - system annotation \hat{I}: {New York City, Manhattan}, {New York-style pizza}
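Since each interpretation is a set of entities and the metrics compare whole interpretations, the computation can be sketched with frozensets (assuming, as the set-intersection formulas suggest, that an interpretation counts as matched only if it is identical to a ground truth interpretation):

```python
def erd_scores(gold, system):
    """gold, system: sets of interpretations, each a frozenset of entities."""
    overlap = len(gold & system)
    p = overlap / len(system) if system else 0.0
    r = overlap / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# The slide's example for the query "new york pizza manhattan":
gold = {frozenset({"New York-style pizza", "Manhattan"}),
        frozenset({"New York City", "Manhattan"})}
system = {frozenset({"New York City", "Manhattan"}),
          frozenset({"New York-style pizza"})}
print(erd_scores(gold, system))  # (0.5, 0.5, 0.5)
```

Only {New York City, Manhattan} matches exactly; the partial interpretation {New York-style pizza} earns no credit, which is why both precision and recall come out at 0.5.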
ERD’14 results
(single interpretation is returned)

Rank | Team            | F1     | Latency
1    | SMAPH Team      | 0.7076 | 0.49
2    | NTUNLP          | 0.6797 | 1.04
3    | Seznam Research | 0.6693 | 3.91

http://web-ngram.research.microsoft.com/erd2014/LeaderBoard.aspx
Hands-on part http://bit.ly/russir2016-elx
Questions?
Contact | @krisztianbalog | krisztianbalog.com