69
Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel S. Weld

Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

  • Upload
    others

  • View
    15

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Design Challenges for Entity Linking

Xiao Ling, Sameer Singh, Daniel S. Weld

Page 2: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Entity Linking

2

Seattle beat Portland yesterday.

Page 3: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Entity Linking

3

Seattle beat Portland yesterday.

Page 4: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Entity Linking

4

Seattle beat Portland yesterday.

Seattle (city)

Seattle Sounders

Sea-Tac(airport)

Page 5: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Entity Linking

5

Seattle beat Portland yesterday.

Seattle (city)

Seattle Sounders

Sea-Tac(airport) ~3-4 M

entries

Page 6: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Applications• Relation Extraction

(e.g. Koch et al. 2014)

• Coreference Resolution (e.g. Hajishirzi et al. 2013, Durrett & Klein 2014)

• Question Answering (e.g. Sun et al. 2015)

• Web Search (e.g. Knowledge Graph)

• many others… (see Shen et al. 2014; Roth et al. 2014)

6

Page 7: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Ambiguity

• Seattle beat Portland yesterday.

• Seattle scores high in the latest report of startup hubs.

• The Emerald City Council To Make Decision on Antibiotic Resolution

7

Page 8: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Ambiguity

• Seattle beat Portland yesterday.

• Seattle scores high in the latest report of startup hubs.

• The Emerald City Council To Make Decision on Antibiotic Resolution

Seattle Sounders

8

Page 9: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Ambiguity

• Seattle beat Portland yesterday.

• Seattle scores high in the latest report of startup hubs.

• The Emerald City Council To Make Decision on Antibiotic Resolution

Seattle Sounders

Seattle (city)

9

Page 10: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Variability• Seattle scores high in the latest report of startup

hubs.

• The Emerald City Council To Make Decision on Antibiotic Resolution

10

Page 11: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Variability• Seattle scores high in the latest report of startup

hubs.

• The Emerald City Council To Make Decision on Antibiotic Resolution

Seattle (city)

11

Page 12: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Related Work• Cucerzan (2007)

• Milne and Witten (2008)

• Kulkarni et al. (2009)

• Ratinov et al. (2011)

• Hoffart et al. (2011)

• Han and Sun (2012)

• He et al. (2013a)12

• He et al. (2013b)

• Cheng and Roth (2013)

• Sil and Yates (2013)

• Li et al. (2013)

• Cornolti et al. (2013)

• … many others

Page 13: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Related Work• Cucerzan (2007)

• Milne and Witten (2008)

• Kulkarni et al. (2009)

• Ratinov et al. (2011)

• Hoffart et al. (2011)

• Han and Sun (2012)

• He et al. (2013a)13

• He et al. (2013b)

• Cheng and Roth (2013)

• Sil and Yates (2013)

• Li et al. (2013)

• Cornolti et al. (2013)

• … many others

Joint Inference

Page 14: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Related Work• Cucerzan (2007)

• Milne and Witten (2008)

• Kulkarni et al. (2009)

• Ratinov et al. (2011)

• Hoffart et al. (2011)

• Han and Sun (2012)

• He et al. (2013a)14

• He et al. (2013b)

• Cheng and Roth (2013)

• Sil and Yates (2013)

• Li et al. (2013)

• Cornolti et al. (2013)

• … many others

Joint Inference

Learning to rank

Page 15: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Related Work• Cucerzan (2007)

• Milne and Witten (2008)

• Kulkarni et al. (2009)

• Ratinov et al. (2011)

• Hoffart et al. (2011)

• Han and Sun (2012)

• He et al. (2013a)15

• He et al. (2013b)

• Cheng and Roth (2013)

• Sil and Yates (2013)

• Li et al. (2013)

• Cornolti et al. (2013)

• … many others

Joint Inference

Deep Neural Networks

Learning to rank

Page 16: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Popular Data Sets

16

Dataset

# of Mentions Knowledge Base

UIUCACE 244 Wikipedia

MSNBC 654 WikipediaAIDA

(Hoffart et al. 2011)

AIDA-D 5917 YagoAIDA-T 5616 Yago

TAC KBP

TAC09 3904 Wikipedia 2008TAC10 2250 Wikipedia 2008TAC10T 1500 Wikipedia 2008TAC11 2250 Wikipedia 2008TAC12 2226 Wikipedia 2008

Page 17: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Unfortunately…

17

ACE MSNBC AIDA-D AIDA-T KBP09 KBP10 KBP10T KBP11 KBP12

Cucerzan (2007) ⎷Milne & Witten (2008)Kulkarni et al. (2009) ⎷Ratinov et al. (2011) ⎷ ⎷Hoffart et al. (2011) ⎷Han & Sun (2012) ⎷He et al. (2013a) ⎷ ⎷He et al. (2013b) ⎷ ⎷

Cheng & Roth (2013) ⎷ ⎷ ⎷Sil & Yates (2013) ⎷ ⎷ ⎷

Li et al. (2013) ⎷ ⎷Cornolti et al. (2013) ⎷ ⎷TAC-KBP participants ⎷ ⎷ ⎷ ⎷ ⎷

Page 18: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Unfortunately…

18

ACE MSNBC AIDA-D AIDA-T KBP09 KBP10 KBP10T KBP11 KBP12

Cucerzan (2007) ⎷Milne & Witten (2008)Kulkarni et al. (2009) ⎷Ratinov et al. (2011) ⎷ ⎷Hoffart et al. (2011) ⎷Han & Sun (2012) ⎷He et al. (2013a) ⎷ ⎷He et al. (2013b) ⎷ ⎷

Cheng & Roth (2013) ⎷ ⎷ ⎷Sil & Yates (2013) ⎷ ⎷ ⎷

Li et al. (2013) ⎷ ⎷Cornolti et al. (2013) ⎷ ⎷TAC-KBP participants ⎷ ⎷ ⎷ ⎷ ⎷

Joint Inference

Deep Neural Networks

Learning to rank

Page 19: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Metonymy

19

ACE MSNBC AIDA-D AIDA-T KBP09 KBP10 KBP10T KBP11 KBP12

Cucerzan (2007) ⎷Milne & Witten (2008)Kulkarni et al. (2009) ⎷Ratinov et al. (2011) ⎷ ⎷Hoffart et al. (2011) ⎷Han & Sun (2012) ⎷He et al. (2013a) ⎷ ⎷He et al. (2013b) ⎷ ⎷

Cheng & Roth (2013) ⎷ ⎷ ⎷Sil & Yates (2013) ⎷ ⎷ ⎷

Li et al. (2013) ⎷ ⎷Cornolti et al. (2013) ⎷ ⎷TAC-KBP participants ⎷ ⎷ ⎷ ⎷ ⎷

… Moscow ’s as yet undisclosed

proposals …

Moscow (city)

Russia (country)

Government of Russia

Page 20: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Nested Entities

20

ACE MSNBC AIDA-D AIDA-T KBP09 KBP10 KBP10T KBP11 KBP12

Cucerzan (2007) ⎷Milne & Witten (2008)Kulkarni et al. (2009) ⎷Ratinov et al. (2011) ⎷ ⎷Hoffart et al. (2011) ⎷Han & Sun (2012) ⎷He et al. (2013a) ⎷ ⎷He et al. (2013b) ⎷ ⎷

Cheng & Roth (2013) ⎷ ⎷ ⎷Sil & Yates (2013) ⎷ ⎷ ⎷

Li et al. (2013) ⎷ ⎷Cornolti et al. (2013) ⎷ ⎷TAC-KBP participants ⎷ ⎷ ⎷ ⎷ ⎷

… Florida Green Party …

Green Party of the US

Green Party of Florida

Page 21: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Contributions• Vinculum: a simple, deterministic, modular EL sys.

• comprehensive evaluation over nine data sets

• candidate conditional prob. can work quite well

• entity types are important to the final performance

• comparable results with two state-of-the-art sys.

21

Page 22: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Agenda

• Introduction

• Vinculum

• Experiments

• Conclusion

22

Entity Type

Candidate Generation

Coreference

Coherence

Mention Extraction

Page 23: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Vinculum ArchitectureSeattle beat Portland yesterday.

23

Input:

Entity Type

Candidate Generation

Coreference

Coherence

Mention Extraction

Page 24: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Mention ExtractionSeattle beat Portland yesterday.

24

Mention Extraction

Page 25: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Candidate GenerationSeattle beat Portland yesterday.

25

Candidate Entities

- Seattle (city)

- Seattle Sounders

- Seattle-Tacoma (airport)

Candidate Generation

Mention Extraction

Page 26: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Conditional probability

26

… capital of the state of Washington .

In 1990, Washington starred as Bleek Gilliam …

Washington refused to run for a third term …

… Washington …

p(e | m) = # [m -> e]

# m

Entity Type

Candidate Generation

Coreference

Coherence

Mention Extraction

Page 27: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

27

… capital of the state of Washington .

In 1990, Washington starred as Bleek Gilliam …

Washington refused to run for a third term …

… Washington …

p( | “Washington”) = # “W” ->

# “W”

Entity Type

Candidate Generation

Coreference

Coherence

Mention Extraction

Conditional probability

Page 28: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Candidate GenerationSeattle beat Portland yesterday.

28

Candidate Entities

- Seattle (city)

- Seattle Sounders

- Seattle-Tacoma (airport)

Candidate Generation

Mention Extraction

Page 29: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Candidate GenerationSeattle beat Portland yesterday.

29

Candidate Entities

- Seattle (city)

- Seattle Sounders

- Seattle-Tacoma (airport)

0.6

0.2

0.1

Candidate Generation

Mention Extraction

Page 30: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Entity TypesSeattle beat Portland yesterday.

30

Candidate Entities

- Seattle (city)

- Seattle Sounders

- Seattle-Tacoma (airport)

0.6

0.2

0.1

Entity Type Prediction• city • sports_team • facility/airport

0.10.4

0.1

Entity Type

Candidate Generation

Mention Extraction

Page 31: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Entity TypesSeattle beat Portland yesterday.

31

Candidate Entities

- Seattle (city)

- Seattle Sounders

- Seattle-Tacoma (airport)

0.6

0.2

0.1

Entity Type Prediction• city • sports_team • facility/airport

0.10.4

0.1

p(e | m) = ∑t p(e,t | m) p(e | m) = ∑t p(e | t,m) p(t | m)

Entity Type

Candidate Generation

Coreference

Coherence

Mention Extraction

Page 32: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Entity TypesSeattle beat Portland yesterday.

32

Candidate Entities

- Seattle (city)

- Seattle Sounders

- Seattle-Tacoma (airport)

0.6

0.2

0.1

p(e | m) = ∑t p(e,t | m) p(e | m) = ∑t p(e | t,m) p(t | m)

p(e | t,m) : re-normalization of cond. prob.

Entity Type

Candidate Generation

Coreference

Coherence

Mention Extraction

Page 33: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Entity TypesSeattle beat Portland yesterday.

33

Candidate Entities

- Seattle (city)

- Seattle Sounders

- Seattle-Tacoma (airport)

0.6

0.2

0.1

p(e | m) = ∑t p(e,t | m) p(e | m) = ∑t p(e | t,m) p(t | m)

p(e | t,m) : re-normalization of cond. prob. e.g. t = LOC p(Seattle-city | LOC, “Seattle”) = 0.6 / 0.7 p(Sea-Tac | LOC, “Seattle”) = 0.1 / 0.7

Entity Type

Candidate Generation

Coreference

Coherence

Mention Extraction

Page 34: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Entity TypesSeattle beat Portland yesterday.

34

Candidate Entities

- Seattle (city)

- Seattle Sounders

- Seattle-Tacoma (airport)

0.6

0.2

0.1

Entity Type Prediction• city • sports_team • facility/airport

0.10.4

0.1

p(e | m) = ∑t p(e,t | m) p(e | m) = ∑t p(e,t | m) p(t | m)

p(t | m)

Entity Type

Candidate Generation

Coreference

Coherence

Mention Extraction

Page 35: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Entity TypesSeattle beat Portland yesterday.

35

Entity Type Prediction

• city • sports_team • facility/airport

0.10.4

0.1

Entity Type

Candidate Generation

Coreference

Coherence

Mention Extraction

Candidate Entities

- Seattle (city)

- Seattle Sounders

- Seattle-Tacoma (airport)

0.6

0.2

0.1

Candidate Generation Candidate Entities

- Seattle (city)

- Seattle Sounders

- Seattle-Tacoma (airport)

0.2

0.4

0.1

Entity Type

Page 36: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Entity TypesSeattle beat Portland yesterday.

36

Entity Type Prediction

• city • sports_team • facility/airport

0.10.4

0.1

Entity Type

Candidate Generation

Coreference

Coherence

Mention Extraction

Candidate Entities

- Seattle (city)

- Seattle Sounders

- Seattle-Tacoma (airport)

0.6

0.2

0.1

Candidate Generation Candidate Entities

- Seattle Sounders

- Seattle (city)

- Seattle-Tacoma (airport)

0.2

0.4

0.1

Entity Type

Page 37: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Coreference

37

Seattle Sounders head coach Sigi Schmid has some ideas … Seattle beat Portland yesterday.

Entity Type

Candidate Generation

Coreference

Mention Extraction

Page 38: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Coherence

38

Candidate Entities

- Seattle (city)

- Seattle Sounders

- Seattle-Tacoma (airport)

Candidate Entities

- Portland, OR

- Univ. of Portland

- Portland Timbers

Seattle beat Portland yesterday.

0.2

0.4

0.1

0.2

0.2

0.1

Entity Type

Candidate Generation

Coreference

Coherence

Mention Extraction

Page 39: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Normalized Google Distance (NGD)

39

(Milne & Witten, 2008)Entity Type

Candidate Generation

Coreference

Coherence

Mention Extraction

Page 40: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Normalized Google Distance (NGD)

40

(Milne & Witten, 2008)

George Washington

Denzel Washington

President of the US

US Constitution

American Revolutionary

War

Tony Award

Golden Globe Training Day

Entity Type

Candidate Generation

Coreference

Coherence

Mention Extraction

Page 41: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Normalized Google Distance (NGD)

41

(Milne & Witten, 2008)

George Washington

Denzel Washington

President of the US

US Constitution

American Revolutionary

War

Tony Award

Golden Globe Training Day

Entity Type

Candidate Generation

Coreference

Coherence

Mention Extraction

Page 42: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Normalized Google Distance (NGD)

42

(Milne & Witten, 2008)

George Washington John Adams

President of the US

US Constitution

American Revolutionary

WarQuasi-WarPresident

of the USUS

Constitution

Entity Type

Candidate Generation

Coreference

Coherence

Mention Extraction

Page 43: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Relational Score

• Relation triples from Freebase

• A binary score =

• 1, if two entities appear in a triple

• 0, otherwise

• E.g. (Barack Obama, birthplace, United States) => r (Barack Obama, United States) = 1

43

(Cheng & Roth, 2013)Entity Type

Candidate Generation

Coreference

Coherence

Mention Extraction

Page 44: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Context

44

Entity Type

Candidate Generation

Coreference

Coherence

Mention Extraction

Page 45: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Agenda

• Introduction

• Vinculum

• Experiments

• Conclusion

45

ACE MSNBC AIDA-D AIDA-T KBP09 KBP10

KBP10T

KBP11

KBP12Cucerzan (2007) ⎷

Milne & Witten (2008)Kulkarni et al. (2009) ⎷

Ratinov et al. (2011) ⎷ ⎷

Hoffart et al. (2011) ⎷

Han & Sun (2012) ⎷

He et al. (2013a) ⎷ ⎷

He et al. (2013b) ⎷ ⎷

Cheng & Roth (2013) ⎷ ⎷ ⎷

Sil & Yates (2013) ⎷ ⎷ ⎷

Li et al. (2013) ⎷ ⎷

Cornolti et al. (2013) ⎷ ⎷

TAC-KBP participants ⎷ ⎷ ⎷ ⎷ ⎷

Page 46: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Data Sets

46

Dataset # of Mentions Knowledge BaseACE 244 Wikipedia

MSNBC 654 WikipediaAIDA-D 5917 YagoAIDA-T 5616 YagoTAC09 3904 Wikipedia 2008TAC10 2250 Wikipedia 2008TAC10T 1500 Wikipedia 2008TAC11 2250 Wikipedia 2008TAC12 2226 Wikipedia 2008

Mention based F1

Official Eval.

Page 47: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Candidate Generation

• intra-Wikipedia

• CrossWikis(Spitkovsky & Chang, 2012)

• Freebase Search API

47

Entity Type

Candidate Generation

Coreference

Coherence

Mention ExtractionConditional Probability p(e | m) e.g. p( | “Washington”)

Page 48: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Entity Type

Candidate Generation

Coreference

Coherence

Mention Extraction

Aggregate Recall

48

1 3 5 10 20 30 50 100 Inf0.5

0.6

0.7

0.8

0.9

1

k

Recall@k

CrossWikisIntra−WikipediaFreebase Search

Page 49: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

49

• Coarse-grained NER(Stanford NER)

• Fine-grained Entity Types (Ling & Weld, 2012) Entity Type

Candidate Generation

Coreference

Coherence

Mention Extraction

Effect of Entity Types

p(e | m) = ∑t p(e,t | m) p(e | m) = ∑t p(e,t | m) p(t | m)

Entity Type Probability

Page 50: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

FIGER

• 112 entity types

• multi-label multi-class

50

(Ling & Weld, 2012)

Entity Type

Candidate Generation

Coreference

Coherence

Mention Extraction

Page 51: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Entity Type

Candidate Generation

Coreference

Coherence

Mention Extraction

Predicted Entity Types

51

F1

50

62.5

75

87.5

100

ACE MSNBC AIDA-D AIDA-T TAC09 TAC10 TAC10T TAC11 TAC12 Overall+NER +FIGER

Page 52: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Overall Performance

52

F1

50

62.5

75

87.5

100

ACE MSNBC AIDA-D AIDA-T TAC09 TAC10 TAC10T TAC11 TAC12Candidate Generation +Entity Types +Coref. +Coherence

Entity Type

Candidate Generation

Coreference

Coherence

Mention Extraction

Page 53: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Overall Performance

53

F1

50

62.5

75

87.5

100

AverageCand +Entity Type +Coref +Coherence

79.078.076.775.0

Entity Type

Candidate Generation

Coreference

Coherence

Mention Extraction

Page 54: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Overall Performance

• AIDA (Hoffart et al. 2011)

• Illinois Wikifier 2.0 (Cheng & Roth, 2013)

54

F1

50

62.5

75

87.5

100

AverageCand +Entity Type +Coref +Coherence AIDA Wikifier

79.6

72.279.078.076.775.0

Entity Type

Candidate Generation

Coreference

Coherence

Mention Extraction

Page 55: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Error Analysis

55

Misc 10%

Specific Labels 14%

Context 33%

Coreference 10%

Types 14%

Metonymy 19%

Page 56: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Conclusion• a modular deterministic system achieves good performance

56

Vinculum

Page 57: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Conclusion• a modular deterministic system achieves good performance• a comprehensive evaluation over nine data sets

56

Vinculum

Page 58: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Conclusion• a modular deterministic system achieves good performance• a comprehensive evaluation over nine data sets

• CrossWikis provides better cond. prob.

56

Vinculum

Page 59: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Conclusion• a modular deterministic system achieves good performance• a comprehensive evaluation over nine data sets

• CrossWikis provides better cond. prob.• Fine-grained entity types are very useful

56

Vinculum

Page 60: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Conclusion• a modular deterministic system achieves good performance• a comprehensive evaluation over nine data sets

• CrossWikis provides better cond. prob.• Fine-grained entity types are very useful• Coreference and Coherence also improve the performance

56

Vinculum

Page 61: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Conclusion• a modular deterministic system achieves good performance• a comprehensive evaluation over nine data sets

• CrossWikis provides better cond. prob.• Fine-grained entity types are very useful• Coreference and Coherence also improve the performance

• http://github.com/xiaoling/vinculum

56

Vinculum

Page 62: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

• a modular deterministic system achieves good performance • a comprehensive evaluation over nine data sets

• CrossWikis provides better cond. prob. • Fine-grained entity types are very useful • Coreference and Coherence also improve the performance

• http://github.com/xiaoling/vinculum

Conclusion

57

Thanks!

Questions?

Vinculum

Page 63: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Entity Type

Candidate Generation

Coreference

Coherence

Mention Extraction

Oracle Entity Types

58

F1

50

62.5

75

87.5

100

ACE MSNBC AIDA-D AIDA-T TAC09 TAC10 TAC10T TAC11 TAC12 Overall+NER (Gold) +FIGER (Gold)

Page 64: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Implementation Details

59

Component Implementation

Mention Extraction Stanford NER

Candidate Generation CrossWikis

Entity Type Prediction Fine-grained Entity Types

Coreference Stanford Coreference

Coherence NGD + relational triples

Page 65: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

System Comparison

60

VINCULUM AIDA WIKIFIER

Mention Extraction NER NER NER, noun

phrasesCandidate Generation CrossWikis intra-Wikipedia intra-Wikipedia

Entity Types FIGER NER NER

Coreference representative mention - re-rank the

candidates

Coherence NGD, relational NGD NGD, relational

Learning deterministic trained on AIDA trained on Wiki

Page 66: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Error Analysis: Metonymy

• South Africa managed to avoid a fifth successive defeat in 1996 at the hands of the All Blacks …

• Prediction : South Africa

• Label : South Africa national rugby union team

61

Page 67: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Error Analysis: Entity Types

• Instead of Los Angeles International, for example, consider flying into Burbank or John Wayne Airport ...

• Prediction : Burbank, California

• Label : Bob Hope Airport

62

Page 68: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Error Analysis: Coreference

• It is about his mysterious father, Barack Hussein Obama, an imperious if alluring voice gone distant and then missing.

• Prediction : Barack Obama

• Label : Barack Obama Sr.

63

Page 69: Design Challenges for Entity Linking - GitHub Pagesxiaoling.github.io/pubs/ling-tacl15-slides.pdf · 2018-05-18 · Design Challenges for Entity Linking Xiao Ling, Sameer Singh, Daniel

Error Analysis: Context

• Scott Walker removed himself from the race, but Green never really stirred the passions of former Walker supporters, nor did he garner outsized support “outstate”.

• Prediction : Scott Walker (singer)

• Label : Scott Walker (politician)

64