48
HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi Nastase Heidelberg Institute for Theoretical Studies gGmbH Heidelberg, Germany 1 / 48

HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Page 1: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

HITS’ Cross-lingual Entity Linking

System at TAC 2011

One Model for All Languages

Angela Fahrni, Michael Strube, Vivi Nastase

Heidelberg Institute for Theoretical Studies gGmbHHeidelberg, Germany

1 / 48

Page 2: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

2 / 48

Page 3: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

3 / 48

Page 4: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

4 / 48

Page 5: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

5 / 48

Page 6: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

6 / 48

Page 7: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

7 / 48

Page 8: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

8 / 48

Page 9: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Strategies for Cross-lingualEntity Disambiguation

9 / 48

Page 10: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Strategies for Cross-lingualEntity Disambiguation

10 / 48

Page 11: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Strategies for Cross-lingualEntity Disambiguation

11 / 48

Page 12: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Strategies for Cross-lingualEntity Disambiguation

12 / 48

Page 13: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Strategies for Cross-lingualEntity Disambiguation

13 / 48

Page 14: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Knowledge base

14 / 48

Page 15: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Knowledge base

15 / 48

Page 16: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Mapping between the Englishand Chinese Wikipedia Versions

16 / 48

Page 17: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Mapping between the Englishand Chinese Wikipedia Versions

17 / 48

Page 18: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Mapping between the Englishand Chinese Wikipedia Versions

18 / 48

Page 19: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Mapping between the Englishand Chinese Wikipedia Versions

19 / 48

Page 20: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Coverage after Lexicon Lookup

Training SetCoverage Average Ambiguity

EN 0.928 16.92ZH 0.867 5.56ZH_Lex 0.930 22.86

20 / 48

Page 21: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

21 / 48

Page 22: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Different Techniques for DifferentLanguages

22 / 48

Page 23: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Different Models for DifferentLanguages

23 / 48

Page 24: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

One Model for All Languages

24 / 48

Page 25: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Context DisambiguationApproach

25 / 48

Page 26: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Context DisambiguationApproach

26 / 48

Page 27: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Context DisambiguationApproach

27 / 48

Page 28: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Context DisambiguationApproach

28 / 48

Page 29: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Context DisambiguationApproach

29 / 48

Page 30: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Context DisambiguationApproach

30 / 48

Page 31: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Context DisambiguationApproach

31 / 48

Page 32: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Context DisambiguationApproach

• Training of a supervised model (SVM) using English traininginstances

• Training instances are derived from the internal hyperlinks inWikipedia

• Positive instances: linked Wikipedia articles• Negative instances: other Wikipedia articles the terms can refer to

• Model is used for English and Chinese

32 / 48

Page 33: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Entity Disambiguation

33 / 48

Page 34: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Entity Disambiguation Approach

• Training instances are derived from the training data provided byTAC

• Positive instances: correct entity for the query terms• Negative instances: other candidate entities for the query terms

• Training of a supervised model (SVM) based on• English and Chinese training instances• Chinese training instances• English instances

• Application of the models to both languages

34 / 48

Page 35: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

One-Model-for-All-Languages:Features

• Abstraction from particular languages, i.e. no lexical features

• Relatedness measures

• Concept-based features

35 / 48

Page 36: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

One-Model-for-All-Languages:Features

• Prior probability

• Distance between term and name of the respective Wikipedia article

• Context fit on conceptual level• Vector-based features based on concepts, categories, lists, portals• Average, maximum and minimum derived from pairwise calculated

relatedness measures• Relatedness measures based on outgoing links, incoming links,

categories

• Context fit on token level: Cosine similarity based on nouns, verbs,adjectives between:

• Whole text and respective Wikipedia article• Surrounding context and local context of hyperlinks pointing to the

respective page in Wikipedia

36 / 48

Page 37: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Results Micro-AverageDifferent Training Approaches

Micro-AverageHITS_Trained_EN_ZH 0.785

HITS_Trained_SEP 0.789HITS_Trained_ZH 0.786

37 / 48

Page 38: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

38 / 48

Page 39: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Language-independentConcept-based Representation

39 / 48

Page 40: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Language-independentConcept-based Representation

40 / 48

Page 41: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Graph

41 / 48

Page 42: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Clustering

42 / 48

Page 43: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Learning the Edge Weights

• Binary classifier (SMV) with the classes:• In same cluster• Not in the same cluster

• Confidence values as edge weights

• Features: Cosine similarities between local contexts and whole textsbased on:

• Identified concepts• Identified concepts extended by incoming and outgoing links• Categories associated with the concepts• Lists associated with the concepts• Portals associated with the concepts

43 / 48

Page 44: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Shortcomings of Graph-basedClustering Approach

• Construction of the graph is inefficient• Pairwise comparisons using different similarity measures

• Few data to optimize the parameters

• Temporary solution: String match heuristic

44 / 48

Page 45: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Results

Micro-Average Precision (B3) Recall (B3) F1 (B3)Best System 0.788Median 0.675HITS 0.785 0.700 0.763 0.730

45 / 48

Page 46: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

46 / 48

Page 47: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Future Work

• Experiments with other languages• We participated in the cross-lingual link discovery task organized by

NTCIR: good results for all subtasks (EN-to-Chinese, EN-to-Korean,EN-to-Japanese)

• Efficient way to cluster terms: Circumventing pairwise comparisons

47 / 48

Page 48: HITS’ Cross-lingual Entity Linking System at TAC 2011 · HITS’ Cross-lingual Entity Linking System at TAC 2011 One Model for All Languages Angela Fahrni, Michael Strube, Vivi

Thank you!

Angela [email protected]://www.h-its.org

48 / 48