Semantic Text Mining: Mining Structured Information from Unstructured Data
Besnik Fetahu

Semantic Text Mining - KBS · •Entities comprise of facts and statements supported by external references! • News as authoritative sources with emerging facts and events. •

Outline

• Introduction

• Information Sources

• Structured Data: Knowledge Bases and Formal Representation of Information

• Text Mining Applications

• Relation Extraction

• Named Entity Disambiguation

• Machine Reading

• Wikipedia News Suggestion

• Conclusions

Introduction

• Large Amounts of Data.

• Heterogeneity of information: provenance, quality, content, representation, language etc.

• Unstructured vs. Semi-Structured vs. Structured

• Knowledge Bases: maintenance, updating, addition of new facts

• Automated vs. crowd-based analysis of data

Information Sources

• Unstructured:

• News Collections: NYTimes, Reuters, Wall Street Journal, GDelt etc.

• Web Resources: Common Crawl, ClueWeb

• Social Streams: Twitter, Facebook, Reddit

• Semi-Structured:

• Wikipedia

• Structured:

• Linked Data: Linked Open Data Cloud

• Knowledge Bases: DBpedia, YAGO, Freebase

Information Sources: GDelt

• ~138,200 news articles indexed daily from more than 3,000 news domains
• 37,007 news domains in total
• ~18,682 entities daily, with an average of 64 mentions per entity

news domain        news articles
yahoo.com              1,244,781
allafrica.com          1,035,646
reuters.com              828,133
dailymail.co.uk          815,372
indiatimes.com           743,991
wn.com                   587,607

Information Sources: Wikipedia

• 5 million articles
• Articles structured into sections
• Articles annotated with categories
• Collaboratively edited and maintained

Information Sources: Wikipedia

Structured Data: Formal Representation of Knowledge and Knowledge Bases

• Semantic Web

• Ontologies: Knowledge Representation

• Knowledge Bases

Structured Data: Semantic Web

The ultimate goal of the Web of data is to enable computers to do more useful work and to develop systems that can support trusted interactions over the network. The term “Semantic Web” refers to W3C’s vision of the Web of linked data. Semantic Web technologies enable people to create data stores on the Web, build vocabularies, and write rules for handling data. Linked data are empowered by technologies such as RDF, SPARQL, OWL, and SKOS.

• Formats: Turtle, N3, etc.
• Syntax: XML Schema
• Models: RDF
• Taxonomies: RDFS
• Ontologies: OWL
• Query languages: SPARQL
• Interchange formats: RIF
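To make the RDF/SPARQL layer concrete, here is a minimal sketch of how a SPARQL basic graph pattern is matched against RDF triples. It is a toy, in-memory illustration in plain Python, not rdflib or a real SPARQL engine; `match_pattern` and the DBpedia-style identifiers are hypothetical choices for the example.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

def match_pattern(pattern: List[Triple], triples: List[Triple],
                  binding: Optional[Dict[str, str]] = None) -> List[Dict[str, str]]:
    """Return every variable binding (terms starting with '?') that satisfies all triple patterns."""
    binding = {} if binding is None else binding
    if not pattern:
        return [binding]
    first, rest = pattern[0], pattern[1:]
    results: List[Dict[str, str]] = []
    for triple in triples:
        candidate = dict(binding)  # copy so a failed match does not leak bindings
        matched = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # bind a fresh variable, or check consistency with an earlier binding
                if candidate.setdefault(term, value) != value:
                    matched = False
                    break
            elif term != value:  # constant terms must match exactly
                matched = False
                break
        if matched:
            results.extend(match_pattern(rest, triples, candidate))
    return results

# Illustrative DBpedia-style triples (identifiers chosen for the example)
triples = [
    ("dbr:Barack_Obama", "rdf:type", "dbo:Politician"),
    ("dbr:Barack_Obama", "dbo:birthPlace", "dbr:Honolulu"),
    ("dbr:Honolulu", "dbo:country", "dbr:United_States"),
]
# Roughly: SELECT ?person ?place WHERE { ?person rdf:type dbo:Politician . ?person dbo:birthPlace ?place }
query = [("?person", "rdf:type", "dbo:Politician"), ("?person", "dbo:birthPlace", "?place")]
print(match_pattern(query, triples))
# [{'?person': 'dbr:Barack_Obama', '?place': 'dbr:Honolulu'}]
```

Shared variables such as `?person` act as joins between patterns, which is the core of SPARQL query evaluation; real engines add indexing, FILTERs, OPTIONAL and much more.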

Structured Data: Ontologies

Structured Data: Knowledge Bases

• NELL
• TextRunner
• YAGO
• DBpedia
• Freebase

Text Mining Applications

• Relation Extraction

• Named Entity Disambiguation

• Machine Reading

Text Mining Applications: Relation Extraction

• Dependency parsing (DP) of text chunks for relation extraction

• Syntactic patterns for relation extraction

• Semantic and Lexical patterns for relation extraction
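Lexical and syntactic patterns can be illustrated with a classic Hearst-style pattern, "X such as Y1, Y2 and Y3", which yields is-a relations. This is a minimal sketch using only Python's `re`, assuming single-word hypernyms and hyponyms; `extract_is_a` and the `isA` relation label are hypothetical names, and practical systems usually match such patterns over dependency parses rather than raw strings.

```python
import re
from typing import List, Tuple

# Hearst-style lexico-syntactic pattern: "<hypernym> such as <hyponym>(, <hyponym>)* (and|or) <hyponym>"
SUCH_AS = re.compile(r"(\w+) such as ((?:\w+(?:, | and | or )?)+)")
SEPARATORS = re.compile(r", | and | or ")

def extract_is_a(sentence: str) -> List[Tuple[str, str, str]]:
    """Extract (hyponym, 'isA', hypernym) triples; assumes single-word terms."""
    relations = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1)
        for hyponym in SEPARATORS.split(match.group(2)):
            if hyponym:  # skip empty pieces left by a trailing separator
                relations.append((hyponym, "isA", hypernym))
    return relations

print(extract_is_a("European cities such as Paris, Berlin and Rome attract tourists."))
# [('Paris', 'isA', 'cities'), ('Berlin', 'isA', 'cities'), ('Rome', 'isA', 'cities')]
```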

Text Mining Applications: Named Entity Disambiguation

• Textual content has a rich underlying syntactic and semantic structure.

• Frequently extracted syntactic and semantic information: part-of-speech (POS) tags, coreference (Co-Ref) and named entities (NER).

• Named entity recognition with specific entity types: Person, Organisation, Place, Date.

Text Mining Applications: Named Entity Disambiguation

• NED: named entity disambiguation of surface forms with entities from knowledge bases
• DBpedia Spotlight
• AIDA
• Wikiminer
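A common way to frame NED is to combine a mention-entity prior ("commonness", estimated from anchor statistics) with the overlap between the mention's context and each candidate's description. The sketch below is a toy illustration under assumed data: the anchor counts, context words, weight `alpha`, and `disambiguate` are hypothetical, and this is not how DBpedia Spotlight, AIDA or Wikiminer are actually implemented.

```python
from typing import Dict, Optional, Set

# Hypothetical anchor statistics: how often the surface form links to each candidate entity
ANCHOR_COUNTS: Dict[str, Dict[str, int]] = {
    "Tesla": {"Nikola_Tesla": 60, "Tesla,_Inc.": 40},
}
# Hypothetical context words per candidate (e.g. taken from each entity's Wikipedia page)
ENTITY_CONTEXT: Dict[str, Set[str]] = {
    "Nikola_Tesla": {"inventor", "alternating", "current", "physicist"},
    "Tesla,_Inc.": {"electric", "car", "musk", "company", "battery"},
}

def disambiguate(mention: str, context: str, alpha: float = 0.5) -> Optional[str]:
    """Pick the candidate maximizing alpha * commonness + (1 - alpha) * context overlap."""
    candidates = ANCHOR_COUNTS.get(mention, {})
    if not candidates:
        return None  # surface form unknown to the knowledge base
    total = sum(candidates.values())
    words = set(context.lower().split())

    def score(entity: str) -> float:
        commonness = candidates[entity] / total  # prior P(entity | mention)
        ctx = ENTITY_CONTEXT.get(entity, set())
        overlap = len(words & ctx) / len(ctx) if ctx else 0.0
        return alpha * commonness + (1 - alpha) * overlap

    return max(candidates, key=score)

print(disambiguate("Tesla", "Musk unveiled a new electric car"))      # Tesla,_Inc.
print(disambiguate("Tesla", "the inventor of alternating current"))   # Nikola_Tesla
```

With no informative context, the prior alone decides, which is why commonness is a strong baseline in practice.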

Everything done?

• Only a small fraction of data is actually structured

• Defining schemas, taxonomies and ontologies manually and explicitly is cumbersome

• Large proportion of data is unstructured or semi-structured

• Can we automatically extract and model such content?

How can we enrich and maintain Wikipedia?

Why Wikipedia and News?

• Why Wikipedia?
  • Text Categorization
  • Entity Disambiguation
  • Entity Search
  • Knowledge Bases

• Why News?
  • Authoritative sources
  • Professionally edited, high-quality source of information
  • Reported events and facts are inherently important for entities in Wikipedia
  • Second most cited source of information in Wikipedia

News Distribution in Wikipedia
[Figure: fraction of citations (0 to 1) by source type (news, book, court, journal, web, thesis) per entity type: ComicsCreator, Artwork, NaturalPlace, Airline, Film, SoccerManager, LegalCase, Album, Band, SportsTeam, TelevisionShow, AnatomicalStructure, Athlete, Weapon, Criminal, MusicalArtist, Politician, Plant, Song, Non-ProfitOrganisation, Book, Actor, FictionalCharacter, RecordLabel, Broadcaster, PoliticalParty, Automobile, TradeUnion, Scientist, MilitaryPerson, Philosopher, TelevisionSeason, Election, OfficeHolder, SportsLeague, GovernmentAgency, Single, Animal, Award, SportsEvent, Airport, MilitaryConflict, TelevisionEpisode, Aircraft, Magazine, Writer, Location]

Entities reported in News and Wikipedia
[Figure: yearly panels for 2001 to 2006, each plotting counts for EE and NEE over an offset axis from -11 to 11]

Why does this matter at all?

• Human fatalities: ~10k (Odisha Cyclone) vs. ~1.8k (Hurricane Katrina)

• Estimated damages: $4.5 billion vs. $108 billion

• ‘Odisha Cyclone’ has no coverage in the page of the location entity ‘Odisha’

• ‘Hurricane Katrina’ is covered broadly in the page of the location entity ‘New Orleans’

[Figure: Wikipedia pages of New Orleans and Odisha; Hurricane Katrina vs. Odisha Cyclone]

• Entities comprise facts and statements supported by external references.

• News articles are authoritative sources of emerging facts and events.

• There is a delay between the reporting of an event in the news and its inclusion in entity pages [1].

• Section structure is incomplete for long-tail entities.

• This has several implications for real-world applications that use Wikipedia, e.g. KB maintenance and entity disambiguation.

[1] “How much is Wikipedia lagging behind news?” Besnik Fetahu, Abhijit Anand and Avishek Anand, WebSci’15, Oxford, UK.

Approach: Automated news suggestion to entity pages

Example news article: Some half a million people were evacuated from the southeastern Indian coast as Cyclone Phailin, a tropical storm from the Bay of Bengal, bore down on India. The states of Orissa and Andhra Pradesh, both of which have large coastal populations, were on high alert ahead of the storm’s expected arrival.

[Figure: pipeline. News article → feature extraction → entities; Wikipedia entity page → sections]

• Task#1: article-entity placement, e.g. Odisha, Bay of Bengal, Phailin
• One classifier per entity type
• Task#2: article-section placement, e.g. [state]: geography, [city]: climate…

Article-Entity Placement

News Suggestion Attributes: Task#1 Entity Salience

Entity Salience: Relative Entity Frequency

• Reward entities appearing throughout the text
• Reward entities appearing in the top paragraphs
• Weigh an entity w.r.t. its co-occurring entities

[Figure: entities in an example news article: Nikola Tesla, Elon Musk, Larry Page, John B. Kennedy]

Tesla is a central concept in the given news article.
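One possible reading of the three salience criteria, as a minimal sketch: each entity's mention count is scaled by how widely it spreads across paragraphs plus the share of its mentions in the top paragraphs, then normalized over all co-occurring entities. The exact formula is an assumption for illustration, not the published feature definition; entity mentions are assumed to be already linked and grouped per paragraph, and `relative_entity_frequency` and `top_k` are hypothetical.

```python
from typing import Dict, List

def relative_entity_frequency(paragraphs: List[List[str]], top_k: int = 2) -> Dict[str, float]:
    """paragraphs: linked entity mentions per paragraph, in document order."""
    n = len(paragraphs)
    entities = {e for para in paragraphs for e in para}
    raw = {}
    for e in entities:
        counts = [para.count(e) for para in paragraphs]
        spread = sum(1 for c in counts if c > 0) / n          # appears throughout the text
        top_share = sum(counts[:top_k]) / max(1, sum(counts))  # mentions in the top paragraphs
        raw[e] = sum(counts) * (spread + top_share)
    total = sum(raw.values())
    # normalize w.r.t. co-occurring entities so scores sum to 1 per article
    return {e: s / total for e, s in raw.items()}

paragraphs = [["Tesla", "Elon Musk"], ["Tesla"], ["Tesla", "Larry Page"], ["John B. Kennedy"]]
scores = relative_entity_frequency(paragraphs)
print(max(scores, key=scores.get))  # Tesla
```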

News Suggestion Attributes: Task#1 Relative Entity Authority

[Figure: example entities Elias Taban and Hillary Clinton]

Relative Entity Authority

• Entities with ‘low authority’ have a lower entry barrier for a news article

• A news article in which an entity co-occurs with ‘high authority’ entities signals the importance of the news

• Entity authority can be an a priori probability or any centrality-based measure
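Taking the a priori probability option, a sketch: estimate each entity's prior from (hypothetical) Wikipedia inlink counts and compare it with the summed priors of the entities it co-occurs with in the article. A ratio well below 1 means the entity appears next to much more authoritative entities. The inlink-based prior and `relative_authority` are assumptions; a centrality measure such as PageRank could be swapped in for the prior.

```python
from typing import Dict, List

def relative_authority(entity: str, article_entities: List[str],
                       inlinks: Dict[str, int]) -> float:
    """Ratio of the entity's prior to the summed priors of its co-occurring entities.

    The prior P(e) is approximated by e's share of all inlinks (an assumed authority proxy).
    """
    total_links = sum(inlinks.values())
    prior = {e: inlinks.get(e, 0) / total_links for e in set(article_entities) | {entity}}
    others = sum(p for e, p in prior.items() if e != entity)
    return prior[entity] / others if others > 0 else float("inf")

# Hypothetical inlink counts
inlinks = {"Hillary Clinton": 9000, "Elias Taban": 10}
article = ["Elias Taban", "Hillary Clinton"]
print(relative_authority("Elias Taban", article, inlinks))      # well below 1
print(relative_authority("Hillary Clinton", article, inlinks))  # well above 1
```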

News Suggestion Attributes: Task#1 Novelty & Redundancy

[Figure: a candidate news article compared against the previously added news articles]

Novelty and Redundancy Measure

• Novelty is measured w.r.t. previously added news articles in an entity page

• Major events have wide coverage in news media

• Place the news article into the correct section
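Novelty w.r.t. previously added news articles can be sketched as one minus the maximum cosine similarity between the candidate article and the articles already cited in the entity page; a value near 0 flags a redundant article. Bag-of-words cosine over whitespace tokens is an assumed choice here, and `novelty` is a hypothetical helper.

```python
import math
from collections import Counter
from typing import List

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def novelty(candidate: str, previous: List[str]) -> float:
    """1 - max similarity to articles already cited in the entity page (1.0 if none)."""
    if not previous:
        return 1.0
    cand = Counter(candidate.lower().split())
    return 1.0 - max(cosine(cand, Counter(p.lower().split())) for p in previous)

print(novelty("cyclone phailin hits odisha coast",
              ["cyclone phailin nears odisha", "odisha elections announced"]))
```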

Article-Section Placement

Task#2: Section-template generation

[Figure: example entities Germanwings, Adria, Lufthansa]

• Section templates per entity type
• Pre-determined number of main sections
• Canonicalize sections
• Generate ‘complete’ section templates based on similar entities
• Cluster based on the X-means [3] algorithm

[3] D. Pelleg, A. W. Moore, et al. X-means: Extending k-means with efficient estimation of the number of clusters. In ICML, pages 727–734, 2000.
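A simplified sketch of section-template generation: canonicalize section titles across entity pages of one type and keep the most frequent ones as the template. This frequency heuristic is an assumed stand-in for illustration only; it does not implement the X-means clustering [3] named above, and the section lists, `canonicalize` and `section_template` are made up for the example.

```python
import re
from collections import Counter
from typing import Dict, List

def canonicalize(section: str) -> str:
    """Normalize a section title: lowercase, '&' -> 'and', drop punctuation, collapse spaces."""
    title = section.lower().replace("&", " and ")
    title = re.sub(r"[^\w\s]", " ", title)
    return re.sub(r"\s+", " ", title).strip()

def section_template(pages: Dict[str, List[str]], n_sections: int = 3) -> List[str]:
    """Keep the n most frequent canonical sections across entity pages of one type."""
    counts: Counter = Counter()
    for sections in pages.values():
        # count each canonical section at most once per page, keeping first-seen order
        counts.update(list(dict.fromkeys(canonicalize(s) for s in sections)))
    return [section for section, _ in counts.most_common(n_sections)]

# Illustrative section lists for entities of one type (not real page data)
pages = {
    "Germanwings": ["History", "Destinations", "Fleet", "Incidents and accidents"],
    "Adria": ["History", "Fleet", "Destinations"],
    "Lufthansa": ["History", "Corporate affairs", "Fleet", "Incidents & accidents"],
}
print(section_template(pages))
```

Fixing `n_sections` mirrors the "pre-determined number of main sections"; clustering, as on the slide, would instead group semantically similar section titles rather than relying on exact canonical matches.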

Task#2: Overall news-section fit

• What is the best section to append a given news article n to?
  • Measure the overall similarity between n and the pre-computed sections in the section templates

• Similarity aspects between news articles and sections:
  • Topic similarity (LDA models over the sections and news documents)
  • Syntactic similarity
  • Lexical similarity
  • Entity-based similarity (overlap of named entities)
  • Frequency
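The overall fit can be sketched as a weighted sum of per-aspect similarities, choosing the section with the highest score. Only two of the listed aspects are shown (lexical overlap and entity overlap, both as Jaccard similarity); topic (LDA) and syntactic similarity would be further terms. The weights, section profiles and `best_section` are hypothetical, and in practice such scores would more likely serve as features of a learned model than be combined with hand-set weights.

```python
from typing import Dict, Set

def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_section(news_words: Set[str], news_entities: Set[str],
                 sections: Dict[str, Dict[str, Set[str]]],
                 w_lexical: float = 0.5, w_entity: float = 0.5) -> str:
    """Return the section whose weighted lexical + entity similarity to the article is highest."""
    def fit(name: str) -> float:
        section = sections[name]
        return (w_lexical * jaccard(news_words, section["words"])
                + w_entity * jaccard(news_entities, section["entities"]))
    return max(sections, key=fit)

# Hypothetical pre-computed section profiles for the entity page "Odisha"
odisha_sections = {
    "geography": {"words": {"coast", "bay", "climate", "cyclone", "monsoon"},
                  "entities": {"Bay of Bengal"}},
    "politics": {"words": {"election", "assembly", "party"},
                 "entities": {"Biju Janata Dal"}},
}
print(best_section({"cyclone", "coast", "evacuated"}, {"Bay of Bengal", "Phailin"}, odisha_sections))
# geography
```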

Can we do more than suggest news to a Wikipedia section?!

Suggest citations to actual statements in Wikipedia

External Links in Wikipedia and Knowledge Bases

Finding News Citations in Wikipedia!

On October 9, 2009, the Norwegian Nobel Committee announced that Obama had won the 2009 Nobel Peace Prize "for his extraordinary efforts to strengthen international diplomacy and cooperation between peoples". Obama accepted this award in Oslo, Norway on December 10, 2009, with "deep gratitude and great humility." The award drew a mixture of praise and criticism from world leaders and media figures.

“Citogenesis”: Citogenesis is a portmanteau of 'Citation' and 'Genesis'. A citation is a reference to a source, used to back up a specific claim. Genesis means the origin of something. By extension, citogenesis is the creation of text in a reliable source that can be cited to back up a claim.

Citogenesis[citation needed]
https://www.explainxkcd.com/wiki/index.php/978:_Citogenesis
https://xkcd.com/978/

• What type of citation do we need here?
• Which citation do we place for this definition?

Finding Citations: Motivation

• Increase Wikipedia article quality by providing citations to external references

• Replace and update existing citations with higher quality and authority references

• Find and replace citations for statements that have dead URLs

• Find citations for statements that are flagged with a “citation needed” tag, currently around 300k statements

• Automate the process of enriching Wikipedia and help editors decide which citations to provide for existing or new Wikipedia statements

Motivation: A Problem Acknowledged by the Wikimedia Foundation

Approach Overview

[Figure: two-stage approach overview]

1. Statement Classification (Task#1: classify statements)
• Wikipedia entities, e.g. Barack Obama (typeOf Politician): sections, anchors, text, categories
• Entity statements, e.g.:
  Obama was born on August 4, 1961,[4] …..
  The couple married in Wailuku on Maui on …
  After graduating with a JD degree magna cum laude[49]…
  Obama was elected to the Illinois Senate in …
• Feature extraction → news statement?

2. Finding Citations (Task#2: find citations), if YES
• Query construction, e.g. “Obama”, “Illinois”, “Senate”
• Retrieve doc_1, doc_2, …, doc_k from the news index
• Choose document → feature extraction → classify correct reference

Statement Classification

Features:
1. Wikipedia Entity Structure
2. Language Style
3. Entity Type Probabilities

Train Supervised Models:
1. Learn models that accurately predict the citation category of a statement.
2. Multi-class classification problem for citation categories: {web, news, comic, journal …}
3. Optimize for “news” statements
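As a stand-in for the multi-class statement classifier, a multinomial Naive Bayes over bag-of-words ("language style") features is sketched below. This is a swapped-in model for illustration only: the actual features (entity structure, entity type probabilities) and learner are not reproduced, and the tiny training set is invented.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over bag-of-words features."""

    def fit(self, data: List[Tuple[str, str]]) -> "NaiveBayes":
        self.class_counts = Counter(label for _, label in data)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for text, label in data:
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total = sum(self.class_counts.values())

        def log_posterior(label: str) -> float:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_counts[label] / total)  # log prior
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / denom)  # smoothed log likelihood
            return score

        return max(self.class_counts, key=log_posterior)

# Invented toy statements labelled with citation categories
train = [
    ("the committee announced on tuesday that he had won the prize", "news"),
    ("officials said on monday the storm killed dozens", "news"),
    ("the algorithm converges under the stated assumptions", "journal"),
    ("we prove the theorem using a spectral argument", "journal"),
    ("the comic was first published in issue 12", "comic"),
]
model = NaiveBayes().fit(train)
print(model.predict("officials announced on friday that the prize was won"))  # news
```

"Optimizing for news" could then mean tuning the decision threshold or class weights to favour recall or precision on the news class.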

Finding Citations

Wikipedia Statement: On October 9, 2009, the Norwegian Nobel Committee announced that Obama had won the 2009 Nobel Peace Prize "for his extraordinary efforts to strengthen international diplomacy and cooperation between peoples". Obama accepted this award in Oslo, Norway on December 10, 2009, with "deep gratitude and great humility." The award drew a mixture of praise and criticism from world leaders and media figures.

News Index

Q = {Wikipedia Statement} → doc_1, doc_2, …, doc_k

Retrieve top-100:
http://nobelprize.org/nobel_prizes/peace/laureates/2009/
http://www.cnn.com/2009/politics/12/10/obama.transcript/index.html
http://www.timesonline.co.uk/tol/news/world/us_and_americas/article6868905.ece
…
http://www.msnbc.msn.com/id/33237202/
http://www.reuters.com/article/topnews/idustre5981jk20091009?sp=true
http://www.whitehouse.gov/the_press_office/remarks-by-the-president-on-winning-the-nobel-peace-prize/

Pick relevant articles
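Retrieving the top-100 candidates for a statement query can be sketched with Okapi BM25, a standard lexical ranking function. Whether the news index actually ranks with BM25 is an assumption; `bm25_rank`, the defaults `k1=1.2` and `b=0.75`, and the toy documents are illustrative.

```python
import math
from collections import Counter
from typing import List, Tuple

def bm25_rank(query: str, docs: List[str], k1: float = 1.2, b: float = 0.75,
              top_k: int = 100) -> List[Tuple[int, float]]:
    """Score every document against the query with BM25 and return the top_k (index, score)."""
    tokenized = [d.lower().split() for d in docs]
    if not tokenized:
        return []
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    df = Counter(term for d in tokenized for term in set(d))  # document frequencies
    ranked = []
    for i, doc in enumerate(tokenized):
        tf = Counter(doc)
        score = 0.0
        for term in set(query.lower().split()):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        ranked.append((i, score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

docs = [
    "obama wins nobel peace prize in oslo",
    "stock markets rally after earnings",
    "nobel committee praises obama diplomacy",
]
print(bm25_rank("obama nobel peace prize", docs))
```

The retrieved list is only a candidate pool; the later steps pick the relevant articles from it.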

Finding Citations: Entailment

Tree Kernel: K(s1, s2)

• Capture syntactic similarity between two sentences

• Capture semantic similarity between two sentences by checking the POS of a word
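The tree kernel K(s1, s2) can be illustrated with the subset-tree kernel of Collins and Duffy, which counts (with decay λ) the tree fragments shared by two parse trees. The slide does not state which tree kernel was used; this variant, the nested-tuple tree encoding, and the function names are assumptions for illustration. Parse trees would normally come from a syntactic parser.

```python
from typing import List, Union

Tree = Union[str, tuple]  # (label, child, child, ...); a bare str is a word (leaf)

def internal_nodes(tree: Tree) -> List[tuple]:
    """All non-leaf nodes of the tree, in pre-order."""
    if isinstance(tree, str):
        return []
    found = [tree]
    for child in tree[1:]:
        found.extend(internal_nodes(child))
    return found

def production(node: tuple) -> tuple:
    """Grammar production at a node, e.g. ('NP', 'DT', 'NN') or ('NN', 'prize')."""
    return (node[0],) + tuple(c if isinstance(c, str) else c[0] for c in node[1:])

def common_fragments(n1: tuple, n2: tuple, lam: float) -> float:
    """Collins-Duffy C(n1, n2): weighted count of common fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    if all(isinstance(c, str) for c in n1[1:]):  # pre-terminal such as ('NN', 'prize')
        return lam
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):  # leaf words are already covered by the production
            score *= 1.0 + common_fragments(c1, c2, lam)
    return score

def tree_kernel(t1: Tree, t2: Tree, lam: float = 1.0) -> float:
    """K(t1, t2): sum of C(n1, n2) over all pairs of internal nodes."""
    return sum(common_fragments(a, b, lam)
               for a in internal_nodes(t1) for b in internal_nodes(t2))

t1 = ("NP", ("DT", "the"), ("NN", "prize"))
t2 = ("NP", ("DT", "a"), ("NN", "prize"))
print(tree_kernel(t1, t1), tree_kernel(t1, t2))  # 6.0 3.0
```

With λ = 1 the kernel is an exact fragment count: the NP "the prize" has 6 fragments, of which 3 also occur in "a prize". A λ below 1 down-weights large fragments.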

Finding Citations: Centrality Features

• Not all sentences in a news article have the same importance

• Capture the entailment features w.r.t. the central sentence

• Central sentence in a news article (TextRank)
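Picking the central sentence with TextRank can be sketched as weighted PageRank over a sentence graph whose edge weights use the TextRank sentence similarity (shared words normalized by the log lengths of both sentences). Whitespace tokenization, the damping factor 0.85 and a fixed number of iterations are simplifying assumptions; `textrank` is a hypothetical helper.

```python
import math
from typing import List

def similarity(a: List[str], b: List[str]) -> float:
    """TextRank similarity: shared words normalized by the log lengths of both sentences."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return len(set(a) & set(b)) / denom if denom > 0 else 0.0

def textrank(sentences: List[str], d: float = 0.85, iterations: int = 50) -> List[float]:
    """Weighted PageRank over the sentence-similarity graph (power iteration)."""
    words = [s.lower().split() for s in sentences]
    n = len(words)
    weight = [[similarity(words[i], words[j]) if i != j else 0.0 for j in range(n)]
              for i in range(n)]
    out_weight = [sum(row) for row in weight]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [(1 - d) + d * sum(weight[j][i] / out_weight[j] * scores[j]
                                    for j in range(n) if out_weight[j] > 0)
                  for i in range(n)]
    return scores

sentences = [
    "obama won the nobel peace prize",
    "the nobel committee announced the peace prize in oslo",
    "obama accepted the prize in oslo",
    "markets closed higher",
]
scores = textrank(sentences)
print(sentences[scores.index(max(scores))])  # the central sentence
```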

Finding Citations: News Domains

• News domains have varying authority for different entity types.

• We learn to predict the more reliable and authoritative sources of information.
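A simple way to capture that domain authority varies by entity type is a smoothed estimate of P(domain | entity type) from existing citations; such a prior could then be one feature of a learned model. This estimator, `domain_authority`, the smoothing parameter `alpha`, and the citation pairs are illustrative assumptions with invented data.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def domain_authority(citations: List[Tuple[str, str]],
                     alpha: float = 1.0) -> Dict[str, Dict[str, float]]:
    """P(domain | entity type) from (entity_type, domain) citation pairs, with additive smoothing."""
    per_type: Dict[str, Counter] = defaultdict(Counter)
    domains = set()
    for entity_type, domain in citations:
        per_type[entity_type][domain] += 1
        domains.add(domain)
    result = {}
    for entity_type, counts in per_type.items():
        total = sum(counts.values()) + alpha * len(domains)
        result[entity_type] = {d: (counts[d] + alpha) / total for d in domains}
    return result

# Invented citation pairs
citations = [
    ("Politician", "reuters.com"), ("Politician", "reuters.com"),
    ("Politician", "dailymail.co.uk"),
    ("SoccerManager", "dailymail.co.uk"), ("SoccerManager", "dailymail.co.uk"),
]
auth = domain_authority(citations)
print(auth["Politician"], auth["SoccerManager"])
```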

Finding Citations: Why other and crowdsourced evaluation strategies?

Conclusions

• Enrich and Expand Wikipedia Entity Pages

• Maintain an up-to-date and consistent state of Wikipedia

• Improve the quality of Wikipedia pages

• Knowledge base approaches benefit from richer and more up-to-date content in Wikipedia

• Applications such as Google Search and Q&A systems like Siri benefit, since they use Wikipedia

Thank You! Questions?

For more: Twitter: @FetahuBesnik Web: http://l3s.de/~fetahu/