GENERATING NEWSCASTS SEMANTIC SNAPSHOTS USING ENTITY EXPANSION

JOSÉ LUIS REDONDO GARCIA (@peputo)
GIUSEPPE RIZZO (@giusepperizzo)
LILIA PÉREZ ROMERO (L.Perez@cwi.nl)
MICHIEL HILDEBRAND (@McHildebrand / Michiel.Hildebrand@cwi.nl)
RAPHAËL TRONCY (@rtroncy / raphael.troncy@eurecom.fr)


NEWS CONSUMPTION SEMANTIC SNAPSHOT (NSS)

[Figure: News item ("Snowden asks Russia for asylum") → Named Entity Expansion → News Semantic Snapshot (NSS)]

April 15, 2023 | 15th International Conference on Web Engineering (ICWE)


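NEWS ENTITY EXPANSION

Web-based, Unsupervised, Sequential

[Figure: pipeline with the number of variations per stage: Document Collection (20) → Document Annotation (1) → Entity Filtering (4) → Ranking (4) → NSS]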

EVALUATION: NEWS ENTITIES GOLD STANDARD

Involving: experts in the news domain + users

Dimensions:
(1) Video subtitles
(2) Image in the video
(3) Text in the video image
(4) Suggestions of an expert
(5) Related articles

Play with the data and help us to extend it at:
https://github.com/jluisred/NewsConceptExpansion/wiki/Golden-Standard-Creation

DOCUMENT COLLECTION (20 variations)

Using Google Custom Search Engine (CSE) [1]

Web sites to be crawled:
- Google
- L1: A set of 10 international English-speaking newspapers
- L2: A set of 3 international newspapers used in the gold standard (GS)

Temporal Window:
- 1W
- 2W

Annotation filtering:

[1] https://cse.google.com/cse/all
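The temporal window restricts which retrieved documents are considered. The following is a minimal sketch only: it assumes that 1W and 2W stand for one and two weeks, and that the window extends that far on each side of the news item's air date. Neither detail is stated on the slide, and the function names and dates are illustrative.

```python
from datetime import date, timedelta

# Assumed meaning of the slide's labels: number of weeks on each side.
WINDOW_WEEKS = {"1W": 1, "2W": 2}


def temporal_window(air_date, label):
    """Return the (start, end) dates of the window around air_date."""
    span = timedelta(weeks=WINDOW_WEEKS[label])
    return air_date - span, air_date + span


def in_window(doc_date, air_date, label):
    """True if a document's publication date falls inside the window."""
    start, end = temporal_window(air_date, label)
    return start <= doc_date <= end


# Illustrative air date, not taken from the slides.
air_date = date(2013, 7, 1)
print(temporal_window(air_date, "1W"))
print(in_window(date(2013, 7, 10), air_date, "2W"))
```

The resulting date range could then be passed to whatever date restriction the search engine offers; that step is left out here because the slides do not describe it.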


DOCUMENT ANNOTATION

NER extractors in NERD *

(*) Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web, Rizzo et al. (2014)

ENTITY FILTERING (4 variations)

Filtering dimensions:
- F1: NERD type:
  - Person
  - Organization
  - Location
- F2: Confidence score: > Threshold
- F3: Capitalization: country, president, Obama, asylum
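The three filtering dimensions can be read as simple predicates over annotated entities. The sketch below is an illustration under assumptions: the `Entity` structure, the 0.5 threshold, the "Thing" type label, and the reading of F3 as "keep labels that start with an uppercase letter" are all hypothetical, since the slide gives only the dimension names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    label: str         # surface form found in the document
    nerd_type: str     # type assigned by the extractor
    confidence: float  # extractor confidence score

ALLOWED_TYPES = {"Person", "Organization", "Location"}


def f1_type(entity):
    """F1: keep only Person, Organization and Location entities."""
    return entity.nerd_type in ALLOWED_TYPES


def f2_confidence(entity, threshold=0.5):
    """F2: keep entities whose confidence exceeds a threshold (value assumed)."""
    return entity.confidence > threshold


def f3_capitalized(entity):
    """F3: keep entities whose label starts with an uppercase letter (assumed reading)."""
    return entity.label[:1].isupper()


def apply_filters(entities, filters):
    """Keep the entities that pass every filter in the list."""
    return [e for e in entities if all(f(e) for f in filters)]


# The four words from the slide, with made-up types and scores.
candidates = [
    Entity("country", "Thing", 0.9),
    Entity("president", "Thing", 0.8),
    Entity("Obama", "Person", 0.9),
    Entity("asylum", "Thing", 0.7),
]
kept = apply_filters(candidates, [f3_capitalized])
print([e.label for e in kept])  # ['Obama']
```

Passing a different list of predicates to `apply_filters` gives the other filter combinations.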

RANKING STRATEGIES (1)

- Increase representativeness
- Leverage entity frequency

Functions: (Freq), (Gaussian)
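A frequency-based ranking can be sketched as counting how often each entity occurs across the collected documents. This is an assumption-laden illustration: the slide does not say whether frequency means total mentions or number of documents, and the normalisation by the top count is a choice made here. The Gaussian function is not sketched because the slide gives no formula for it.

```python
from collections import Counter


def rank_by_frequency(documents):
    """Rank entity labels by total mentions across documents.

    documents: one list of entity labels per retrieved document.
    Returns (label, score) pairs, highest score first, with scores
    normalised by the most frequent entity's count (an assumption).
    """
    counts = Counter(label for doc in documents for label in doc)
    top = max(counts.values(), default=1)
    scored = [(label, n / top) for label, n in counts.items()]
    # Sort by descending score, then alphabetically to break ties.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical annotated documents.
docs = [
    ["Snowden", "Russia"],
    ["Snowden", "Moscow"],
    ["Snowden", "Russia", "Obama"],
]
for label, score in rank_by_frequency(docs):
    print(label, round(score, 2))
```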

RANKING STRATEGIES (2) (4 variations)

POPULARITY
- Based on Google Trends
- w = 2 months
- μ + 2*σ (2.5%)

EXPERT RULES
Rules: [ Sel(e) , ]
Example:
- [ Location, = 0.48 ]
- [ Person, = 0.74 ]
- [ Organization, = 0.95 ]
- [ < 2 , = 0.0 ]
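Both adjustments can be sketched in a few lines. The sketch is illustrative rather than the authors' implementation: it assumes a popularity peak is a value above μ + 2σ of the trend series in the window, that the number in each expert rule is a multiplicative factor on the entity's score, and that "< 2" refers to entity frequency. The rule notation lost a symbol in extraction, so all three readings are assumptions, as are the dictionary keys and the sample series.

```python
from statistics import mean, pstdev


def is_popularity_peak(series, value):
    """True if value exceeds mean + 2 * standard deviation of the series.

    series: popularity values over the window (for example, 2 months).
    The population standard deviation is used; that choice is an assumption.
    """
    return value > mean(series) + 2 * pstdev(series)


# Each rule pairs a selector over the entity with a factor (assumed multiplicative).
EXPERT_RULES = [
    (lambda e: e["type"] == "Location", 0.48),
    (lambda e: e["type"] == "Person", 0.74),
    (lambda e: e["type"] == "Organization", 0.95),
    (lambda e: e["freq"] < 2, 0.0),  # "< 2" read as frequency below 2
]


def apply_expert_rules(entity, score, rules=EXPERT_RULES):
    """Multiply the score by the factor of every rule whose selector matches."""
    for selector, factor in rules:
        if selector(entity):
            score *= factor
    return score


# Hypothetical trend values and entities.
trend = [10, 12, 11, 9, 10, 11, 10, 12]
print(is_popularity_peak(trend, 40))
print(apply_expert_rules({"type": "Person", "freq": 3}, 1.0))
print(apply_expert_rules({"type": "Location", "freq": 1}, 0.5))
```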

EVALUATION: MEASURES

Mean P/R at N:
- Most popular
- Easy to interpret

Mean Average Precision at N (MAP):
- Considers ranking
- Relevant documents at the top positions

Mean Normalized Discounted Cumulative Gain at N (MNDCG):
- Different levels of document relevance
- The lower a highly relevant document is ranked, the less useful it is for the user

N = 10
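The three measures can be sketched for a single news item as follows; the "mean" variants average each value over all items in the gold standard. These are common textbook formulations chosen here as an assumption (average precision normalised by min(|relevant|, N), linear gain with a log2 discount), since the slide does not give formulas. The entity names and gains are made up.

```python
import math


def precision_at_n(ranked, relevant, n=10):
    """Fraction of the top-n ranked entities that are relevant."""
    return sum(1 for e in ranked[:n] if e in relevant) / n


def recall_at_n(ranked, relevant, n=10):
    """Fraction of the relevant entities found in the top n."""
    if not relevant:
        return 0.0
    return sum(1 for e in ranked[:n] if e in relevant) / len(relevant)


def average_precision_at_n(ranked, relevant, n=10):
    """Average of precision@k over the ranks k <= n that hold a relevant entity."""
    hits, total = 0, 0.0
    for k, entity in enumerate(ranked[:n], start=1):
        if entity in relevant:
            hits += 1
            total += hits / k
    denom = min(len(relevant), n)
    return total / denom if denom else 0.0


def dcg(gains):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def ndcg_at_n(ranked, gains, n=10):
    """DCG of the ranking divided by the DCG of the ideal ordering.

    gains: mapping from entity to graded relevance (missing means 0).
    """
    actual = dcg([gains.get(e, 0.0) for e in ranked[:n]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:n])
    return actual / ideal if ideal else 0.0


# Hypothetical ranking and judgements.
ranked = ["Snowden", "Obama", "Russia", "Moscow"]
relevant = {"Snowden", "Russia"}
gains = {"Snowden": 2, "Russia": 1}
print(precision_at_n(ranked, relevant, n=4))  # 0.5
print(recall_at_n(ranked, relevant, n=4))     # 1.0
print(average_precision_at_n(ranked, relevant, n=4))
print(ndcg_at_n(ranked, gains, n=4))
```

Unlike precision and recall, NDCG drops when a highly relevant entity is pushed down the list, which matches the slide's note on graded relevance.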

RESULTS (1)

Baselines:

BS1: Former Entity Expansion Implementation*
• Google
• No temporal window
• No_Schema.org
• No_Filter

BS2: TFIDF-based function

(*) Describing and Contextualizing Events in TV News Show, Redondo et al. (2014)

RESULTS (2)

20 x 4 x 4 = 320 runs

Collection: Google + 2W + Schema.org
Filtering: F3
Ranking: Freq + POP + EXP

CONCLUSIONS & FUTURE WORK

- News Entity Expansion generates the News Semantic Snapshot
- Best score: 0.666 in MNDCG at 10, better than BS1/2
  • Collection: CSE (Google + 2W + Schema.org)
  • Filtering: F3
  • Ranking: Freq + POP + EXP

What's next:
- Extend the Ground Truth
- Supervised approach
- Better exploit semantic connections between entities in KB
- Is MNDCG@10 an ideal indicator for assessing NSS quality?