GENERATING NEWSCASTS SEMANTIC SNAPSHOTS USING ENTITY EXPANSION

JOSÉ LUIS REDONDO GARCIA (@peputo)
GIUSEPPE RIZZO (@giusepperizzo)
LILIA PÉREZ ROMERO (L.Perez@cwi.nl)
MICHIEL HILDEBRAND (@McHildebrand / Michiel.Hildebrand@cwi.nl)
RAPHAËL TRONCY (@rtroncy / raphael.troncy@eurecom.fr)


15th International Conference on Web Engineering (ICWE)

NEWS CONSUMPTION SEMANTIC SNAPSHOT (NSS)

Named Entity Expansion

News item

News Semantic Snapshot (NSS)

Snowden asks Russia for asylum

April 15, 2023


NEWS ENTITY EXPANSION

NSS


(20) (1) (4) (4)

Web-based, Unsupervised, Sequential


EVALUATION: NEWS ENTITIES GOLD STANDARD

Involving: (experts in the news domain + users)

Dimensions:
(1) Video subtitles
(2) Image in the video
(3) Text in the video image
(4) Suggestions of an expert
(5) Related articles

Play with the data and help us to extend it at:
https://github.com/jluisred/NewsConceptExpansion/wiki/Golden-Standard-Creation


DOCUMENT COLLECTION (20 variations)

Using Google Custom Search Engine (CSE) [1]

[1] https://cse.google.com/cse/all


Web sites to be crawled:
- Google:
- L1: A set of 10 international English-speaking newspapers
- L2: A set of 3 international newspapers used in the gold standard (GS)

Temporal Window:
- 1W:
- 2W:

Annotation filtering:
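A minimal sketch of how a date-restricted CSE request could be prepared. The slide gives no query details, so several things here are assumptions: that 1W and 2W denote one- and two-week windows, that the window ends on the day of the news item, and the helper names `temporal_window` and `build_cse_params`. The `sort=date:r:START:END` form is the date-range restriction of the Custom Search JSON API; the API key and engine id are placeholders.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Public endpoint of the Custom Search JSON API.
CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def temporal_window(event_day: date, weeks: int) -> tuple[date, date]:
    """Return (start, end) for a window of `weeks` weeks.

    Assumption: the window ends on the event day. The slides do not say
    whether it is centred on, before, or after the news item.
    """
    return event_day - timedelta(weeks=weeks), event_day


def build_cse_params(query: str, start: date, end: date,
                     api_key: str, engine_id: str) -> dict:
    """Build query parameters restricted to the [start, end] date range."""
    return {
        "key": api_key,   # placeholder credential
        "cx": engine_id,  # id of the engine configured with the sites to crawl
        "q": query,
        "sort": f"date:r:{start:%Y%m%d}:{end:%Y%m%d}",
    }


# Illustrative date only; the slides do not give the broadcast date.
start, end = temporal_window(date(2015, 6, 24), weeks=2)
params = build_cse_params("Snowden asks Russia for asylum", start, end,
                          api_key="YOUR_KEY", engine_id="YOUR_CX")
url = f"{CSE_ENDPOINT}?{urlencode(params)}"
```

No request is sent here; the sketch only assembles the URL, so it can be checked offline.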


DOCUMENT ANNOTATION

NER extractors in NERD *

(*) Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web, Rizzo et al. (2014)


ENTITY FILTERING (4 variations)

Filtering dimensions:
- F1: NERD type:
  - Person
  - Organization
  - Location
- F2: Confidence score: > Threshold
- F3: Capitalization:
  country president Obama asylum
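The three filtering dimensions can be sketched as simple predicates over annotated entities. This is only an illustration: the `Entity` record, the function names, the NERD types and confidence values attached to the sample words, and the threshold of 0.5 are all assumptions, since the slide does not give them.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    label: str         # surface form found in the document
    nerd_type: str     # type assigned by the extractor (hypothetical values below)
    confidence: float  # extractor confidence score


ALLOWED_TYPES = {"Person", "Organization", "Location"}


def f1_type(entity: Entity) -> bool:
    """F1: keep only Person, Organization and Location entities."""
    return entity.nerd_type in ALLOWED_TYPES


def f2_confidence(entity: Entity, threshold: float = 0.5) -> bool:
    """F2: keep entities whose confidence exceeds a threshold (value assumed)."""
    return entity.confidence > threshold


def f3_capitalized(entity: Entity) -> bool:
    """F3: keep entities whose label starts with an uppercase letter."""
    return entity.label[:1].isupper()


def filter_entities(entities, predicate):
    """Return the entities accepted by the given filter."""
    return [e for e in entities if predicate(e)]


# The four words from the slide; types and scores are made up for the example.
sample = [
    Entity("country", "Thing", 0.40),
    Entity("president", "Thing", 0.60),
    Entity("Obama", "Person", 0.90),
    Entity("asylum", "Thing", 0.30),
]

kept = [e.label for e in filter_entities(sample, f3_capitalized)]
print(kept)  # ['Obama']
```

With the capitalization filter, only "Obama" survives from the four example words.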


RANKING STRATEGIES (1)

Increase representativeness; leverage entity frequency

(Freq) (Gaussian)
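A minimal sketch of the frequency-based ranking (Freq), assuming each retrieved document has already been reduced to a list of entity labels. The function name and the tie-breaking by label are assumptions. The Gaussian variant is not sketched because the slide does not define it.

```python
from collections import Counter


def rank_by_frequency(documents):
    """Rank entity labels by how often they occur across all documents.

    `documents` is a list of lists of entity labels. Ties are broken
    alphabetically so that the output is deterministic (assumption).
    """
    counts = Counter(label for doc in documents for label in doc)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


docs = [
    ["Snowden", "Russia"],
    ["Snowden", "Obama"],
    ["Russia", "Snowden"],
]
ranking = rank_by_frequency(docs)
print(ranking)  # [('Snowden', 3), ('Russia', 2), ('Obama', 1)]
```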


RANKING STRATEGIES (2)
(4 variations)

POPULARITY
- Based on Google Trends
- w = 2 months
- μ + 2*σ (2.5%)

EXPERT RULES
Rules: [ Sel(e) , ]
Example:
- [ Location, = 0.48 ]
- [ Person, = 0.74 ]
- [ Organization, = 0.95 ]
- [ < 2 , = 0.0 ]
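A rough sketch of both ideas on this slide, under stated assumptions. For popularity, a value is treated as a peak when it exceeds μ + 2σ of the trend series over the window; how the series is obtained from Google Trends is not shown. For the expert rules, each rule is modelled as a (selector, factor) pair and the factor is multiplied into the entity score. The multiplication, the reading of "< 2" as a frequency below 2, and all names are assumptions, since the slide text is incomplete.

```python
from statistics import mean, pstdev


def is_popularity_peak(series, value):
    """True when `value` exceeds mu + 2*sigma of the window `series`.

    Uses the population standard deviation; the slides do not say which
    estimator is meant.
    """
    mu, sigma = mean(series), pstdev(series)
    return value > mu + 2 * sigma


# (selector, factor) pairs using the example values from the slide.
EXPERT_RULES = [
    (lambda e: e["type"] == "Location", 0.48),
    (lambda e: e["type"] == "Person", 0.74),
    (lambda e: e["type"] == "Organization", 0.95),
    (lambda e: e["freq"] < 2, 0.0),  # assumed to refer to entity frequency
]


def apply_expert_rules(entity, score, rules=EXPERT_RULES):
    """Multiply the score by the factor of every rule whose selector matches."""
    for selector, factor in rules:
        if selector(entity):
            score *= factor
    return score


window = [10, 12, 11, 9, 10, 11, 12, 10]  # made-up trend values
print(is_popularity_peak(window, 30))     # True
print(is_popularity_peak(window, 12))     # False
print(apply_expert_rules({"type": "Location", "freq": 1}, 10.0))  # 0.0
```

An entity seen only once is zeroed out by the last rule regardless of its type, which matches the factor 0.0 in the example.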


EVALUATION: MEASURES

Mean P/R at N:
- Most popular
- Easy to interpret

Mean Average Precision at N (MAP):
- Considers ranking
- Relevant documents at the top positions

Mean Normalized Discounted Cumulative Gain at N (MNDCG):
- Different levels of document relevance
- The lower a highly relevant document is ranked, the less useful it is for the user

N = 10
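The three measures can be sketched for a single ranked list as follows; the "mean" versions would average these values over all newscasts. This uses common textbook definitions (average precision normalized by min(|relevant|, N), log2 discount for DCG), which may differ in detail from the variant used in the evaluation. Function names and sample data are illustrative.

```python
import math


def precision_recall_at_n(ranked, relevant, n=10):
    """Precision and recall over the top-n ranked entities."""
    hits = sum(1 for entity in ranked[:n] if entity in relevant)
    precision = hits / n
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision_at_n(ranked, relevant, n=10):
    """Average of precision@k taken at each rank k holding a relevant entity."""
    hits, total = 0, 0.0
    for k, entity in enumerate(ranked[:n], start=1):
        if entity in relevant:
            hits += 1
            total += hits / k
    denom = min(len(relevant), n)
    return total / denom if denom else 0.0


def ndcg_at_n(ranked, gains, n=10):
    """NDCG with graded relevance; `gains` maps entity -> relevance level."""
    dcg = sum(gains.get(entity, 0) / math.log2(i + 2)
              for i, entity in enumerate(ranked[:n]))
    ideal = sorted(gains.values(), reverse=True)[:n]
    idcg = sum(gain / math.log2(i + 2) for i, gain in enumerate(ideal))
    return dcg / idcg if idcg else 0.0


ranked = ["a", "b", "c", "d"]
gains = {"a": 3, "c": 1, "x": 2}  # graded relevance from a gold standard
relevant = set(gains)

p, r = precision_recall_at_n(ranked, relevant, n=4)
ap = average_precision_at_n(ranked, relevant, n=4)
ndcg = ndcg_at_n(ranked, gains, n=4)
```

In this example the highly relevant entity "x" is missing from the ranking, which lowers NDCG more than missing the low-gain entity "c" would.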


RESULTS (1)

Baselines:

BS1: Former Entity Expansion Implementation*

• Google
• No temporal window
• No_Schema.org
• No_Filter

BS2: TF-IDF-based function.

(*) Describing and Contextualizing Events in TV News Show, Redondo et al. (2014)

RESULTS (2)

20 x 4 x 4 = 320 runs

Collection: Google + 2W + Schema.org | Filtering: F3 | Ranking: Freq + POP + EXP
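The run count follows from combining every collection, filtering and ranking variation. A small sketch of that enumeration; the placeholder names stand in for the actual configurations, which are not listed individually in the slides.

```python
from itertools import product

# Placeholder identifiers; only the counts (20, 4, 4) come from the slides.
collection_variants = [f"collection_{i}" for i in range(20)]
filtering_variants = [f"filtering_{i}" for i in range(4)]
ranking_variants = [f"ranking_{i}" for i in range(4)]

# One run per (collection, filtering, ranking) combination.
runs = list(product(collection_variants, filtering_variants, ranking_variants))
print(len(runs))  # 320
```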


CONCLUSIONS & FUTURE WORK

- News Entity Expansion generates the News Semantic Snapshot
- Best score: 0.666 in MNDCG at 10, better than BS1/2
  • Collection: CSE (Google + 2W + Schema.org)
  • Filtering: F3
  • Ranking: Freq + POP + EXP

What's next:
- Extend the Ground Truth
- Supervised approach
- Better exploit semantic connections between entities in KB
- Is MNDCG@10 an ideal indicator for assessing NSS quality?