GENERATING NEWSCASTS SEMANTIC SNAPSHOTS USING ENTITY EXPANSION
JOSÉ LUIS REDONDO GARCIA
GIUSEPPE RIZZO
LILIA PÉREZ ROMERO
MICHIEL HILDEBRAND
RAPHAËL TRONCY
@peputo / [email protected]
@giusepperizzo / [email protected]
@McHildebrand / [email protected]
@rtroncy / [email protected]
15th International Conference on Web Engineering (ICWE)
NEWS CONSUMPTION SEMANTIC SNAPSHOT (NSS)
[Figure: Named Entity Expansion turns a news item ("Snowden asks Russia for asylum") into a News Semantic Snapshot (NSS)]
April 15, 2023
NEWS ENTITY EXPANSION
Document Collection (20) → Document Annotation (1) → Entity Filtering (4) → Entity Ranking (4) → NSS
Web-based, Unsupervised, Sequential
EVALUATION: NEWS ENTITIES GOLD STANDARD
Involving: experts in the news domain + users
Dimensions:
(1) Video subtitles
(2) Image in the video
(3) Text in the video image
(4) Suggestions of an expert
(5) Related articles
Play with the data and help us to extend it at:
https://github.com/jluisred/NewsConceptExpansion/wiki/Golden-Standard-Creation
DOCUMENT COLLECTION (20 variations)
Using Google Custom Search Engine (CSE) [1]
[1] https://cse.google.com/cse/all
Web sites to be crawled:
- Google:
- L1: A set of 10 international English-speaking newspapers
- L2: A set of 3 international newspapers used in the gold standard (GS)
Temporal Window:
- 1W:
- 2W:
Annotation filtering:
DOCUMENT ANNOTATION
NER extractors in NERD *
(*) Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web, Rizzo et al. (2014)
ENTITY FILTERING (4 variations)
Filtering dimensions:
- F1: NERD type:
  - Person
  - Organization
  - Location
- F2: Confidence score: > Threshold
- F3: Capitalization:
  Example terms: country, president, Obama, asylum
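The three filtering dimensions above can be sketched as simple predicates over annotated entities. This is only an illustration, not the authors' implementation: the `Annotation` class, the default threshold of 0.5, and the types and confidence scores attached to the example terms are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical container for one entity annotation."""
    label: str
    nerd_type: str
    confidence: float


# F1: keep only the three NERD types named on the slide.
ALLOWED_TYPES = {"Person", "Organization", "Location"}


def f1_type(a: Annotation) -> bool:
    return a.nerd_type in ALLOWED_TYPES


def f2_confidence(a: Annotation, threshold: float = 0.5) -> bool:
    # The slide does not give the threshold; 0.5 is an assumed value.
    return a.confidence > threshold


def f3_capitalized(a: Annotation) -> bool:
    # Keep labels whose first character is uppercase.
    return a.label[:1].isupper()


def apply_filters(annotations, filters):
    """Keep annotations that pass every filter in the list."""
    return [a for a in annotations if all(f(a) for f in filters)]


# Example terms from the slide; types and scores are made up.
annotations = [
    Annotation("country", "Thing", 0.40),
    Annotation("president", "Thing", 0.55),
    Annotation("Obama", "Person", 0.90),
    Annotation("asylum", "Thing", 0.30),
]

kept = apply_filters(annotations, [f3_capitalized])
print([a.label for a in kept])
```

With the capitalization filter alone, only "Obama" survives among the four example terms.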
RANKING STRATEGIES (1)
- Increase representativeness by leveraging entity frequency
- Functions: (Freq) (Gaussian)
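A frequency-based ranking in the spirit of (Freq) could look like the sketch below. It assumes that the score of an entity is its total number of mentions across the collected documents; the slide does not define either function, so the Gaussian variant is left out and `rank_by_frequency` is a hypothetical name.

```python
from collections import Counter


def rank_by_frequency(documents):
    """Rank entity labels by total mentions across documents.

    `documents` is a list of lists of entity labels, one list per
    collected document. Ties are broken alphabetically so that the
    output is deterministic.
    """
    counts = Counter()
    for entities in documents:
        counts.update(entities)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up annotations for three collected documents.
docs = [
    ["Snowden", "Russia"],
    ["Snowden", "Obama"],
    ["Snowden", "Russia", "Moscow"],
]

ranking = rank_by_frequency(docs)
print(ranking)
# → [('Snowden', 3), ('Russia', 2), ('Moscow', 1), ('Obama', 1)]
```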
RANKING STRATEGIES (2)
(4 variations)
POPULARITY
- Based on Google Trends
- w = 2 months
- μ + 2*σ (2.5%)
EXPERT RULES
Rules: [ Sel(e) , ]
Example:
- [ Location, = 0.48 ]
- [ Person, = 0.74 ]
- [ Organization, = 0.95 ]
- [ < 2 , = 0.0 ]
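The popularity test and the expert rules can be illustrated as follows, under several assumptions. The popularity series is a plain list of numbers standing in for Google Trends values over the two-month window (no Trends API is called). The rule weights are taken from the example, but the symbol they are attached to is not legible in the slide, so the sketch assumes they multiply the entity score, and that the "< 2" rule refers to the entity's absolute frequency.

```python
from statistics import mean, pstdev


def is_popularity_peak(series, value):
    """True if `value` exceeds mu + 2*sigma of the window `series`."""
    mu = mean(series)
    sigma = pstdev(series)
    return value > mu + 2 * sigma


# Weights from the slide example; how they combine with the score
# is an assumption (simple multiplication).
TYPE_WEIGHTS = {"Location": 0.48, "Person": 0.74, "Organization": 0.95}


def apply_expert_rules(score, nerd_type, frequency):
    """Re-weight a score by entity type; zero it for rare entities."""
    if frequency < 2:  # assumed meaning of the "< 2" rule
        return 0.0
    return score * TYPE_WEIGHTS.get(nerd_type, 1.0)


# Hypothetical popularity values over the window.
window = [10, 12, 11, 9, 10, 11, 10, 12]
print(is_popularity_peak(window, 40))  # far above the window's usual level
print(is_popularity_peak(window, 12))  # within normal variation
print(apply_expert_rules(10.0, "Person", 5))
print(apply_expert_rules(10.0, "Person", 1))
```

For roughly normally distributed values, about 2.5% of observations lie above μ + 2σ, which is one plausible reading of the "(2.5%)" note.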
EVALUATION: MEASURES
Mean P/R at N:
- Most popular
- Easy to interpret
Mean Average Precision at N (MAP):
- Considers ranking
- Relevant documents at the top positions
Mean Normalized Discounted Cumulative Gain at N (MNDCG):
- Different levels of document relevance
- The lower a highly relevant document is ranked, the less useful it is for the user
N = 10
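The per-item versions of these measures can be sketched as below; the "Mean" variants average them over all news items. The function names are illustrative, and the formulas follow common textbook definitions (average precision normalized by min(|relevant|, N), DCG with a log2 discount), which may differ in detail from the ones used in the evaluation.

```python
import math


def precision_at_n(ranked, relevant, n=10):
    hits = sum(1 for e in ranked[:n] if e in relevant)
    return hits / n


def recall_at_n(ranked, relevant, n=10):
    hits = sum(1 for e in ranked[:n] if e in relevant)
    return hits / len(relevant) if relevant else 0.0


def average_precision_at_n(ranked, relevant, n=10):
    """Average of precision values at each rank holding a relevant entity."""
    hits, total = 0, 0.0
    for k, e in enumerate(ranked[:n], start=1):
        if e in relevant:
            hits += 1
            total += hits / k
    denom = min(len(relevant), n)
    return total / denom if denom else 0.0


def ndcg_at_n(ranked, gains, n=10):
    """`gains` maps each entity to a graded relevance (missing = 0)."""
    dcg = sum(gains.get(e, 0) / math.log2(k + 1)
              for k, e in enumerate(ranked[:n], start=1))
    ideal = sorted(gains.values(), reverse=True)[:n]
    idcg = sum(g / math.log2(k + 1) for k, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg else 0.0


# Made-up ranking and judgments, evaluated at N = 4 for brevity.
ranked = ["Snowden", "Russia", "Obama", "asylum"]
gains = {"Snowden": 3, "Obama": 2, "Moscow": 1}
relevant = set(gains)

print(precision_at_n(ranked, relevant, n=4))  # → 0.5
print(recall_at_n(ranked, relevant, n=4))
print(average_precision_at_n(ranked, relevant, n=4))
print(ndcg_at_n(ranked, gains, n=4))
```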
RESULTS (1)
Baselines:
BS1: Former Entity Expansion Implementation*
• Google
• No temporal window
• No_Schema.org
• No_Filter
BS2: TFIDF-based Function.
(*) Describing and Contextualizing Events in TV News Show, Redondo et al. (2014)
RESULTS (2)
20 x 4 x 4 = 320 runs
Best run:
- Collection: Google + 2W + Schema.org
- Filtering: F3
- Ranking: Freq + POP + EXP
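The number of runs is the product of the variation counts of the collection, filtering, and ranking stages. A small sketch of enumerating such a grid is given below; the configuration labels are placeholders, since the slides do not list every variation by name.

```python
from itertools import product

# Placeholder labels: 20 collection, 4 filtering, 4 ranking variations.
collection_variants = [f"collection_{i}" for i in range(1, 21)]
filtering_variants = [f"filtering_{i}" for i in range(1, 5)]
ranking_variants = [f"ranking_{i}" for i in range(1, 5)]

# One run per combination of the three stages.
runs = list(product(collection_variants, filtering_variants, ranking_variants))
print(len(runs))  # → 320
```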
CONCLUSIONS & FUTURE WORK
- News Entity Expansion generates the News Semantic Snapshot
- Best score: 0.666 in MNDCG at 10, better than BS1/2
  • Collection: CSE (Google + 2W + Schema.org)
  • Filtering: F3
  • Ranking: Freq + POP + EXP
What's next:
- Extend the Ground Truth
- Supervised approach
- Better exploit semantic connections between entities in KB
- Is MNDCG@10 an ideal indicator for assessing NSS quality?