Enriching Media Collections for Event-based Exploration

ENRICHING MEDIA COLLECTIONS FOR EVENT-BASED EXPLORATION

Victor de Boer, Liliana Melgar, Oana Inel, Carlos Martinez Ortiz, Lora Aroyo, and Johan Oomen

MTSR 2017

Cultural Heritage Collections becoming available as Linked Open Data

Support exploratory, event-centric browsing of multiple, heterogeneous collections for Media Scholars

DIVE+ Case study

OPENIMAGES.EU

3,220 news broadcasts

Netherlands Institute for Sound & Vision

GTAA thesaurus

DELPHER.NL

197,199 Scans of Radio bulletins

1937 – 1984

AMSTERDAM MUSEUM

73,447 cultural heritage objects

AM Thesaurus

TROPENMUSEUM

78,270 cultural heritage objects

SVNC thesaurus

DIVE+ Collections and Vocabularies

Interactive Exploration & Discovery in Context

linking objects to events and entities

building automatic storylines (proto-narratives)

Goal: develop explorable Knowledge Graph

Our recipe

Mapping to popular vocabularies

am:obj_22093 am:contentPersonName “Job Cohen”

am:contentPersonName rdfs:subPropertyOf dcterms:subject
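
The mapping idea above can be sketched in a few lines of Python: a collection-specific property is declared a sub-property of a generic one, and every triple that uses the specific property is then also readable through the generic one. This is only an illustration with plain tuples, not the DIVE+ implementation. The helper names (`super_properties`, `apply_mapping`) are hypothetical, and the sketch assumes each property has at most one direct super-property.

```python
# Assumed mapping file content: specific property -> generic super-property
# (one rdfs:subPropertyOf statement per entry).
SUBPROPERTY_OF = {
    "am:contentPersonName": "dcterms:subject",
}

# Example collection triple taken from the slide.
triples = [
    ("am:obj_22093", "am:contentPersonName", "Job Cohen"),
]


def super_properties(prop, subproperty_of):
    """Follow rdfs:subPropertyOf links upwards and return every super-property."""
    found = []
    seen = {prop}
    current = prop
    while current in subproperty_of:
        current = subproperty_of[current]
        if current in seen:  # guard against cyclic mappings
            break
        seen.add(current)
        found.append(current)
    return found


def apply_mapping(triples, subproperty_of):
    """Return the original triples plus those entailed by the mapping."""
    inferred = set(triples)
    for s, p, o in triples:
        for sup in super_properties(p, subproperty_of):
            inferred.add((s, sup, o))
    return inferred


generic = apply_mapping(triples, SUBPROPERTY_OF)
```

After the call, `generic` holds both the original `am:contentPersonName` triple and an inferred `dcterms:subject` triple for the same object, so a generic query does not need to know the collection-specific property.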

1. Mapping to generic schema

DIVE+

Van Hage, W. R., Malaisé, V., Segers, R., Hollink, L., & Schreiber, G. (2011). Design and use of the Simple Event Model (SEM). Web Semantics: Science, Services and Agents on the World Wide Web, 9(2), 128-136.

Simple Event Model (SEM)

Classes: sem:Event, sem:Actor, sem:Place, sem:Time, skos:Concept, dive:MediaObject, oa:Annotation

Event properties: sem:hasActor, sem:hasPlace, sem:hasTime

Media object link: dive:depictedBy

Descriptive properties: rdfs:label, dive:source, dive:placeholder, dc:identifier, dc:description, etc.

Annotation properties: oa:hasBody, oa:hasTarget

Generic relation: dive:isRelatedTo

Concept hierarchy: skos:broader, skos:narrower, etc.

DIVE+ Generic data model

DIVE+ manually created RDFS mapping files

Collection  # mapping triples
OI          3
NB          - (conversion in project)
AM          12
TM          18

ENTITY EXTRACTION

EVENTS

LINKING EVENTS AND CONCEPTS

2. Enrichment: Hybrid strategy

Original Metadata

Interpretation of content

Named Entity Recognition

Human computation

Hybrid pipeline

Where do we get events from?

- LIDO, CIDOC, EDM
- creationDateStart
- Interpretation of object
- NLP tools, other pipelines
- Crowdsourcing
- Nichesourcing

Original Metadata

am:Belgische opstand
am:besnijdenis
am:Beurs de Keyser
am:bevrijding
am:bezoekerscentrum
am:bibliotheken
am:Bijlmerramp
am:Boulevard of Broken Dreams
am:brand
am:brand van het oude stadhuis op de Dam
am:burgeroorlog
am:capitulatie
am:christendom geboorte van Christus
am:christendom kruisiging
am:christendom opstanding van Christus
am:christus aan het kruis
am:Christus schrijft op de grond
am:concert

“Fayence bord”

Crowdsourcing for Events in Texts & Videos

CrowdTruth.org

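Description: "Foto is genomen tijdens de Eerste Zuid Nieuw-Guinea Expeditie" (Photo was taken during the First South New Guinea Expedition)
Event: Eerste Zuid Nieuw-Guinea Expeditie

Description: "Foto is genomen tijdens de Eerste- of de Tweede Zuid Nieuw-Guinea Expeditie" (Photo was taken during the First or the Second South New Guinea Expedition)
Event: Tweede Zuid Nieuw-Guinea Expeditie

Description: "Masker gedragen tijdens oogstfeesten. Het feest in kwestie is het Sokari spel dat eenmaal per jaar wordt opgevoerd gedurende zeven opeenvolgende nachten na Nieuwjaar, medio april. …" (Mask worn during harvest festivals. The festival in question is the Sokari play, which is performed once a year during seven consecutive nights after New Year, mid-April. …)
Event: Nieuwjaar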

FROG NLP toolkit NER Event extraction

Victor Kramer

https://languagemachines.github.io/frog/

Radio news bulletins: Every object 1 event
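
As a rough illustration of the "every object, one event" rule for radio bulletins, the sketch below derives one event per bulletin identifier. It is not the project's conversion code. The identifier layout (`source:YYYY-MM-DD:number`) is inferred from the single example `ANP:1950-08-11:50`, the `event/` naming scheme is hypothetical, and both the direction of `dive:depictedBy` (event to media object) and the use of the date in the identifier as the event timestamp are assumptions.

```python
from datetime import date


def bulletin_to_event(bulletin_id):
    """Build the triples for one bulletin and the single event derived from it."""
    # Assumed identifier layout: "<source>:<YYYY-MM-DD>:<number>".
    _source, day, _number = bulletin_id.split(":")
    timestamp = date.fromisoformat(day)  # raises ValueError on a malformed date
    event_id = "event/" + bulletin_id    # hypothetical naming scheme
    return [
        (bulletin_id, "rdf:type", "dive:MediaObject"),
        (event_id, "rdf:type", "sem:Event"),
        # Assumed direction: the event is depicted by the media object.
        (event_id, "dive:depictedBy", bulletin_id),
        # Property name spelled as on the slides.
        (event_id, "sem:hasTimestamp", timestamp.isoformat()),
    ]


triples = bulletin_to_event("ANP:1950-08-11:50")
```

Because each bulletin yields exactly one event, the number of events for this collection equals the number of media objects.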

Establish explorable links through shared vocabularies

[Diagram: dive:MediaObject, sem:Event, sem:Place, sem:Time, sem:Actor, skos:Concept and oa:Annotation nodes; Place and Actor nodes aligned via skos:exactMatch]

http://cultuurlink.beeldengeluid.nl/

Interactive vocabulary alignment
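
To make the role of shared vocabularies concrete, here is a minimal sketch of how `skos:exactMatch` alignments can connect entities from different collections: entities that point at the same concept become linked. The identifiers are borrowed from the example slide later in the deck, but the specific alignment pairs are assumed for illustration, and the function names are hypothetical.

```python
from collections import defaultdict

# Assumed alignments: (collection entity, shared concept it skos:exactMatch-es).
exact_matches = [
    ("oi:Ingenieur_Lely", "gtaa:lely"),
    ("KB:Lely", "gtaa:lely"),
    ("kb:DenHaag1", "gtaa:DenHaag"),
]


def group_by_concept(matches):
    """Group collection entities by the shared concept they are aligned to."""
    groups = defaultdict(set)
    for entity, concept in matches:
        groups[concept].add(entity)
    return groups


def cross_collection_links(matches):
    """Return (entity, entity, concept) for entities from different collections
    that are aligned to the same concept."""
    links = set()
    for concept, entities in group_by_concept(matches).items():
        ordered = sorted(entities)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                # The prefix before ':' stands in for the collection.
                if first.split(":")[0] != second.split(":")[0]:
                    links.add((first, second, concept))
    return links


links = cross_collection_links(exact_matches)
```

Here the two Lely entities are linked through `gtaa:lely`, while `kb:DenHaag1` stays unlinked because no other collection is aligned to `gtaa:DenHaag` in this toy data.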

Result: Explorable Knowledge graph

Nodes:
- dive:MediaObject “Nieuws uit Indonesië: opheffing van het KNIL” (News from Indonesia: disbanding of the KNIL)
- dive:MediaObject “Mannen bij het huis van Paul Spies aan de Parapattan 42, Djakarta” (Men at the house of Paul Spies at Parapattan 42, Djakarta)
- dive:MediaObject “ANP:1950-08-11:50”
- dive:MediaObject “Schaal” (Dish)
- sem:Event “ANP:1950-08-11:50”
- sem:Event “ontbindingsceremonie” (disbanding ceremony)
- sem:Time “25 Juli 1950” (25 July 1950)
- sem:Time “11 Augustus 1950” (11 August 1950)
- sem:Place “Djakarta”
- sem:Place “Indonesië” (Indonesia)
- sem:Actor “Mohammed Hatta”

Edge labels:
- dive:depictedBy
- sem:hasTimestamp
- dive:isRelatedTo / dive:relatedPlace / sem:hasPlace
- dive:isRelatedTo / dive:relatedActor / sem:hasActor
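
The kind of exploration this graph enables can be sketched as a short breadth-first walk: start at one media object, move to its event, on to a shared place or actor, and from there to other events and their media objects. The node labels come from the slide, but the exact edges are not recoverable from it, so the ones below are assumed for illustration. The `media:`, `event:` and `place:` prefixes and the function names are hypothetical.

```python
from collections import defaultdict

# Assumed edges (subject, property, object); only the labels come from the slide.
edges = [
    ("event:ontbindingsceremonie", "dive:depictedBy", "media:Nieuws uit Indonesië"),
    ("event:ontbindingsceremonie", "sem:hasPlace", "place:Djakarta"),
    ("event:ontbindingsceremonie", "sem:hasActor", "actor:Mohammed Hatta"),
    ("event:ANP:1950-08-11:50", "dive:depictedBy", "media:ANP:1950-08-11:50"),
    ("event:ANP:1950-08-11:50", "sem:hasPlace", "place:Djakarta"),
    ("media:Schaal", "dive:isRelatedTo", "place:Indonesië"),
]


def neighbours(edges):
    """Index the graph in both directions so links can be followed either way."""
    index = defaultdict(set)
    for s, _p, o in edges:
        index[s].add(o)
        index[o].add(s)
    return index


def related_media(start, edges, max_hops=4):
    """Media objects reachable from `start` within `max_hops` links."""
    index = neighbours(edges)
    seen = {start}
    frontier = [start]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for other in sorted(index[node]):
                if other not in seen:
                    seen.add(other)
                    next_frontier.append(other)
        frontier = next_frontier
    return sorted(n for n in seen if n.startswith("media:") and n != start)


found = related_media("media:Nieuws uit Indonesië", edges)
```

With these assumed edges the newsreel leads to the radio bulletin through the shared place Djakarta (media object, event, place, event, media object), while “Schaal” is not reached because nothing in this toy data connects Indonesië to Djakarta.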

DIVE+ Enrichments

Collection  Enrichment method              Media Objects  Actors   Places  Events   Other    Alignments
OI          Crowd + NER                    3,204          1,249    1,412   1,916    185,846  623
NB          Interpreted + NER              197,200        194,890  54,571  197,200  6,736    6,353
AM          original thesaurus             73,447         66,966   5,973   148      28,047   6,865
TM          original thesaurus + FROG NER  78,226         27,829   3,896   23*      13,269   -
Total                                      352,077        290,934  65,852  199,264  233,898  -

*) more to come

Subject-Object      Property supertype                   Count
Media Object-Event  dive:depictedBy or dive:isRelatedTo  199,233
Event-Actor         sem:hasActor                         265,677
Event-Place         sem:hasPlace                         220,726
Event-Concept       dive:isRelatedTo                     230

DIVE+ path fragments

ClioPatria triple store - 15M triples (for now) - SPARQL endpoint

Provenance management at Named Graph level

http://data.dive.beeldengeluid.nl
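
Provenance at named-graph level can be illustrated with quads: each triple carries the name of the graph it was loaded into, and the provenance is recorded once per graph instead of once per triple. The sketch below uses plain tuples and does not reproduce the actual store layout. The graph names, the enrichment labels attached to them, and the example triples are all hypothetical.

```python
# Hypothetical quads: (subject, property, object, named graph).
quads = [
    ("event:1", "sem:hasActor", "actor:Mohammed Hatta", "graph:ner"),
    ("event:1", "sem:hasPlace", "place:Djakarta", "graph:ner"),
    ("event:1", "sem:hasPlace", "place:Djakarta", "graph:crowd"),
]

# Provenance is stated once per named graph, not per triple.
graph_provenance = {
    "graph:ner": "Named Entity Recognition",
    "graph:crowd": "Crowdsourcing",
}


def provenance_of(triple, quads, graph_provenance):
    """Return the enrichment methods of every graph that asserts `triple`."""
    return sorted(
        graph_provenance[graph]
        for s, p, o, graph in quads
        if (s, p, o) == triple
    )


methods = provenance_of(
    ("event:1", "sem:hasPlace", "place:Djakarta"), quads, graph_provenance
)
```

In this toy data the place link is asserted in two graphs, so `methods` lists both enrichment methods. A triple found by only one method would return a single entry.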

DIVE+ UI

https://github.com/CLARIAH/grlc

API Layer

DIVE+ UI: INFINITY OF EXPLORATION

/ Support exploration and serendipity

/ Visual inspection of media objects and entities

/ Lets users build, save and share Proto-Narratives

https://youtu.be/FI3MPiU9rjo?t=138

http://diveplus.beeldengeluid.nl

[UI screenshot callouts: filters; results ordering; filter on media objects; order media objects by date; filter on events; explore event; event related entities; place entity exploration; narrative bookmarking]

/ Generic data model for connecting heterogeneous media collections

/ Various data enrichment strategies to construct explorable event-centric knowledge graphs

/ DIVE+ Case Study

Take home

/ http://diveproject.beeldengeluid.nl / http://diveplus.beeldengeluid.nl

/ v.de.boer@vu.nl

DIVE+

DIVE+ team

Current work: (Common) Event thesaurus?

Februaristaking

WOII

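Februaristaking

“De oproep 'Staakt!' voor deelname aan de februaristaking te Amsterdam op 25 en 26 februari 1941.” (The call 'Strike!' to take part in the February strike in Amsterdam on 25 and 26 February 1941.)

stakingen

Eduard Hellendoorn

"Joseph Eijl Eduard Hellendoorn Hermanus Coenradi 13 maart 1941 gefusilleerd Waalsdorpervlakte" (Joseph Eijl, Eduard Hellendoorn, Hermanus Coenradi, executed by firing squad on 13 March 1941, Waalsdorpervlakte)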

Waalsdorpervlakte

Jessie Both & Didi de hooge

3. Alignments to vocabularies

Nodes (type and identifier):
- sem:Event: oi:Opening_afsluitdijk
- sem:Actor, dive:Person: oi:Ingenieur_Lely, KB:Lely
- sem:Place: oi:Afsluitdijk
- sem:Place, dive:Place: kb:DenHaag1
- dive:MediaObject, dive:Video: oi:9999
- dive:MediaObject, dive:Image: kb:image2
- oa:Annotation: dive:9999ann
- skos:Concept: gtaa:lely, gtaa:DenHaag, gtaa:Zuid-Holland

Media file: http://iopenimages.nl/video1.mpg

Edge labels:
- dive:depictedBy
- oa:hasBody, oa:hasTarget
- dive:isRelatedTo / dive:relatedActor / sem:hasActor
- dive:isRelatedTo / dive:relatedPlace / sem:hasPlace
- skos:broader

Data sources: OI data, KB data, GTAA
