ENRICHING MEDIA COLLECTIONS FOR EVENT-BASED EXPLORATION Victor de Boer, Liliana Melgar, Oana Inel, Carlos Martinez Ortiz, Lora Aroyo, and Johan Oomen MTSR 2017


Cultural Heritage Collections becoming available as Linked Open Data

Support exploratory, event-centric browsing of multiple, heterogeneous collections for Media Scholars


DIVE+ Case study

DIVE+ Collections and Vocabularies

OPENIMAGES.EU
- 3,220 news broadcasts
- Netherlands Institute for Sound & Vision
- GTAA thesaurus

DELPHER.NL
- 197,199 scans of radio bulletins
- 1937 – 1984

AMSTERDAM MUSEUM
- 73,447 cultural heritage objects
- AM Thesaurus

TROPENMUSEUM
- 78,270 cultural heritage objects
- SVNC thesaurus

Goal: develop explorable Knowledge Graph

Interactive Exploration & Discovery in Context
- linking objects to events and entities
- building automatic storylines (proto-narratives)


Our recipe

1. Mapping to generic schema (DIVE+)

Mapping to popular vocabularies

am:obj_22093 am:contentPersonName “Job Cohen”
am:contentPersonName rdfs:subPropertyOf dcterms:subject
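The mapping above relies on the RDFS subproperty rule: if a collection-specific property is declared a subproperty of a generic one, every statement made with the specific property also holds for the generic one. The following is a minimal sketch of that rule in plain Python. The two triples come from the slide; the tuple-based store and the function names are illustrative assumptions, not the DIVE+ implementation.

```python
# Toy triple store: each triple is a (subject, predicate, object) tuple.
# The two triples mirror the slide; the rest is a hypothetical illustration.
TRIPLES = {
    ("am:obj_22093", "am:contentPersonName", "Job Cohen"),
    ("am:contentPersonName", "rdfs:subPropertyOf", "dcterms:subject"),
}


def super_properties(prop, triples):
    """Return prop plus every property reachable via rdfs:subPropertyOf."""
    found, todo = {prop}, [prop]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subPropertyOf" and o not in found:
                found.add(o)
                todo.append(o)
    return found


def infer(triples):
    """Apply the subproperty rule: (s p o) and (p subPropertyOf q) give (s q o)."""
    inferred = set(triples)
    for s, p, o in triples:
        if p == "rdfs:subPropertyOf":
            continue  # schema triples are kept as they are
        for q in super_properties(p, triples):
            inferred.add((s, q, o))
    return inferred


GENERIC_VIEW = infer(TRIPLES)
```

With this in place, a generic client can look for `dcterms:subject` and still find the Amsterdam Museum statement, without knowing `am:contentPersonName`.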

Simple Event Model (SEM)

Van Hage, W. R., Malaisé, V., Segers, R., Hollink, L., & Schreiber, G. (2011). Design and use of the Simple Event Model (SEM). Web Semantics: Science, Services and Agents on the World Wide Web, 9(2), 128-136.

DIVE+ Generic data model

Classes:
- dive:MediaObject
- sem:Event
- sem:Actor
- sem:Place
- sem:Time
- skos:Concept
- oa:Annotation

Properties:
- dive:depictedBy
- rdfs:label, dive:source, dive:placeholder, dc:identifier, dc:description, etc.
- oa:hasBody, oa:hasTarget
- sem:hasActor, sem:hasPlace, sem:hasTime
- dive:isRelatedTo
- skos:broader, skos:narrower, etc.

DIVE+ manually created RDFS mapping files

Collection   # mapping triples
OI           3
NB           - (conversion in project)
AM           12
TM           18

2. Enrichment: Hybrid strategy

- Entity extraction
- Events
- Linking events and concepts

Hybrid pipeline: where do we get events from?

- Original metadata: LIDO, CIDOC, EDM
- Interpretation of content: creationDateStart, interpretation of object
- Named Entity Recognition: NLP tools, other pipelines
- Human computation: crowdsourcing, nichesourcing

Original Metadata

am:Belgische opstand

am:besnijdenis

am:Beurs de Keyser

am:bevrijding

am:bezoekerscentrum

am:bibliotheken

am:Bijlmerramp

am:Boulevard of Broken Dreams

am:brand

am:brand van het oude stadhuis op de Dam

am:burgeroorlog

am:capitulatie

am:christendom geboorte van Christus

am:christendom kruisiging

am:christendom opstanding van Christus

am:christus aan het kruis

am:Christus schrijft op de grond

am:concert

"Fayence bord"


Crowdsourcing for Events in Texts & Videos

CrowdTruth.org

FROG NLP toolkit: NER event extraction

https://languagemachines.github.io/frog/

Victor Kramer

Description -> Event

- "Foto is genomen tijdens de Eerste Zuid Nieuw-Guinea Expeditie" (Photo was taken during the First South New Guinea Expedition) -> Eerste Zuid Nieuw-Guinea Expeditie
- "Foto is genomen tijdens de Eerste- of de Tweede Zuid Nieuw-Guinea Expeditie" (Photo was taken during the First or the Second South New Guinea Expedition) -> Tweede Zuid Nieuw-Guinea Expeditie
- "Masker gedragen tijdens oogstfeesten. Het feest in kwestie is het Sokari spel dat eenmaal per jaar wordt opgevoerd gedurende zeven opeenvolgende nachten na Nieuwjaar, medio april. …" (Mask worn during harvest festivals. The festival in question is the Sokari play, which is performed once a year during seven consecutive nights after New Year, in mid-April. …) -> Nieuwjaar


Radio news bulletins: Every object 1 event
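As a rough illustration of the "one event per object" interpretation, the sketch below turns each bulletin into a media object plus exactly one event. The first identifier appears elsewhere in the slides; the second identifier, the `media:` and `event:` prefixes, the direction of `dive:depictedBy`, and the function names are assumptions made for the example.

```python
def bulletin_to_triples(bulletin_id, date):
    """Return the triples for one radio bulletin and its single event."""
    media = "media:" + bulletin_id   # hypothetical URI pattern
    event = "event:" + bulletin_id   # hypothetical URI pattern
    return [
        (media, "rdf:type", "dive:MediaObject"),
        (event, "rdf:type", "sem:Event"),
        # Assumed direction: the event is depicted by the media object.
        (event, "dive:depictedBy", media),
        (event, "sem:hasTimestamp", date),
    ]


def convert(bulletins):
    """Convert (identifier, date) pairs into one flat list of triples."""
    triples = []
    for bulletin_id, date in bulletins:
        triples.extend(bulletin_to_triples(bulletin_id, date))
    return triples


# The second identifier is made up for the example.
BULLETINS = [
    ("ANP:1950-08-11:50", "1950-08-11"),
    ("ANP:1950-08-11:51", "1950-08-11"),
]
TRIPLES = convert(BULLETINS)
```

Because the event reuses the bulletin identifier, the number of events always equals the number of bulletins.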

Establish explorable links through shared vocabularies

Node types: dive:MediaObject, sem:Event, sem:Place, sem:Time, sem:Actor, skos:Concept, oa:Annotation

Place and Actor nodes are aligned via skos:exactMatch
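One way to picture how shared vocabularies create explorable links: entities from different collections that are aligned to the same concept become reachable from each other. The sketch below groups entities by their aligned concept. The identifiers are taken from the alignment example in these slides, but the specific alignment pairs and the function names are assumptions for illustration.

```python
from collections import defaultdict

# (entity, concept) pairs standing for skos:exactMatch statements.
# Which entity matches which concept is assumed for the example.
ALIGNMENTS = [
    ("oi:Ingenieur_Lely", "gtaa:lely"),
    ("KB:Lely", "gtaa:lely"),
    ("kb:DenHaag1", "gtaa:DenHaag"),
]


def group_by_concept(alignments):
    """Map each shared concept to the set of entities aligned to it."""
    groups = defaultdict(set)
    for entity, concept in alignments:
        groups[concept].add(entity)
    return groups


def linked_entities(entity, alignments):
    """Return entities that share at least one aligned concept with entity."""
    linked = set()
    for members in group_by_concept(alignments).values():
        if entity in members:
            linked |= members - {entity}
    return linked
```

For instance, `linked_entities("oi:Ingenieur_Lely", ALIGNMENTS)` yields the entity from the other collection that was aligned to the same concept, which is the hop a user follows when browsing across collections.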


http://cultuurlink.beeldengeluid.nl/

Interactive vocabulary alignment

Result: Explorable Knowledge graph

Nodes:
- dive:MediaObject "Nieuws uit Indonesië: opheffing van het KNIL" (News from Indonesia: disbandment of the KNIL)
- dive:MediaObject "Mannen bij het huis van Paul Spies aan de Parapattan 42, Djakarta" (Men at the house of Paul Spies at Parapattan 42, Djakarta)
- dive:MediaObject ANP:1950-08-11:50
- dive:MediaObject "Schaal" (dish or bowl)
- sem:Event ANP:1950-08-11:50
- sem:Event "ontbindingsceremonie" (dissolution ceremony)
- sem:Time 25 Juli 1950 (25 July 1950)
- sem:Time 11 Augustus 1950 (11 August 1950)
- sem:Place Djakarta
- sem:Place Indonesië (Indonesia)
- sem:Actor “Mohammed Hatta”

Edge labels:
- dive:depictedBy
- sem:hasTimestamp
- dive:isRelatedTo / dive:relatedPlace / sem:hasPlace
- dive:isRelatedTo / dive:relatedActor / sem:hasActor

DIVE+ Enrichments

Collection  Enrichment method              Media Objects  Actors   Places  Events   Other    Alignments
OI          Crowd + NER                    3,204          1,249    1,412   1,916    185,846  623
NB          Interpreted + NER              197,200        194,890  54,571  197,200  6,736    6,353
AM          original thesaurus             73,447         66,966   5,973   148      28,047   6,865
TM          original thesaurus + FROG NER  78,226         27,829   3,896   23*      13,269   -
Total                                      352,077        290,934  65,852  199,264  233,898  -

*) more to come

DIVE+ path fragments

Subject-Object      Property supertype                   Count
Media Object-Event  dive:depictedBy or dive:isRelatedTo  199,233
Event-Actor         sem:hasActor                         265,677
Event-Place         sem:hasPlace                         220,726
Event-Concept       dive:isRelatedTo                     230
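Counts like those in the table can be obtained by grouping triples by the types of their subject and object and by the supertype of the connecting property. The sketch below shows the idea on toy data. All `ex:` identifiers, the `ex:mainActor` property, and its mapping to `sem:hasActor` are hypothetical; only the `sem:` property names come from the slides.

```python
from collections import Counter

# Hypothetical nodes with their (simplified) types.
NODE_TYPE = {
    "ex:event1": "Event",
    "ex:actor1": "Actor",
    "ex:actor2": "Actor",
    "ex:place1": "Place",
}

# Hypothetical specific property mapped to its supertype.
SUPERTYPE = {"ex:mainActor": "sem:hasActor"}

TRIPLES = [
    ("ex:event1", "ex:mainActor", "ex:actor1"),
    ("ex:event1", "sem:hasActor", "ex:actor2"),
    ("ex:event1", "sem:hasPlace", "ex:place1"),
]


def count_fragments(triples, node_type, supertype):
    """Count (subject type, object type, property supertype) combinations."""
    counts = Counter()
    for s, p, o in triples:
        if s in node_type and o in node_type:
            key = (node_type[s], node_type[o], supertype.get(p, p))
            counts[key] += 1
    return counts


FRAGMENTS = count_fragments(TRIPLES, NODE_TYPE, SUPERTYPE)
```

Counting on the supertype means that collection-specific properties contribute to the same row as the generic property they specialise.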

ClioPatria triple store - 15M triples (for now) - SPARQL endpoint

Provenance management at Named Graph level

http://data.dive.beeldengeluid.nl
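Since provenance is kept per named graph, a natural first query against such an endpoint lists the named graphs with their triple counts. The sketch below only builds the request URL, following the SPARQL 1.1 Protocol convention of passing the query in a `query` parameter of a GET request; it does not contact the server. The `/sparql/` path is an assumption, as the slides give only the host name.

```python
from urllib.parse import urlencode

# Host from the slide; the "/sparql/" path is an assumption.
ENDPOINT = "http://data.dive.beeldengeluid.nl/sparql/"

# List named graphs with the number of triples in each.
QUERY = """
SELECT ?g (COUNT(*) AS ?triples)
WHERE { GRAPH ?g { ?s ?p ?o } }
GROUP BY ?g
ORDER BY DESC(?triples)
"""


def build_request_url(endpoint, query):
    """Encode a SPARQL query as a GET URL with a 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


URL = build_request_url(ENDPOINT, QUERY)
```

The resulting URL can be fetched with any HTTP client, typically with an `Accept` header requesting a SPARQL results format such as JSON.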


DIVE+ UI

https://github.com/CLARIAH/grlc

API Layer

DIVE+ UI: INFINITY OF EXPLORATION

/ Support exploration and serendipity
/ Visual inspection of media objects and entities
/ Lets users build, save and share proto-narratives


https://youtu.be/FI3MPiU9rjo?t=138

http://diveplus.beeldengeluid.nl

DIVE+ UI screenshots (callout labels):
- filters; results ordering
- filter on media objects; order media objects by date
- filter on events
- explore event-related entities
- explore event; event-related entities
- place entity exploration
- narrative
- bookmarking

Take home

/ Generic data model for connecting heterogeneous media collections
/ Various data enrichment strategies to construct explorable event-centric knowledge graphs
/ DIVE+ Case Study

/ http://diveproject.beeldengeluid.nl
/ http://diveplus.beeldengeluid.nl

/ [email protected]

DIVE+


DIVE+ team

Current work: (Common) Event thesaurus?

- Februaristaking (February strike)
- WOII (WWII)
- Februaristaking
- “De oproep 'Staakt!' voor deelname aan de februaristaking te Amsterdam op 25 en 26 februari 1941.” (The call "Strike!" to take part in the February strike in Amsterdam on 25 and 26 February 1941.)
- stakingen (strikes)
- Eduard Hellendoorn
- "Joseph Eijl Eduard Hellendoorn Hermanus Coenradi 13 maart 1941 gefusilleerd Waalsdorpervlakte" (Joseph Eijl, Eduard Hellendoorn, Hermanus Coenradi, executed by firing squad on 13 March 1941, Waalsdorpervlakte)
- Waalsdorpervlakte

Jessie Both & Didi de hooge

3. Alignments to vocabularies

OI data:
- sem:Event oi:Opening_afsluitdijk
- sem:Actor, dive:Person oi:Ingenieur_Lely
- sem:Place oi:Afsluitdijk
- dive:MediaObject, dive:Video oi:9999 (http://iopenimages.nl/video1.mpg)
- oa:Annotation dive:9999ann

KB data:
- dive:MediaObject, dive:Image kb:image2
- sem:Actor, dive:Person KB:Lely
- sem:Place, dive:Place kb:DenHaag1

GTAA:
- skos:Concept gtaa:lely
- skos:Concept gtaa:DenHaag
- skos:Concept gtaa:Zuid-Holland

Edge labels:
- dive:depictedBy
- dive:isRelatedTo / dive:relatedActor / sem:hasActor
- dive:isRelatedTo / dive:relatedPlace / sem:hasPlace
- oa:hasBody, oa:hasTarget
- skos:broader