This presentation was given on 20/12/2013 on the occasion of the 30th anniversary of IBW. Presentation by Roxanne Wyns, Business Consultant at LIBIS.
The Europeana Data Model
A semantic layer on top of Cultural Heritage Objects
20-12-2013 Roxanne Wyns - IBW
Roxanne Wyns – LIBIS, KU Leuven
Content
Overview
• Background
• Europeana – From Portal to Platform
• The Europeana Data Model & The Semantic Web
• EDM basic pattern
• Some applications…
• Semantic enrichment @ Europeana
• Some inspiring examples…
Background
LIBIS
• Library automation service of the KU Leuven
• Support scientific and public organisations in managing their library,
archival and museum collections
• Center of knowledge & expertise
• Expertise in interoperability standards, semantic enrichment and
multilingualism
• European Best Practice research projects (ICT-PSP)
• Member of several standards and enrichment working groups
Before we move to EDM…
Europeana
What is Europeana?
• An internet portal that acts as an interface to millions of books,
paintings, films, museum objects and archival records that have been
digitised throughout Europe
• A platform for knowledge exchange that promotes collaboration
between librarians, curators, archivists and the creative industries
• A platform for access and reuse of cultural content by creative
industries, research and education (Europeana API)
Europeana as a portal
www.europeana.eu
From portal… [1]
From portal to platform…
Aggregation by Europeana
• Gathering digital content from cultural organisations
• Cross-collection and cross-sectoral (archives, libraries, museums)
• Bring this together on the web by using standardised file and metadata
formats (ESE)
• To facilitate resource discovery!
• Quantity vs. Quality
From portal… [2]
A variety of aggregation models
• Project aggregation (ICT-PSP)
– Dark aggregators
– Aggregators + portal
– Domain, thematic, cross-domain
• National or regional aggregation
– Usually with a portal
– Europeana not the only source
– Domain, thematic, cross-domain
• Institutions
From portal… [3]
The European Library
From portal… [4]
Delivering content
• Source > [Intermediate] > Target
• Source = in-house or standard
• Intermediate = standard format or adaptation of standard
• Target = ESE
• Protocols, tools and formats
– XML or CSV
– HTTP, FTP or OAI-PMH upload (provided by aggregator)
– Ingestion and mapping tools (provided by aggregator)
– OAI-PMH (Europeana)
– …
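The OAI-PMH step in the list above can be sketched as a harvesting request. This is a minimal illustration only: the endpoint URL, the metadata prefix "ese" and the set name are assumptions, not values taken from the presentation; only the `verb`, `metadataPrefix` and `set` request parameters come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (harvesting side)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec  # optional selective harvesting
    return base_url + "?" + urlencode(params)


# Hypothetical aggregator endpoint, metadata prefix and set name.
url = build_list_records_url("https://aggregator.example.org/oai", "ese", "museum_a")
print(url)
```

A harvester would fetch this URL and follow resumption tokens until the record list is exhausted; that part is left out here.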
From portal… [5]
ESE - Europeana Semantic Elements
– Represents lowest common denominator for object metadata
– Basically Dublin Core with some extra Europeana elements
– Forces interoperability
– Converts datasets to a “flat” data representation
– Loss of richness of the original data
– Not adequate to accommodate domain specific requirements
– One digital representation per object record
http://pro.europeana.eu/ese-documentation
Moving towards a platform
ESE XML → EDM RDF
A change of strategy?!
• Europeana Strategic Plan 2011 – 2015
– Aggregate: Europeana as a trusted source for European Cultural Content
– Facilitate: Support the Cultural Heritage sector through knowledge
transfer
– Distribute: Make Heritage available to everyone, everywhere, at every
moment
– Engage: Find new ways for people to participate in culture
• New Renaissance Report
• Focus on open data and reuse of content
– Linked Open Data, using Semantic Web Technologies
– Data Exchange Agreement (DEA)
… to platform [1]
Transition into 2014 – Why Europeana and not Google?!
• Shift from portal to platform
• Data quality emphasis
• More data providers
• Value creation for contributing partners (aggregators & content
partners)
• Copyright improvement
• Multilingual
• Thematic focus (e.g. Europeana 1914 – 1918)
Europeana Business Plan 2014 (http://pro.europeana.eu/)
… to platform [2]
… to platform [3]
→ Positioning Europeana in research, education & creative industries (tourism, social media, UGC…)
→ CEF funding
www.europeana1914-1918.eu/
Focusing on data quality
• Introduce star rating system
• Rights statement → increase reuse potential
• Decent previews
• Persistent links
• Metadata records in EDM! → data interoperability, semantic enrichment, multilingualism…
… to platform [4]
Delivering content
• National aggregators!
• Source > [Intermediate] > Target
• Source = [in-house] or standard
• Intermediate = EDM extension or application
• Target = EDM RDF
• Protocols, tools and formats
– XML, CSV, API?
– HTTP, FTP or OAI-PMH upload (provided by aggregator)
– Ingestion and mapping tools (provided by aggregator, part of CMS)
– OAI-PMH, SWORD? (Europeana)
– …
… to platform [5]
Europeana Inside aggregation
… to platform [6]
http://www.europeana-inside.eu/
… to platform [7]
Europeana API http://www.europeana.eu/portal/api/console.html
… to platform [8]
Europeana LOD pilot
http://europeana.ontotext.com/
The Europeana Data Model (EDM)
& The Semantic Web
EDM & The Semantic Web [1]
Moving towards the Europeana Data Model
• Based on best practices from the different GLAM domains
• Align the data model to the specific community concerns
• Providing different levels of granularity
• Enable the re-use of existing standards
• Providing the possibility to build domain or sector specific application
profiles on EDM
EDM & The Semantic Web [2]
EDM requirements
• Richer metadata - finer granularity
• Distinguish “provided objects” (painting, book, movie, etc.) from their
digital representations
• Distinguish object from its metadata record
• Allow multiple records for the same object, containing potentially
contradictory statements about it
• Support for objects that are composed of other objects
• Support for contextual resources, including concepts from controlled
vocabularies
Introduction to the Europeana Data Model (EDM) (http://pro.europeana.eu/)
EDM & The Semantic Web [3]
A semantic layer on top of Cultural Heritage Objects
• Provides more context to the metadata
• Allows the representation of specific relationships
– Similarities between objects
– Relationships
– Representations
– Derivations
Goal: to make data available as Linked Open Data for re-use by external
sources
EDM & The Semantic Web [4]
The Semantic Web
Linking Open Data cloud diagram: From Wikimedia Commons ; http://lod-cloud.net/
EDM & The Semantic Web [5]
The Semantic web or Web 3.0
• Concept introduced by Sir Tim Berners-Lee (W3C)
• Proposed as the solution for the current problems in sharing and retrieving relevant data on the current Web where:
– Content is not well structured, has inexplicit semantics, is not interoperable (HTML, URLs to link)
– Expressive questions cannot be asked by the user
– Multiple data queries, human interpretation and knowledge are needed to retrieve relevant and “complete” results
→ Moving from documents to data
The Semantic Web is an extension of the current Web
• It includes semantic information (context and meaning!) in web pages
• This meaning allows both people and machines to better interpret the
data
• It creates links so that a person or machine can explore the web of
“related” data via these links
• These links are at the heart of the Semantic Web and are needed for
integrating and reasoning over data on the Web = Linked Data
EDM & The Semantic Web [6]
Linked Data principles
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful RDF information
4. Include RDF statements that link to other URIs so that they can
discover related things
Tim Berners-Lee 2007 – http://www.w3.org/DesignIssues/LinkedData.html
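A small sketch of principles 3 and 4: looking up a URI returns statements, and statements that point to other URIs let a client discover related things. To stay self-contained it uses an in-memory dictionary as a stand-in for the Web; all example.org URIs and the data behind them are hypothetical.

```python
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical in-memory stand-in for the Web: looking up a URI returns
# the (subject, predicate, object) statements published at that URI.
WEB = {
    "http://example.org/object/mona-lisa": [
        ("http://example.org/object/mona-lisa", DC + "title", "Mona Lisa"),
        ("http://example.org/object/mona-lisa", DC + "creator",
         "http://example.org/agent/da-vinci"),
    ],
    "http://example.org/agent/da-vinci": [
        ("http://example.org/agent/da-vinci", DC + "title", "Leonardo da Vinci"),
    ],
}


def is_http_uri(value):
    """Principle 2: names that can be looked up are HTTP(S) URIs."""
    return value.startswith(("http://", "https://"))


def discover(start_uri, lookup=WEB.get):
    """Follow object links breadth-first and return the URIs visited, in order."""
    visited = []
    queue = [start_uri]
    while queue:
        uri = queue.pop(0)
        if uri in visited:
            continue
        visited.append(uri)
        for _subject, _predicate, obj in lookup(uri) or []:
            if is_http_uri(obj):  # literals such as titles are not followed
                queue.append(obj)
    return visited


print(discover("http://example.org/object/mona-lisa"))
```

In a real setting `lookup` would dereference the URI over HTTP and parse the returned RDF; that step is replaced by the dictionary here.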
EDM & The Semantic Web [7]
EDM & The Semantic Web [8]
Focus shift
• In the web of documents we have HTTP URIs identifying resources
and links between them, but without context:
– What kinds of resources are 'Louvre.html' and 'LaJoconde.jpg'?
– A machine cannot tell, humans can
Towards a Semantic Research Library. Prof. Dr. Stefan Gradmann (KU Leuven)
Towards a Semantic Research Library. Prof. Dr. Stefan Gradmann (KU Leuven)
EDM & The Semantic Web [9]
• So we add syntax for making statements on the resource using RDF
triples and a schema language (RDFS)
• Extending the Web into the ‘Web of Things’
EDM & The Semantic Web [10]
EDM basis
• OAI ORE for organizing an object’s metadata and digital
representation(s)
• Dublin Core for descriptive metadata
• SKOS for conceptual vocabulary representation
• CIDOC-CRM for events and relationships between objects
• RDF for Semantic Web representation
EDM & The Semantic Web [11]
OAI ORE
• Open Archives Initiative Object Reuse & Exchange
• Defines standards for the description and exchange of aggregations of
Web resources
• For combining distributed resources with multiple media types (text,
images, video, data…) → A “bundle” of an object and its digital
representation(s)
EDM & The Semantic Web [12]
Dublin Core + extra EDM elements
• For the descriptive metadata of a cultural heritage object
• edm:ProvidedCHO is the cultural heritage object which is the subject
of the package of data delivered to Europeana
Properties:
dc:contributor, dc:creator, dc:date, dc:format, dc:identifier, dc:language, dc:publisher, dc:relation,
dc:source, dcterms:alternative, dcterms:extent, dcterms:temporal, dcterms:medium, dcterms:created,
dcterms:provenance, dcterms:issued, dcterms:conformsTo, dcterms:hasFormat, dcterms:isFormatOf,
dcterms:hasVersion, dcterms:isVersionOf, dcterms:hasPart, dcterms:isPartOf, dcterms:isReferencedBy,
dcterms:references, dcterms:isReplacedBy, dcterms:replaces, dcterms:isRequiredBy, dcterms:requires,
dcterms:tableOfContents, edm:isNextInSequence, edm:isDerivativeOf, edm:currentLocation…
Simple Knowledge Organisation System (SKOS)
• Solution for converting a “classic” managed thesaurus or vocabulary into
a semantically interoperable format
• Based on the RDF specification
• Ideal for creating multilingual networks of terminologies
• Structured according to the ISO 25964 norm
• Components
– Concepts
– URIs
– Labelled
– Documented
– Semantically related (BT, NT, RT)
– Concept schemes
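As a rough sketch of what converting a classic thesaurus entry involves, the snippet below turns one term with BT/NT/RT relations into SKOS-style triples. The SKOS property URIs are the standard ones; the thesaurus base URI, the term identifiers and the labels are hypothetical, and the tuple representation of triples is just a convenience for the example.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Classic thesaurus relations mapped to their SKOS counterparts.
RELATION_TO_SKOS = {
    "BT": SKOS + "broader",
    "NT": SKOS + "narrower",
    "RT": SKOS + "related",
}


def entry_to_triples(base_uri, term_id, labels, relations):
    """Turn one thesaurus entry into (subject, predicate, object) triples.

    labels: {language code: preferred label}
    relations: list of ("BT" | "NT" | "RT", target term id)
    """
    concept = base_uri + term_id
    triples = [(concept, RDF_TYPE, SKOS + "Concept")]
    for lang in sorted(labels):
        # Literal objects are represented as (text, language) pairs.
        triples.append((concept, SKOS + "prefLabel", (labels[lang], lang)))
    for relation, target_id in relations:
        triples.append((concept, RELATION_TO_SKOS[relation], base_uri + target_id))
    return triples


# Hypothetical thesaurus entry.
triples = entry_to_triples(
    "http://example.org/thesaurus/",
    "tile",
    {"en": "tile", "nl": "tegel"},
    [("BT", "building-material"), ("RT", "brick")],
)
for triple in triples:
    print(triple)
```

Because every concept gets a URI and one preferred label per language, vocabularies converted this way can be linked to each other across languages.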
EDM & The Semantic Web [13]
EDM & The Semantic Web [14]
Vocabularies play an important role in the Semantic Web and Linked Data world
• They are the basic building blocks for linking data
• They help with the interpretation and integration of data between
different datasets
• And so may lead to the discovery of new relationships between
information expressed in a different natural language
EDM & The Semantic Web [15]
EDM & The Semantic Web [16]
CIDOC – Conceptual Reference Model (ISO 21127)
• A formal domain ontology for cultural heritage information
• Describes the things that the cultural heritage sector deals with and how these things relate to each other
• Expressed as an “object-oriented” schema
• An object is described according to a series of events that took place in
its lifetime
– When
– Where
– Who
– What
CIDOC-CRM events
EDM & The Semantic Web [17]
Resource Description Framework (RDF)
• Forms the basis of Semantic web technologies
• Universal language to describe the characteristics of resources on the
web
• Using XML for syntax (RDF/XML is one of several serialisations) and URIs for naming
• Makes statements about resources in the form of subject-predicate-object triples
• RDF triples provide a labelled connection using URIs to make it
possible to link data with one another
• In this way a machine is able to find the semantic relations between
data
EDM & The Semantic Web [18]
• The different parts of a triple are
– Subject – the thing being described
– Predicate – a trait, aspect, or property of the thing, which expresses a
relationship between the subject and object
– Object – the thing that is the value of the predicate (trait, aspect or
property) of the subject
• So in the statement “Mona Lisa was created by Da Vinci”
– Subject – Mona Lisa (La Joconde)
– Predicate – Created by
– Object – Da Vinci
• In terms of representation:
– Subject – must be a URI
– Predicate – must be a URI
– Object – may be a URI or a constant value or “literal”
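The Mona Lisa statement can be written down as a triple and serialised, for instance in the N-Triples syntax. The sketch below follows the representation rules on the slide (subject and predicate are URIs, the object is a URI or a literal). The example.org URIs are hypothetical; dc:creator and dc:title are the Dublin Core properties.

```python
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
MONA_LISA = "http://example.org/object/mona-lisa"   # hypothetical URI
DA_VINCI = "http://example.org/agent/da-vinci"      # hypothetical URI


def uri(value):
    return ("uri", value)


def literal(text, lang=None):
    return ("literal", text, lang)


def format_term(term):
    """Render a URI as <...> and a literal as "..." with an optional @lang tag."""
    if term[0] == "uri":
        return "<" + term[1] + ">"
    _kind, text, lang = term
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"' + ("@" + lang if lang else "")


def to_ntriples(subject, predicate, obj):
    """Serialise one triple; subject and predicate must be URIs, as on the slide."""
    if subject[0] != "uri" or predicate[0] != "uri":
        raise ValueError("subject and predicate must be URIs")
    return " ".join(format_term(t) for t in (subject, predicate, obj)) + " ."


# "Mona Lisa was created by Da Vinci": the object is a URI.
print(to_ntriples(uri(MONA_LISA), uri(DC_CREATOR), uri(DA_VINCI)))
# The French title of the same object: the object is a literal.
print(to_ntriples(uri(MONA_LISA), uri(DC_TITLE), literal("La Joconde", "fr")))
```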
EDM & The Semantic Web [17]
Example triples (diagram):
KU Leuven – is a – University
KU Leuven – located in – Flanders
EDM & The Semantic Web [18]
EDM application
EDM basic pattern
EDM basic pattern [1]
• A data provider submits to Europeana a “bundle” of an object and its
digital representation(s)
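The “bundle” can be pictured as a handful of triples: an ore:Aggregation that points to the provided object and to each of its digital representations. The class and property names (edm:ProvidedCHO, edm:WebResource, ore:Aggregation, edm:aggregatedCHO, edm:hasView) come from EDM and OAI ORE; the URIs are hypothetical and a real record carries more properties (data provider, rights and so on) that are omitted in this sketch.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EDM = "http://www.europeana.eu/schemas/edm/"
ORE = "http://www.openarchives.org/ore/terms/"


def build_bundle(cho_uri, aggregation_uri, view_uris):
    """Return the triples tying an object to its digital representations."""
    triples = [
        (cho_uri, RDF_TYPE, EDM + "ProvidedCHO"),
        (aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
        (aggregation_uri, EDM + "aggregatedCHO", cho_uri),
    ]
    for view_uri in view_uris:
        triples.append((view_uri, RDF_TYPE, EDM + "WebResource"))
        triples.append((aggregation_uri, EDM + "hasView", view_uri))
    return triples


# Hypothetical identifiers for one object with two digital views.
CHO = "http://example.org/item/violin-42"
AGGREGATION = "http://example.org/aggregation/violin-42"
VIEWS = [
    "http://example.org/images/violin-42-front.jpg",
    "http://example.org/images/violin-42-back.jpg",
]

bundle = build_bundle(CHO, AGGREGATION, VIEWS)
for triple in bundle:
    print(triple)
```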
EDM basic pattern [2]
Musical Instruments Museums Online (http://www.mimo-db.eu/), Rodolphe Bailly
Using DC as the basis for ProvidedCHO
• Advantages
– Widespread
– Simple
– Stable
– Cross-domain
• Disadvantages
– Not rich enough
– Lack of structure
– No differentiation between the object itself and its digital surrogate (e.g.
creator, photographer → dc:creator)
– Loss of relationships between different classes of data and events (no
relation between who, where, what, when)
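The dc:creator problem from the list above can be made concrete with two toy records. In the flat record the painter and the photographer share one field; when the object and its digital surrogate are described separately, each creator stays attached to the right resource. The record layout and the names are illustrative assumptions, not the actual ESE or EDM serialisation.

```python
# Flat, ESE-like record: one dc:creator field for everything.
flat_record = {
    "dc:title": ["Mona Lisa"],
    "dc:creator": ["Leonardo da Vinci", "A. Photographer"],
}

# EDM-like record: the object and its digital surrogate are separate resources.
edm_record = {
    "edm:ProvidedCHO": {
        "dc:title": ["Mona Lisa"],
        "dc:creator": ["Leonardo da Vinci"],
    },
    "edm:WebResource": {
        "dc:creator": ["A. Photographer"],
    },
}


def creators(record, resource_class=None):
    """Return the creators of a flat record, or of one resource in a structured record."""
    if resource_class is None:
        return record.get("dc:creator", [])
    return record.get(resource_class, {}).get("dc:creator", [])


# Flat: no way to tell who made the painting and who made the photo.
print(creators(flat_record))
# Structured: the question can be asked per resource.
print(creators(edm_record, "edm:ProvidedCHO"))
print(creators(edm_record, "edm:WebResource"))
```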
EDM basic pattern [3]
“Proxies”
• Describing the provided object as seen from the perspective of a
specific provider
• Used for
– Connecting duplicates of cultural heritage object descriptions coming from
different providers, each with its own metadata
– Adding Europeana enrichments about a resource
– Keeping each provider’s metadata distinct
– Keeping Europeana metadata distinct from the providers’ metadata
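A sketch of how proxies keep each provider's statements apart: every provider gets its own ore:Proxy for the same object, and descriptive statements hang off the proxy rather than the object itself. ore:Proxy, ore:proxyFor and ore:proxyIn are the OAI ORE terms that EDM uses; the URIs and the two titles are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ORE = "http://www.openarchives.org/ore/terms/"
DC = "http://purl.org/dc/elements/1.1/"


def build_proxy(proxy_uri, cho_uri, aggregation_uri, statements):
    """Describe cho_uri from one provider's point of view."""
    triples = [
        (proxy_uri, RDF_TYPE, ORE + "Proxy"),
        (proxy_uri, ORE + "proxyFor", cho_uri),
        (proxy_uri, ORE + "proxyIn", aggregation_uri),
    ]
    triples.extend((proxy_uri, predicate, value) for predicate, value in statements)
    return triples


def values_per_proxy(triples, cho_uri, predicate):
    """Collect the values of one property, grouped by the proxy that states them."""
    proxies = [s for s, p, o in triples if p == ORE + "proxyFor" and o == cho_uri]
    return {
        proxy: [o for s, p, o in triples if s == proxy and p == predicate]
        for proxy in proxies
    }


# Hypothetical URIs: two providers describe the same painting differently.
CHO = "http://example.org/item/mona-lisa"
LOUVRE_PROXY = "http://example.org/proxy/louvre/mona-lisa"
DMF_PROXY = "http://example.org/proxy/dmf/mona-lisa"

graph = build_proxy(
    LOUVRE_PROXY, CHO, "http://example.org/aggregation/louvre/mona-lisa",
    [(DC + "title", "La Joconde")],
) + build_proxy(
    DMF_PROXY, CHO, "http://example.org/aggregation/dmf/mona-lisa",
    [(DC + "title", "Mona Lisa")],
)

titles = values_per_proxy(graph, CHO, DC + "title")
print(titles)
```

A Europeana enrichment would be added the same way, as one more proxy with its own statements, so that it never overwrites what a provider delivered.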
EDM basic pattern [4]
EDM basic pattern [5]
aggregation of DMF
aggregation of Louvre
Introduction to the Europeana Data Model (EDM) (http://pro.europeana.eu/)
EDM basic pattern [6]
Introduction to the Europeana Data Model (EDM) (http://pro.europeana.eu/)
Hierarchical objects
Let’s have a look at some applications…
PartagePlus record provided as EDM
EDM records [1]
Mapping LIDO2EDM
• Respect for the actual specifications of both models in order to ensure
semantic validity of the resulting EDM
• Only a subset of the (core) LIDO elements is mapped
• When a value starts with 'http://' or 'https://' it becomes an 'rdf:resource'
in the EDM record, otherwise it is included as a literal
• In addition, an EDM property is created that holds, as a literal, the preferred
label of the concept or agent in the language of the metadata record
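The URI-versus-literal rule above can be sketched with the standard library XML tools. The function name, its parameters and the sample values are hypothetical; this illustrates the rule as stated on the slide and is not the actual LIDO2EDM mapping code.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
EDM_NS = "http://www.europeana.eu/schemas/edm/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def add_property(parent, tag, value, pref_label=None, lang=None):
    """Add one property: rdf:resource for http(s) values, a literal otherwise.

    For a URI value with a known preferred label, a second element with the
    label as a literal is added as well.
    """
    element = ET.SubElement(parent, tag)
    if value.startswith(("http://", "https://")):
        element.set("{%s}resource" % RDF_NS, value)
        if pref_label is not None:
            label = ET.SubElement(parent, tag)
            label.text = pref_label
            if lang is not None:
                label.set(XML_LANG, lang)
    else:
        element.text = value
    return element


# Hypothetical values for one object.
cho = ET.Element("{%s}ProvidedCHO" % EDM_NS)
add_property(cho, "{%s}subject" % DC_NS, "http://example.org/concept/violin",
             pref_label="violin", lang="en")
add_property(cho, "{%s}format" % DC_NS, "wood")
print(ET.tostring(cho, encoding="unicode"))
```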
EDM records [2]
Mapping LIDO2EDM
• Qualifying information for agents (dc:creator, dc:contributor), dates
(dc:date), places (dcterms:spatial) is lost
• LIDO-based ingestions would benefit from a full implementation of the
EDM model
EDM records [3]
EDM records [4]
Semantic enrichment @ Europeana
Opportunities and pitfalls
Semantic tagging
• Using the AnnoCultor tool (http://semium.org)
– Interprets values
– Searches for corresponding terms in specialised vocabularies
– Adds links to matching terms (dcterms:spatial = Venise → link to place:
http://sws.geonames.org/3164603/)
– Pulls in additional information about this record (βενετία, velence, венеция, venice, etc.)
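The tagging steps can be sketched as a label lookup: index every known label of a place, match the metadata value against that index, and return the place URI together with its labels in other languages. The GeoNames URI and the labels are the ones shown on the slide; the vocabulary structure and the case-insensitive comparison are assumptions, and this is not AnnoCultor's actual code.

```python
# Labels taken from the slide; the vocabulary structure is an assumption.
PLACE_LABELS = {
    "http://sws.geonames.org/3164603/": [
        "Venise", "Venice", "Velence", "Βενετία", "Венеция",
    ],
}


def build_index(vocabulary):
    """Map every case-folded label to the URI of its place."""
    index = {}
    for uri, labels in vocabulary.items():
        for label in labels:
            index[label.casefold()] = uri
    return index


def enrich_place(value, index, vocabulary):
    """Return the matching place URI and all its labels, or None if nothing matches."""
    uri = index.get(value.strip().casefold())
    if uri is None:
        return None
    return {"uri": uri, "labels": vocabulary[uri]}


INDEX = build_index(PLACE_LABELS)
print(enrich_place("Venise", INDEX, PLACE_LABELS))
```

Once the link is in place, a search for any of the labels can retrieve the record, which is what makes the multilingual place search possible.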
Semantic enrichment @ Europeana [1]
Semantic enrichment @ Europeana [2]
Enriched elements
• Place enrichment (edm_place:*)
– Subset of GeoNames (www.geonames.org)
– Limited to European geographic locations
– Limiting on prefixes "A", "P.PPL", "S.CSTL", "S.ANS", "S.MNMT", "S.LIBR", "S.HSTS", "S.OPRA", "S.AMTH", "S.TMPL", "T.ISL" (http://www.geonames.org/statistics/total.html)
– Enrichment limited to EDM fields “dcterms:spatial” and “dc:coverage”
– Enrichment rules: exact matching?
– Result: 5.8M objects enriched, provides multilingual search on places http://europeana.eu/portal/search.html?query=edm_place%3A*
Issues?
– Appear to be limited
– But only places in Europe are enriched
– And only for the geographical coverage EDM elements
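The prefix restriction on GeoNames feature codes can be expressed as a simple filter. The prefixes are the ones listed on the slide; reading them as string prefixes of "class.code" (so that "P.PPL" also covers codes such as "P.PPLA") is an assumption about how the limitation is applied.

```python
# Prefixes as listed on the slide.
ALLOWED_PREFIXES = (
    "A", "P.PPL", "S.CSTL", "S.ANS", "S.MNMT", "S.LIBR",
    "S.HSTS", "S.OPRA", "S.AMTH", "S.TMPL", "T.ISL",
)


def is_enrichable(feature_class, feature_code):
    """True if the GeoNames feature falls under one of the allowed prefixes."""
    full_code = feature_class + "." + feature_code
    return full_code.startswith(ALLOWED_PREFIXES)


print(is_enrichable("P", "PPLA"))  # populated place, covered by the "P.PPL" prefix
print(is_enrichable("H", "LK"))    # lake, not in the list
```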
Semantic enrichment @ Europeana [3]
• Concept (topic) enrichment
– Using GEMET thesaurus (http://www.eionet.europa.eu/gemet/)
– 12 concepts removed to avoid linking with homonyms (e.g. Druck)
– Some WWI battles and the two categories “World War I” and “art” are taken from DBpedia
– Enrichment limited to EDM fields “dc:subject” and “dc:type”
– Enrichment rules: exact matching?
– Result: 9M objects enriched, http://www.europeana.eu/portal/search.html?query=skos_concept%3A*
Issues?
– Exact matching not limited to the language of the record (Dutch “Tegel” mapped to the Swedish “Tegel”, meaning brick)
– No suitable multilingual concept thesauri for the cultural domain → drawing
– Noise because of metadata quality (dc:type “photo”, “book”, “video”,…)
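The “Tegel” problem can be reproduced with a toy vocabulary: exact matching across all languages returns both the Swedish and the Dutch concept, while matching restricted to the language of the record returns only the intended one. The two concepts and their URIs are hypothetical stand-ins, not actual GEMET entries.

```python
# Hypothetical concepts; labels are keyed by language code.
CONCEPTS = [
    {"uri": "http://example.org/concept/brick", "labels": {"sv": "tegel", "en": "brick"}},
    {"uri": "http://example.org/concept/tile", "labels": {"nl": "tegel", "en": "tile"}},
]


def match_any_language(value, concepts):
    """Exact match against labels in every language (the behaviour criticised above)."""
    needle = value.strip().casefold()
    return [
        c["uri"] for c in concepts
        if any(label.casefold() == needle for label in c["labels"].values())
    ]


def match_in_language(value, lang, concepts):
    """Exact match against labels in the language of the record only."""
    needle = value.strip().casefold()
    return [
        c["uri"] for c in concepts
        if c["labels"].get(lang, "").casefold() == needle
    ]


# A Dutch record with dc:subject "Tegel".
print(match_any_language("Tegel", CONCEPTS))       # ambiguous: two candidates
print(match_in_language("Tegel", "nl", CONCEPTS))  # only the Dutch sense
```

Restricting by language only works when the record declares its language reliably, which ties this back to the metadata quality issue.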
Semantic enrichment @ Europeana [4]
Semantic enrichment @ Europeana [5]
Semantic enrichment @ Europeana [6]
• Agent (person) enrichment
– Small set of artists (painters) from DBpedia
– Enrichment limited to EDM fields “dc:creator” and “dc:contributor”
– Enrichment rules: exact matching?
– Result: 136K objects enriched http://www.europeana.eu/portal/search.html?query=edm_agent%3A*
Issues?
– Quality or structure of provided metadata
Semantic enrichment @ Europeana [7]
• Time period enrichment
– Using Semium time periods vocabulary (http://semium.org/time/)
– Partly automatically generated (3rd quarter of 15th century) / manually generated (Roman empire)
– Enrichment limited to EDM fields: dc:date, dc:coverage, dc:temporal, edm:year
– Enrichment rules: exact matching?
– Result: 13.3M objects enriched
http://www.europeana.eu/portal/search.html?query=edm_timespan%3A*
Issues?
– Some words (qualifiers to dates, e.g. “made”, “printed”…) have been removed from fields prior to enrichment, but this is only done for English records
– So again a problem of quality or structure of the provided metadata
– Huge issues with BC dates, but also date ranges (e.g. “1701/1800" is mapped to "1701" only)
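The date-range issue suggests parsing both ends of a value such as "1701/1800" instead of keeping only the first year. A minimal sketch, assuming ranges are written as two years separated by a slash; it deliberately does not handle BC dates or qualifier words, which the list above names as separate problems.

```python
import re

# One year, optionally followed by "/" and a second year.
YEAR_RANGE = re.compile(r"\s*(\d{1,4})\s*(?:/\s*(\d{1,4})\s*)?")


def parse_years(value):
    """Return (start, end) for '1701' or '1701/1800', or None if unparseable."""
    match = YEAR_RANGE.fullmatch(value)
    if match is None:
        return None
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start
    if end < start:
        return None  # reversed ranges are treated as invalid
    return (start, end)


print(parse_years("1701/1800"))
print(parse_years("1701"))
print(parse_years("made 1701"))
```

With both ends available, a record dated "1701/1800" can be linked to the whole 18th century rather than to the single year 1701.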
Semantic enrichment @ Europeana [8]
Pitfalls
• General problems
– Not enough suitable multilingual sources for the DCH domain
– Automatic enrichment vs. manual enrichment
– Quality of the metadata
• Possible solutions
– Indexing vs. display elements
– Full implementation of EDM
– Further extension of EDM
– Gather basic vocabularies and existing multilingual terminologies
– Provide a platform for contributing to translations and mapping vocabularies
– Collect lists for certain metadata fields with limited amount of values, such as format, language, country, date-time ranges,…
– Create awareness on Europeana enrichment!
Semantic enrichment @ Europeana [9]
Opportunities
• Multilingual access to over 28 million records
• More enriched elements
• Freely available for re-use (DEA)
• Closer to original metadata thanks to EDM
• Data can be contextualized, semantically linked to other data
• Allows for richer semantic query expansion & cross-collection
browsing
Some inspiring examples…
Examples [1]
www.thepund.it
Examples [2]
Examples [3]
www.researchspace.org
Examples [4]
Questions?
Thank you!
Roxanne Wyns – LIBIS, KU Leuven
www.libis.be
Sources
• Europeana portal: http://www.europeana.eu/portal/ ; http://www.europeana.eu/portal/api
• Europeana Professional: http://pro.europeana.eu/
– Introduction to the Europeana Data Model (EDM)
– Europeana Data Model (EDM) documentation
– Europeana Business Plan 2014
• SPARQL end-point of data.europeana.eu: http://europeana.ontotext.com/
• Towards a Semantic Research Library: Digital Humanities Research, Europeana and the Linked Data Paradigm, Prof. Dr. Stefan Gradmann (KU Leuven)
• Europeana 1914-1918: http://www.europeana1914-1918.eu/
• DM2E: http://dm2e.eu/
• Pundit: www.thepund.it
• The ResearchSpace: www.researchspace.org
• Poisonous India or the Importance of a Semantic and Multilingual Enrichment Strategy, Marlies Olensky, Juliane Stiller, and Evelyn Dröge
• Semantic enrichment at Europeana – memo, November 4, 2013, Antoine Isaac
• Mapping survey LIDO 2013-10-30, Regine Stein