Europeana Newspapers
Munich Workshop
WP5 Metadata – Structural Metadata
Munich, 26th June 2013
Günter Mühlberger, Innsbruck University
WP5 leader
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp
Problem statement
• Europeana Newspapers
  • 15 libraries from several European countries
  • 10 million newspaper pages for refinement (OCR, OLR)
  • Pages need to be delivered to Europeana
• Approach
  • Currently no standard format available
  • Unify the delivery format
  • Create a METS/ALTO profile
  • Create tools to ease the creation of ENMAP objects
ENMAP
• Implementation
  • More than 3 million pages already processed
  • Workflow is fully scalable; up to 100,000 pages can be processed per day (OCR and ENMAP creation)
• Public release
  • ENMAP (Europeana Newspaper METS ALTO Profile) available to the public
  • Planned for October 2013
  • Accompanying information
  • Examples
  • Feedback is highly welcome
  • Final release is planned for 2014
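The figures on this slide and the problem statement allow a rough estimate of processing time. The sketch below is only back-of-the-envelope arithmetic: it assumes the stated peak of 100,000 pages per day is sustained every day, and it treats "more than 3 million" as exactly 3 million, so the results are optimistic lower bounds. The function name `days_needed` is illustrative and not part of any project tool.

```python
# Figures taken from the slides; everything else is an assumption.
TOTAL_PAGES = 10_000_000     # pages planned for refinement
PROCESSED_PAGES = 3_000_000  # "more than 3 million" already processed
PAGES_PER_DAY = 100_000      # stated maximum daily throughput


def days_needed(pages: int, pages_per_day: int) -> int:
    """Whole days needed to process `pages` at a constant daily rate."""
    if pages_per_day <= 0:
        raise ValueError("pages_per_day must be positive")
    # Integer ceiling division: a partially used day still counts as a day.
    return -(-pages // pages_per_day)


remaining_pages = TOTAL_PAGES - PROCESSED_PAGES

print("Full corpus:", days_needed(TOTAL_PAGES, PAGES_PER_DAY), "days")
print("Remaining:  ", days_needed(remaining_pages, PAGES_PER_DAY), "days")
```

At peak throughput the whole corpus corresponds to about 100 processing days, and the remaining pages to at most about 70.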
Structural Metadata
• Structural elements
  • Title section, headline, advertisement, illustration, caption, running title (column title), page number, continuation note, imprint, etc.
• Text types (genres)
  • Breaking news, short news, book review, theatre review, obituary, family notice, job announcement, weather forecast, novel, poem, ...
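The two lists above can be read as two controlled vocabularies: one for the layout role of a page region and one for the genre of a text. The following is a minimal sketch of that idea, not the ENMAP schema itself. The class names, the `Zone` container and the `facet_counts` helper are hypothetical, and only some of the listed values are included.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class StructuralElement(Enum):
    """Layout role of a region on a newspaper page (subset of the slide's list)."""
    TITLE_SECTION = "title section"
    HEADLINE = "headline"
    ADVERTISEMENT = "advertisement"
    ILLUSTRATION = "illustration"
    CAPTION = "caption"
    RUNNING_TITLE = "running title"
    PAGE_NUMBER = "page number"
    CONTINUATION_NOTE = "continuation note"
    IMPRINT = "imprint"


class TextType(Enum):
    """Genre of a text (subset of the slide's list)."""
    BREAKING_NEWS = "breaking news"
    SHORT_NEWS = "short news"
    BOOK_REVIEW = "book review"
    OBITUARY = "obituary"
    WEATHER_FORECAST = "weather forecast"
    POEM = "poem"


@dataclass(frozen=True)
class Zone:
    """One labelled page region; a hypothetical container for illustration."""
    page: int
    element: StructuralElement
    text_type: Optional[TextType] = None  # not every region has a genre


def facet_counts(zones: Iterable[Zone]) -> Counter:
    """Count regions per structural element, e.g. to drive a search facet."""
    return Counter(zone.element for zone in zones)


zones = [
    Zone(1, StructuralElement.TITLE_SECTION),
    Zone(1, StructuralElement.HEADLINE, TextType.BREAKING_NEWS),
    Zone(2, StructuralElement.HEADLINE, TextType.OBITUARY),
    Zone(2, StructuralElement.ADVERTISEMENT),
]
for element, count in facet_counts(zones).items():
    print(f"{element.value}: {count}")
```

Keeping the vocabularies closed, as enumerations do here, is one way to make labels from different libraries comparable; the actual list of elements is what the profile sets out to define.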
Rationale
• Why do we need these data?
  • Increase granularity and information
  • Improve search services (faceted search)
  • Support crowd-based services (apply these metadata)
  • Instruct service providers
• Other standards in the field?
  • TEI (Text Encoding Initiative) provides a first starting point, but objectives are different (edition vs. library use)
  • Best-practice models of other libraries
ENMAP Structural Map
• Objectives
  • Contribute to some standardisation in this field
  • Set up a list of these elements
  • Gather feedback from libraries
  • Provide definitions and examples
  • Include a first version within ENMAP
Thank you for your attention!
Günter Mühlberger <[email protected]>