30
Medici for Digital Cultural Heritage Libraries George Tsouloupas, PhD The LinkSCEEM Project

Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

Medici for Digital Cultural Heritage

LibrariesGeorge Tsouloupas, PhD

The LinkSCEEM Project

Page 2: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

Overview of Digital Libraries

● A Digital Library:"An informal definition of a digital library is a managed collection of information, with associated services, where the information is stored in digital formats and accessible over a network." Digital Libraries by William Arms

● Another definition:"A digital library is a collection of collections of electronic knowledge resources developed and maintained in order to meet the totality of information needs for a given user population." Classical Digital Library Model

Page 3: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

Overview of Digital Libraries1. Content digitization

/acquisition: Initial conversion of content from physical to digital form.

2. The extraction or creation of metadata or indexing information describing the content

3. Storage of digital content and metadata in an appropriate multimedia repository.

4. Client services for the browser, including repository querying and workflow.

5. Content delivery via file transfer or streaming media.

6. Access through a browser or dedicated client.

7. A private or public network.

Page 4: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

The LinkSCEEM WP8/9● WP8: Integration of resources

○ Data-management and Workflows■ Coordination of the provision of HPC, visualization and data

storage resources on a regional scale

■ Managing data stored on the data storage system

■ Implementing the data management middleware software environment at regional partner sites

■ Developing the software infrastructure linking scientific software applications and hardware resources

● Task 9.3 in WP9:○ Aims at optimization of the data management and scientific workflow

application software to be deployed as described in WP8.

Page 5: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

Medici in a nutshell

● A multimedia content management system based on:

○ Web 2.0 interfaces

○ Semantic web technologies (RDF)

○ Cloud-based processing and preprocessing

Page 6: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

Motivations

● Address research and education data collection and analytic needs○ Manage large collections of heterogeneous data○ Organize data with metadata and provenance information○ Facilitate collaborations and data sharing○ Enable curation and data preservation

● Support community collections of heterogeneous data (documents, images, video, sensor, modeling, etc.)

● Enable automated data extraction, analytic and preprocessing services on local and remote systems

● Provide data preview capabilities specific to different data types

Page 7: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

Why Not Flickr or YouTube

● Maybe?○ Web-accessible tools are relatively generic○ Users like not having to manage storage○ Metadata, tagging, linking, etc. are effective means

of organizing information (i.e., no need for “folders”)● Maybe not?

○ No individual or community ownership○ No control of resources○ Inadequate privacy (e.g., for unpublished work)○ Limits on format, volume, throughput, resolution○ No domain-specific processing○ No provenance (everything is a stream of “posts”)

Page 8: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

Why Medici?

● Provides a customizable turnkey solution to store, organize, analyze, view, share, and preserve research content

● Supports heterogeneous files○ Single file or directory upload via click-n-drag○ RESTful web service for batch or script based

uploading○ Owner defined copyright and download permissions

● Standards based (RDF) semantic content model● Conforms to open data and metadata standards

○ Supports OPM (Open Provenance Model)○ Tags, comments, ratings

Page 9: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

Why Medici?

● Supports customizable automated extraction services○ E.g.: Image pyramid creation, OCR for scanned text,

movie frame extraction (.mpeg), file transformation, etc…

● Leverages proven technologies○ Lucene indexing, MySQL, any command-line tool for

extraction/analytic services● Open source, public APIs

Page 10: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

Medici – Semantic Data Repository● Web and Desktop access to a semantic content repository.

● web 2.0 interfaces● Semantic web technologies (RDF) ● cloud-based processing and preprocessing

Page 11: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

Client features● Upload / download● Search / browse● Tag / comment● Create collections● Geo-locate data (map view)● Content-type-specific previewing

● e.g., zoomable images (Seadragon), playable movies (jwplayer), rotatable 3D objects (HTML5)

● Define a specific taxonomy● Access statistics, provenance● Citable persistent URLs● Set copyright and license

attributes● View only, prevent download

● Define dataset relationships

Page 12: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

System Architecture

MediciWebapp

ExtractionService

Extension Point

Extension Point

Extraction Service

Extension Point

Extension Point

Extraction Service

Extension Point

Extension Point

……

Tupelo

Filestore

RDBMS or Triple Store

Desktop App Web Interface

Note: The dB, file store and extraction services can reside on a separate systems.

Page 13: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

Software Architecture

RDF Store

FileStore

Medici Desktop Client

Medici Web Application Custom Code

Extractors

Tupelo (RDF + Data Abstraction)

ExtractorsExtractorsPreprocessingExtractor

FileStoreFileStoreFileStore

RDF StoreRDF StoreRDF Store

ExternalToolExternal

ToolExternalTools

Medici

ExternalResources

Client

Server

HTTP / REST / URIQAHTTP / Ajax

On-demand execution of external algorithms and tools

Medici Web Server

Page 14: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

Uploading files● Drag'n'Drop Upload● 'Regular' Upload● Scripted Upload via RESTful interface

Page 15: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

Metadata

● Extracted information● User-specified information● Collections● Tags● Comments● Location● License● Social● Relationships

Page 16: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

Web Interface - Metadata

User-Specified Info

Extracted Info

Accessed

Comments

License

Social

Tags

Collections

Location

Page 17: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

Automatically Extracted Information

● E.g. EXIF

Page 18: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

User-specified Information

Page 19: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

Collections and Tags

Page 20: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

Relationships

Page 21: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

Relationships

● Describes● Duplicates● Has Derivative● Is derived from● Is described by● Is referenced by● References● Relates to● ...Extensible !

Page 22: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

License - Location - Social

Page 23: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

Embedding elements in other websites

Page 24: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

Extraction services

● Multiple, extensible pre-processing pipelines● Asynchronous, distributed, triggered by upload

○ Processing selected on basis of file content type (MIME type)

○ Recursive (products can trigger additional extractions)● Used to produce web-viewable previews

○ Image pyramids, audio/video previews, thumbnails, pdf to plain text

● Used for domain-specific pre-processing, e.g.,○ Metadata extraction (e.g., FITS headers, geolocation)○ Feature detection○ Specialized OCR for non-standard textual types (e.g.,

18th-century manuscripts)

Page 25: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

Medici Technologies

● Web application○ Google Web Toolkit○ Java Servlets○ Plain Javascript○ Viewers: Flash, Java Applet, HTML, etc.○ Apache Lucene○ Mysql

● Extraction Service○ Eclipse RCP (Java)○ Large collection of external applications

● Desktop Client○ Eclipse RCP (Java)○ Cyberintegrator Workflow Management System

Page 26: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

Medici Communities● Cyprus Institute (Digital Cultural Heritage)

○ 3D object archive for artifacts

● Datanet: sead.ncsa.illinois.edu

● Medici-demo.ncsa.illinois.edu○ An open public server, upload requires account

● InvertNet.org○ Digitization of Biological Collections

● Digging into Data○ University of Sheffield, MATRIX Center○ Given a set of images of historical artefacts, discover what salient

characteristics make an artist different from others using computational image analysis

○ Enable statistical learning about individual and collective authorship.

Page 27: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

Medici Communities

● Walker Institute – Rule of Law○ Repository of reports, video, satellite images for the Rule-of-Law in

different locations around the world

● 18Connect – OCR of 18th century Manuscripts○ Institute for Computing in Humanities, Arts, and Social Science○ Extraction service to OCR manuscript images using Gamera OCR

toolkit

Page 28: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

Community Drivers

● Use cases and requirements driven by● US Office of Naval Research (ONR)● US National Archives and Records Administration

(NARA)● US National Endowment for the Humanities (NEH)● US National Institute of Health (NIH)● US National Science Foundation (NSF)● Institute for Advanced Computing Applications and

Technologies (IACAT)● Seagrant/EPA● EU LinkSCEEM-2 (Cyprus Institute)

Page 29: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

Acknowledgements

● Institute for Advanced Computing Applications and Technologies (IACAT)

● UIUC Campus collaboration● NIH - (Image repository)● iChass - (Digging into Data, 18Connect)● NSF - (InVertnet, Datanet:SEAD)● EPA / Seagrant

● Cyprus Institute Collaboration● The LinkSCEEM Project

Page 30: Libraries Digital Cultural Heritage Medici forlinksceem.cyi.ac.cy/.../uploads/MEDICI_and_Digital... · "A digital library is a collection of collections of electronic ... 18Connect

Thanks!