
Presentations

• Introduction
• Case Studies:
  – Policies, Services, Interoperability, Mashups:
    • BNF, DCAPE, PoDRI, e-Legacy
  – RENCI Federated Data Projects:
    • NARA TPAP, RENCI VO, TIP
  – Interfaces:
    • Islandora, Jargon, CDR

A Unified Web Interface for Browsing or Searching

Flickr file system: /flickr/commons/

Uses the Flickr API, a RESTful web API.

Each /flickr/commons/Institution “folder” translates to the result of one or two calls to the Flickr API, presented to iRODS as if it were a file system.

For a collection to integrate, it needs a remote API that we can write a driver for and one or more ways to map that collection into a tree.

Each mountable service is made into a resource with all relevant info (location, resource type, etc.).
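As a rough illustration of the mapping described above, the sketch below presents a remote listing API as a two-level directory tree. It is not iRODS driver code: `MountedService`, `fake_call`, and the in-memory `FAKE_REMOTE_API` table are hypothetical stand-ins, and no real Flickr endpoints are called.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a remote REST API. Each key plays the role of one
# API call and maps to the names that call would list.
FAKE_REMOTE_API: Dict[str, List[str]] = {
    "institutions": ["Library A", "Museum B"],
    "photos:Library A": ["photo1.jpg", "photo2.jpg"],
    "photos:Museum B": ["photo3.jpg"],
}


def fake_call(key: str) -> List[str]:
    """Simulate one remote API call; unknown keys behave like a missing folder."""
    if key not in FAKE_REMOTE_API:
        raise FileNotFoundError(key)
    return FAKE_REMOTE_API[key]


class MountedService:
    """Maps paths under a mount point to remote listing calls (illustrative)."""

    def __init__(self, mount_point: str, call: Callable[[str], List[str]]) -> None:
        self.mount_point = mount_point.rstrip("/")
        self.call = call

    def list_dir(self, path: str) -> List[str]:
        path = path.rstrip("/")
        # The mount point itself lists the institutions (one remote call).
        if path == self.mount_point:
            return self.call("institutions")
        # One level down, each institution "folder" lists its photos.
        prefix = self.mount_point + "/"
        if path.startswith(prefix):
            institution = path[len(prefix):]
            if "/" not in institution:
                return self.call("photos:" + institution)
        raise FileNotFoundError(path)


service = MountedService("/flickr/commons/", fake_call)
top_level = service.list_dir("/flickr/commons/")
museum = service.list_dir("/flickr/commons/Museum B")
```

Each directory listing costs one simulated remote call here; a real driver would also need caching and error handling, which this sketch leaves out.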

iRODS federates major collections
From Ken Arnold, SHAMAN project

YouTube: Media accessible through API

New Service: Mountable file system: Hulu, Photobucket, etc.

User Sees Single Hierarchy

User with Client: Views & Manages Data

My Data: Disk, Tape, Database, Filesystem, etc.

iRODS Shows Unified “Virtual Collection”

The iRODS Data System can install in a “layer” over existing or new data, letting you view, manage, and share part or all of diverse data in a unified Collection.

My Data: Disk, Tape, Database, Filesystem, etc.

Partner’s Data: Remote Disk, Tape, Filesystem, etc.

User Sees Single “Virtual Collection”

User with iRODS Client: searches CATALOG to find and get Data

Users can search for, access, add/extract metadata, annotate, analyze & process, replicate, copy, share data, manage & track access, subscribe, and more.

Accessing Data in the iRODS System

“I need data!”

“Finds the data.”

“Gets data to user.”

Data Server: Disk, Tape, Database, Filesystem, etc.

iRODS Metadata Catalog: Keeps track of data

Overview of iRODS Components

iRODS Data System

User Interface: Web or GUI Client to Access and Manage Data & Metadata*

iRODS Server: Data on Disk

iRODS Metadata Catalog (Database): Tracks state of data

iRODS Rule Engine: Implements Policies

*Access data with: Web-based Browser, iRODS GUI, Command Line clients, DSpace, Fedora, Kepler workflow, WebDAV, user-level file system, etc.

"Layers" in iRODS: From Users to Storage

Community: Decides how to manage shared Collection(s)

Policies: Express goals for data access, sharing, preservation, etc.

Rules: Implement Policies in computer-actionable form

iRODS Server: Executes Micro-services

Micro-services: Operate on remote data

Under the hood - a glimpse

• User asks for data (using logical properties)
• Data request goes to 1st Server
• Server looks up information in catalog
• Catalog reports that the 2nd federated server has the data
• 1st server asks 2nd server for data
• 2nd server applies Rules and serves data

Diagram labels: iRODS Server with Rule Engine (×3); Metadata Catalog (DB); sites: NC State, Duke, Chapel Hill
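The request flow on this slide can be sketched as a toy model in which a shared catalog maps logical names to the server that holds the data. All class names, server names, and paths below are hypothetical and only illustrate the lookup-then-forward pattern; they are not the iRODS API.

```python
from typing import Dict, Tuple


class Catalog:
    """Maps a logical name to (owning server name, physical path)."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, str]] = {}

    def register(self, logical: str, server: str, physical: str) -> None:
        self._entries[logical] = (server, physical)

    def locate(self, logical: str) -> Tuple[str, str]:
        return self._entries[logical]


class Server:
    """Toy server: holds local storage and forwards requests it cannot serve."""

    def __init__(self, name: str, catalog: Catalog, grid: Dict[str, "Server"]) -> None:
        self.name = name
        self.catalog = catalog
        self.grid = grid  # shared registry of all servers, keyed by name
        self.storage: Dict[str, bytes] = {}
        self.served = 0  # counts local reads, to show which server served data
        grid[name] = self

    def serve_local(self, physical: str) -> bytes:
        # A real server would apply its own rules here before serving.
        self.served += 1
        return self.storage[physical]

    def get(self, logical: str) -> bytes:
        # Look up the owner in the catalog, then serve locally or ask the owner.
        owner, physical = self.catalog.locate(logical)
        if owner == self.name:
            return self.serve_local(physical)
        return self.grid[owner].serve_local(physical)


catalog = Catalog()
grid: Dict[str, Server] = {}
first = Server("first", catalog, grid)
second = Server("second", catalog, grid)
second.storage["/vault/0001"] = b"survey data"
catalog.register("/zone/home/survey.csv", "second", "/vault/0001")

# The user asks the first server using only the logical name.
result = first.get("/zone/home/survey.csv")
```

The user never sees the physical path or the second server; only the logical name crosses the client boundary.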

Policies in iRODS

• Policies: Express community goals for data access and sharing, management, long-term preservation, uses, etc.
• Policy Examples
  – Run a particular workflow when a “set of files” is ingested into a collection (e.g. make thumbnails of images, post to website).
  – Automatically replicate a file added to a collection into 3 geographically distributed sites.
  – Automatically extract metadata for a file of a certain type and store in metadata catalog.
  – Periodically check integrity of files in a Collection and repair/replace if needed/possible.
  – Automatically pick a certain storage location based on user or collection or size or type.
  – Let a user access a collection only if using certificate-based login.
  – Send a notification when a certain file is ingested.
  – etc.
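Two of the policy examples above (replicate to 3 sites, notify on ingest) can be sketched as actions attached to an event. This is a minimal sketch in plain Python, not the iRODS rule language; `RuleEngine`, the event name `"ingest"`, and the site names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Set

Action = Callable[[Dict[str, str]], None]


class RuleEngine:
    """Toy engine: runs every action registered for an event, in order."""

    def __init__(self) -> None:
        self._rules: DefaultDict[str, List[Action]] = defaultdict(list)

    def on(self, event: str, action: Action) -> None:
        self._rules[event].append(action)

    def fire(self, event: str, obj: Dict[str, str]) -> None:
        for action in self._rules[event]:
            action(obj)


# Hypothetical site names standing in for 3 geographically distributed sites.
SITES = ["site-a", "site-b", "site-c"]
replicas: DefaultDict[str, Set[str]] = defaultdict(set)
notifications: List[str] = []


def replicate_to_three_sites(obj: Dict[str, str]) -> None:
    # Record that the object now has a replica at each site.
    replicas[obj["name"]].update(SITES)


def notify(obj: Dict[str, str]) -> None:
    notifications.append(f"ingested {obj['name']}")


engine = RuleEngine()
engine.on("ingest", replicate_to_three_sites)
engine.on("ingest", notify)
engine.fire("ingest", {"name": "report.pdf"})
```

Keeping the policy (which actions run on which event) separate from the actions themselves mirrors the slide's split between rules and micro-services.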

Policies, Services, Interoperability, Mashups:

Richard Marciano, SILS

e-Legacy Mashup

Diagram labels: RSS Feed Reader; Data Grid (SRB/iRODS); Appraisal; Description, Arrangement, Preservation

e-Legacy Demo

• Subscribe to RSS
• Review Received Entry
• Share and Tag
• Meet Preservation Criteria? Yes: Preserve to iRODS

National Library of France: Distributed Archiving & Preservation System (SPAR)

BNF: French National Library

• Three rules:
  – Import
    • Import an input document into iRODS
    • Add import date and checksum as AVU-triplet metadata
    • Replicate to other resources
  – Get
    • Locate a copy of the record
    • Return it if the physical checksum equals the stored checksum
    • If not, delete the replica and copy a good one over it
  – Audit
    • Locate all replicas of a data object
    • Compute a physical checksum using the system’s MD5
    • Compare the result with the checksum stored in user metadata
    • All stale copies are removed and then replicated from another good copy
    • When all copies are audited, a clean copy is staged onto a specific FS directory
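The Get and Audit rules above can be sketched with Python's `hashlib`. Replicas are modeled as a dictionary from a hypothetical resource name to file bytes; the function names and this data model are assumptions for illustration, not the BNF rule code.

```python
import hashlib
from typing import Dict, List


def md5_hex(data: bytes) -> str:
    """Compute the MD5 checksum of the physical content."""
    return hashlib.md5(data).hexdigest()


def audit(replicas: Dict[str, bytes], stored_checksum: str) -> List[str]:
    """Repair stale replicas in place; return the names of repaired resources."""
    good = [name for name, data in replicas.items()
            if md5_hex(data) == stored_checksum]
    if not good:
        raise ValueError("no replica matches the stored checksum")
    stale = [name for name in replicas if name not in good]
    for name in stale:
        # Replace the stale copy with the content of a known-good copy.
        replicas[name] = replicas[good[0]]
    return stale


def get(replicas: Dict[str, bytes], stored_checksum: str) -> bytes:
    """Return the first replica, repairing it first if its checksum is wrong."""
    first = next(iter(replicas))
    if md5_hex(replicas[first]) != stored_checksum:
        audit(replicas, stored_checksum)
    return replicas[first]


original = b"record"
checksum = md5_hex(original)
copies = {"res1": b"corrupt", "res2": original, "res3": original}
fetched = get(copies, checksum)
```

The final staging step of the Audit rule (copying a clean replica to a specific directory) is omitted here.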


BNF: French National Library

• Micro-Services
  – Add metadata to an iRODS object
  – Import an object into iRODS, compute MD5 checksum and validate against the supplied one. Once validated, add MD5SUM and import date as metadata. If invalid, content is removed from iRODS
  – Return the value of an iRODS object metadata attribute
  – Prepare to retrieve a metadata attribute for a resource
  – Prepare to retrieve a metadata attribute for an object
  – Get the input resources belonging to a zone name
  – Get iCAT results regarding location info for a record
  – Execute MD5SUM on the physical content and return value
  – Return a pseudo random string of specified length
  – Delete a stale replica and replicate over it from another fresh copy
  – Stale replica replacement can be eager (synchronous execution) or lazy (delayed execution)
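The import micro-service in the list above (validate against a supplied checksum, then attach metadata or remove the content) might look roughly like this. The dictionaries standing in for storage and the catalog, the function name `import_object`, and the attribute name `IMPORT_DATE` are assumptions; only `MD5SUM` comes from the slide.

```python
import hashlib
from datetime import date
from typing import Dict, List, Tuple

AVU = Tuple[str, str, str]  # (attribute, value, unit)


def import_object(
    store: Dict[str, bytes],
    metadata: Dict[str, List[AVU]],
    name: str,
    data: bytes,
    supplied_md5: str,
    today: date,
) -> bool:
    """Import data, keep it only if its MD5 matches the supplied checksum."""
    store[name] = data
    expected = supplied_md5.lower()
    if hashlib.md5(data).hexdigest() != expected:
        # Invalid content is removed again, as the slide describes.
        del store[name]
        return False
    metadata[name] = [
        ("MD5SUM", expected, ""),
        ("IMPORT_DATE", today.isoformat(), ""),  # hypothetical attribute name
    ]
    return True


store: Dict[str, bytes] = {}
metadata: Dict[str, List[AVU]] = {}
payload = b"scanned page"
accepted = import_object(store, metadata, "page1.tif", payload,
                         hashlib.md5(payload).hexdigest(), date(2010, 1, 1))
rejected = import_object(store, metadata, "page2.tif", payload,
                         "0" * 32, date(2010, 1, 1))
```

Passing the date in as a parameter keeps the function deterministic, which makes it easier to test.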

DCAPE


PoDRI: Policy-Driven Repository Interoperability

RENCI Federated Data Projects

Leesa Brieger, RENCI

RENCI VO Data Grid

iRODS Server with Metadata Catalog (iCAT) DB: RENCI, Europa Center

iRODS Servers: UNC-A, UNC-CH, NCSU, Duke, ECU

• Client asks for data
• Data request goes to iRODS server
• Server looks up information in iCAT
• iCAT tells which iRODS server has data
• Data is retrieved from physical location and delivered to client

National Archives and Records Administration Transcontinental Persistent Archive Prototype (TPAP)

Federation of Seven Independent Data Grids

Federated Repositories, each with its own iCAT: NARA I, NARA II, Georgia Tech, Rocket Center, UMD, UNC, UCSD

• Extensible Environment: can federate with additional research and education sites.
• Each data grid uses different vendor products.

TUCASI Infrastructure Project (TIP)

Goals

• Leverage data resources for competitive research and leadership
• Support research and education efforts in a wide range of disciplines and domains
• National leadership in next-generation data management
• Model for long term campus storage
• Architecture and design; hardware, software
• Operations and support
• Data policies
  – Selection and retention
  – Ingest, curation and preservation
  – Collections and repository management

A Test: Classroom content on a DICE/RENCI data grid

Panopto, Elluminate

Interfaces: Jargon, Web, REST, SOAP

Mike Conway, DICE Center
Jargon, Java, Interface Developer

Goals
• Make integration simple by creating a clear, familiar service API.
• Make IRODS a familiar, easy-to-use resource to mid-tier Java developers.
• Develop a REST/SOAP service model for common use-cases using mature tools.
• Create an out-of-the-box web interface that makes IRODS easy for administrators and archivists.

Currently...
• Jargon is a pure-Java API that talks to IRODS over Java sockets.
• Jargon is fairly low-level and can be tricky at first.
• Used in multiple projects including a WebDAV interface, as well as integration with the Fedora repository via the irodsfedora library.

Jargon (next...)
• Jargon-core: Jargon re-factored
  – High level service API, POJOs, Spring-friendly
  – Emphasis on testability
• Jargon-akubra: Implementation of an Akubra module for IRODS via Jargon
• Jargon-lingo: Application of mature open-source tools over Jargon-core to provide RESTful, SOAP, and Web interfaces to IRODS.

Conceptual Diagram

Diagram labels: IRODS Grid; Jargon-core; Jargon-lingo; Jargon-akubra; Custom code (Java, Groovy, Jython, JRuby, etc.); DuraSpace Frameworks; Web; SOAP/REST; IRODS Service Model

TRLN Partners Questionnaire

Respondents: NC State: Jim Tuttle; Duke: Seth Shaw; Duke: Winston Atkins; Duke: Russell Koonts; UNC: Will Owen

1. Preservation Projects

• Geo NDIIPP • Images • e-Theses • Dissertations

• records • TRAC • 30 criteria

• Fedora iRODS • checksum • 2 copies

• CDR

2. Status

• Planned • planned • production

• ½ way • testing phase • near production

3. Preservation Challenges

• permission • auditing • replication

• search/browse • version control

• policies • tiered storage

• getting the backlog

• generating meta. • consolidating meta. • prez. planning • sys. reliability

4. iRODS

• no • no • no • yes • yes

5. iRODS Challenges

• NA • NA • NA • none • rules syntax • documentation • production configuration • stable release

6. Questions

• None • None • None • working w. archivists • maintenance releases • iRODS book