Curation in the Cloud Hull’s Fedora and Hydra perspective Richard Green Curation in the Cloud, London, 7/8 March 2012

Curation in the Cloud

Hull’s Fedora and Hydra perspective

Richard Green

Curation in the Cloud, London, 7/8 March 2012

Institutional repository background

• Hull has been running a Fedora-based institutional repository for several years

– Originally based on Fedora + Muradora UI

– More recently (6 months) based on Fedora + Hydra

• The repository covers a wide range of content – not just OA articles…

Curation in the Cloud | London | 7/8 March 2012 | 2

Wide range of content to deal with

- Exam papers
- e-Theses & dissertations (ETDs)
- Journal articles
- Meeting papers or minutes
- Policies or procedures
- Dissertations (undergraduate)
- Photographs
- Presentations
- Books
- Book chapters
- Regulations
- Reports
- Conference papers or abstracts
- Learning materials
- Handbooks
- Internet publications
- Newsletter articles
- Datasets
- Sound
- Moving images
- Guidance documents
- Licences
- Posters
- Events
- Letters
- Artwork
- Diagrams
- Maps
- Software
- etc (!!!)

Affiliations

• Hull was instrumental in founding the Fedora UK & Ireland User Group…

– 20 or so informal members

- Acuity Unlimited
- British Cartoon Archive
- University College Dublin (Irish Virtual Research Library and Archive)
- University of Durham
- University of Essex (UK Data Archive)
- Freshwater Biological Association
- Glasgow Caledonian University (Spoken Word Services)
- University of Hull
- King's College London (CeRch)
- University of Leeds (Timescapes Project)
- London School of Economics and Political Science
- University of Manchester
- National e-Science Centre, Edinburgh
- National Library of Scotland
- National Library of Wales
- Open University
- University of Oxford Libraries
- University of Oxford (Forced Migration Project)
- University of St Andrews
- University of York

Affiliations [2]

• Hull is also a founder member of the Hydra partnership (with the University of Virginia, Stanford University and Fedora Commons)

– Fedora does not have an ‘out-of-the-box’ UI. Hydra set out to provide building blocks from which highly functional (full-CRUD) UIs could be built over it

– Growing number of Hydra-using institutions in the US, two or three so far in the UK

– Hydra “content modelling” is proving useful to non-Hydra Fedora users

At the moment?

• Just starting to think seriously about opportunities in the cloud

– This meeting is opportune to help clarify what is still somewhat fuzzy thinking

• At the moment, we in Hull are considering the use of cloud storage in addition to local storage for our Hydra repository

At the moment? [2]

• Why the cloud?

– Could be used to provide near-line capability for rarely used assets which are individually ‘small’ but numerous

– Potential to store very large, but rarely accessed, assets (TB range) ‘cheaply’ (cf high-performance SAN storage)

– Possibility of leveraging ‘above campus’ services (Image manipulation? Video streaming? Format migration?)

At the moment? [3]

• WE’RE NOT

– considering a complete repository infrastructure in the cloud

  • Happier with the software stack locally

– considering local software with all-cloud storage

  • There are known problems with latency etc

• WE ARE

– considering a hybrid of the two

At the moment? [4]

• How?

– In principle, Fedora (and therefore Hydra) allows for a mix and match of storage: Fedora managed (local file system), external (http accessible), redirected (redirects user to appropriate URL)

– So:

  • use “managed content” for straightforward, small and/or high access materials;

  • use “external content” for low access materials or where there is a value-added service.
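
The rule of thumb above could be sketched in Python roughly as follows. The single-letter codes are the Fedora 3 datastream control groups (M for managed, E for external, R for redirected). Everything else is an assumption made for illustration: the thresholds, the `Asset` fields and the function name are hypothetical, and the way "small and/or high access" is combined is only one reading of the slide.

```python
from dataclasses import dataclass

# Fedora 3 datastream control group codes.
MANAGED = "M"    # bytes are stored and managed by Fedora itself
EXTERNAL = "E"   # Fedora fetches the bytes from an http URL when asked
REDIRECT = "R"   # Fedora redirects the user to the URL (not used below)

# Hypothetical thresholds, chosen only to make the sketch concrete.
SMALL_BYTES = 100 * 1024**2       # treat anything up to 100 MiB as "small"
HIGH_ACCESS_PER_MONTH = 10        # treat 10+ requests a month as "high access"


@dataclass
class Asset:
    size_bytes: int
    accesses_per_month: float
    needs_cloud_service: bool = False   # e.g. streaming or format migration


def choose_control_group(asset: Asset) -> str:
    """Suggest a control group using one reading of the slide's rule."""
    if asset.needs_cloud_service:
        # A value-added service lives with the cloud copy.
        return EXTERNAL
    if (asset.size_bytes <= SMALL_BYTES
            or asset.accesses_per_month >= HIGH_ACCESS_PER_MONTH):
        # Small and/or high access: keep it as managed content.
        return MANAGED
    # Large and rarely accessed: reference it as external content.
    return EXTERNAL
```

For example, a 5 MiB article requested 50 times a month would stay managed, while a 2 TiB dataset requested once a year would be referenced as external content.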

Scale of problem

• Bulk of repository content is “small” – megabytes

• Multimedia content is larger (10s-100s megabytes) and our current offering is “download” – we cannot (yet) stream

• We know there are multi-TB datasets on campus to be dealt with

– eg Biology have one at 6TB, growing at 2TB per quarter
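
To give a feel for what the Biology figures imply, here is a small projection in Python. It assumes the growth stays linear at the quoted 2 TB per quarter, which the slide does not claim; the function name is hypothetical.

```python
def projected_size_tb(current_tb: float, growth_tb_per_quarter: float,
                      quarters: int) -> float:
    """Linear projection: current size plus a fixed amount per quarter."""
    return current_tb + growth_tb_per_quarter * quarters


# Size of the 6 TB dataset after 1, 3 and 5 years (4 quarters per year).
for years in (1, 3, 5):
    size = projected_size_tb(6, 2, 4 * years)
    print(f"after {years} year(s): {size} TB")
```

Under that assumption the dataset reaches 14 TB after one year, 30 TB after three and 46 TB after five.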

Potential practical problems

• High-access materials could generate large download charges

– Better suited to low access objects or to get ‘value added’ services

– Need a way of predicting costs over long periods (using the LIFE model?)

• Getting large objects/volumes into the cloud

– Transfer times for TBs of content are considerable. Use UPS to send a hard drive (or several?)
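
A rough estimate shows why shipping drives is a serious option. The sketch below is plain arithmetic in Python; the link speeds and the efficiency factor are hypothetical values, not measurements from Hull, and a terabyte is taken as 10^12 bytes.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(size_tb: float, link_mbit_per_s: float,
                  efficiency: float = 0.8) -> float:
    """Days needed to upload size_tb terabytes over a link.

    efficiency is the assumed fraction of the nominal link speed that is
    actually achieved (protocol overhead, contention and so on).
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_mbit_per_s * 1e6 * efficiency)
    return seconds / SECONDS_PER_DAY


# A 6 TB dataset over two assumed link speeds.
for mbit in (100, 1000):
    print(f"{mbit} Mbit/s: {transfer_days(6, mbit):.1f} days")
```

Even with the whole of a 100 Mbit/s link available (efficiency 1.0), 6 TB takes about five and a half days, so a couriered drive can compete on elapsed time.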

Potential practical problems [2]

• Security

– Hull’s IR has very granular security (categories [public/staff/student], groups [eg student modules], individuals)

– Need to be able to restrict access to cloud-based materials accordingly
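
The three levels of restriction can be illustrated with a small Python model. This is a hypothetical sketch, not Hull's implementation: the class names, fields and the rule that a "public" policy admits everyone are all assumptions. The point is that the same decision would have to be made before any cloud-hosted asset is served.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    username: str
    category: str                        # "public", "staff" or "student"
    groups: frozenset = frozenset()      # e.g. student module codes


@dataclass(frozen=True)
class AccessPolicy:
    categories: frozenset = frozenset()  # categories allowed to see the item
    groups: frozenset = frozenset()      # groups allowed to see the item
    individuals: frozenset = frozenset() # usernames allowed to see the item


def may_access(user: User, policy: AccessPolicy) -> bool:
    """Allow access if any of the three levels matches."""
    if "public" in policy.categories:
        return True                      # assumed: public means everyone
    return (user.category in policy.categories
            or bool(user.groups & policy.groups)
            or user.username in policy.individuals)
```

With a policy that names the staff category and one module group, a member of staff and a student on that module would be admitted, and any other student refused.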

Potential practical problems [3]

• Durability

– “Designed to provide 99.999999999% durability” (Amazon S3 SLA). And the other 0.000000001%? Not a lot, but…

  • Could that mean for every terabyte you send us we promise not to corrupt more than ten or so bytes?!?

  • Or that we might lose 1 in 10^11 files, which might not be quite so bad providing it’s not one of your files

– LOCKSS type approach across several providers?
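
The arithmetic behind the two readings of the durability figure can be checked in a few lines of Python. Exact decimal arithmetic is used to avoid floating-point noise; a terabyte is taken as 10^12 bytes, and both interpretations are the slide's own speculation rather than anything the provider states.

```python
from decimal import Decimal

durability_percent = Decimal("99.999999999")

# Fraction that is not covered: (100 - 99.999999999) / 100 = 1E-11.
loss_fraction = (Decimal(100) - durability_percent) / Decimal(100)

# Reading 1: bytes at risk in every terabyte stored.
bytes_per_tb = Decimal(10) ** 12
bytes_at_risk_per_tb = loss_fraction * bytes_per_tb

# Reading 2: how many stored files there are for each one lost.
files_per_loss = 1 / loss_fraction
```

Here `bytes_at_risk_per_tb` comes to 10 and `files_per_loss` to 10^11, matching the "ten or so bytes" per terabyte and the one-file-in-10^11 figures on the slide.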

Potential practical problems [4]

• Management of an institutional cloud

– Can an institution realistically manage its own cloud space(s)?

  • Managing just the data

  • Maybe managing cloud-based services

– Is the idea of third-party management (à la DuraSpace) a more appropriate model?

So, in summary…

• Hull is potentially interested in cloud solutions for:

– Low access materials which individually are not big but taken together are (eg 1000s of images)

– TB+, low-access objects

– ‘Above campus’, value-added services (Image manipulation, media streaming, format migration, LOCKSS-in-the-Cloud?)

• Maybe sounds like a job for a UK HE oriented, brokered service akin to DuraCloud’s model?

Contacts and links

IR Service owner: Chris Awre ([email protected])

Hydra Project Manager for Hull: Richard Green ([email protected])

Hull Institutional Repository: hydra.hull.ac.uk

Fedora website: fedora-commons.org

Hydra website: projecthydra.org

Fedora UK&I User group: fedora-uki.org.uk