An update on the Biodiversity Heritage Library's efforts and successes in 2010, with a focus on the future as part of the EU project ViBRANT.
Phil Cryer Biodiversity Heritage Library
Scripting Life: the science behind ViBRANT
January 20-21, 2011 - Paris, France
Biodiversity Heritage Library: Process & Progress
• a consortium of global partners
• aims to share historic biodiversity literature texts
• provides open access to all content
• free for all
Biodiversity Heritage Library (BHL)
bhl data stats
Content
• 45,000 journals & monographs
– 8,821 in 2010
• 87,000 volumes
– 15,552 in 2010
• 32 million pages
– 5.6 million in 2010
Usage (2010)
• 837,000 visits
• 422,000 unique visitors
• 4.2 million page views
• 221 countries/territories
new features
scanning request form
click on ‘Feedback’ to access
new user interface for names index
sortable columns, exportable via CSV, BibTeX and Endnote
downloadable article PDFs
create articles from BHL books
1- enter metadata about the article
2- select the pages of the article
3- PDF request received
4- PDF article arrives via email
CiteBank (http://citebank.org)
open access repository for biodiversity publications
CiteBank features
• access the ‘crowd-sourced’ articles generated from the BHL scans (harvested from BHL)
• platform for journals/publishers/societies in need of tools to store and share content
• harvests metadata from Zookeys, SCiELO, Smithsonian collections nightly via OAI-PMH
• new search index to BHL content using Solr
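The nightly OAI-PMH harvest mentioned above can be illustrated with a minimal sketch of how a harvester might build a ListRecords request. The verb and argument names (ListRecords, metadataPrefix, from, resumptionToken) come from the OAI-PMH protocol; the base URL, the function name and the date are hypothetical placeholders, not CiteBank's actual configuration or code.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (illustrative helper)."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it must not be combined with metadataPrefix or from.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            # Selective harvesting: only records changed since this date.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint; a nightly job would pass yesterday's date.
first_page = build_list_records_url("https://example.org/oai",
                                    from_date="2011-01-19")
next_page = build_list_records_url("https://example.org/oai",
                                   resumption_token="abc123")
print(first_page)
print(next_page)
```

A real harvester would fetch each URL, parse the XML response, and keep requesting with the returned resumptionToken until the repository stops supplying one.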
CiteBank + BHL expands our core features
• content and tools for scholarly crowd-sourcing
– Users can get content they need, do minor work, share enhancements with community
• look to add more content integration with other existing platforms
– EOL, Atlas of Living Australia, JSTOR Plant Science, BioStor and others
– Mendeley, Zotero, RefWorks, etc
• enhancements to the portal home page
– More focus on search
• special collections
– Charles Darwin’s scientific library
• scholarly annotations
– annotations in Darwin’s hand and academic interpretation, crosslinking
More BHL features coming soon...
bhl global
Benefits of Global BHL partnerships
• redundancy and resilience
– data and app mirroring
• exposing unique content
• new tools, services, people
• opportunities for new collaborations
– IMPACT, ViBRANT, OpenUp! in EU
storage clusters
• all BHL data stored at the Internet Archive in San Francisco
– no redundancy
– limited in how we could serve our data and images
– difficult to analyze data
• First global BHL cluster gives us
– redundancy and failover
– many new serving options
– new ways to run analytics, data mining
Storage issues solved using clusters
• open source software
– Linux operating system
– Gluster distributed storage system
• commodity hardware
– Supermicro servers
– ‘off the shelf’ hard drives and other system components
Open source software / commodity hardware
• BHL Cluster 01
– six 4U sized cabinets
– twenty-four 1.5TB hard drives in each cabinet
– 97TB of replicated and distributed storage (over 200TB of raw disk)
BHL Cluster 01
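The capacity figures for BHL Cluster 01 can be checked with a little arithmetic. The cabinet count, drives per cabinet and drive size are taken from the slide; the replication factor of 2 is an assumption (the slide only says the storage is "replicated and distributed"), as is the guess that the gap to the quoted 97TB comes from unit conversion and filesystem overhead.

```python
# Figures from the slide.
CABINETS = 6
DRIVES_PER_CABINET = 24
DRIVE_TB = 1.5  # decimal terabytes per drive

# Assumption: every file is stored twice across the cluster.
REPLICAS = 2

raw_tb = CABINETS * DRIVES_PER_CABINET * DRIVE_TB
usable_tb = raw_tb / REPLICAS

# Operating systems often report binary units (TiB = 2**40 bytes).
usable_tib = usable_tb * 10**12 / 2**40

print(f"raw disk:        {raw_tb} TB")          # 216.0 TB, i.e. "over 200TB"
print(f"after 2x copies: {usable_tb} TB")       # 108.0 TB
print(f"in binary units: {usable_tib:.1f} TiB")  # 98.2 TiB
```

Under these assumptions the result lands close to the 97TB quoted on the slide.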
Statistical computing
• find relationships
– R (GNU statistical language)
– Hadoop, Disco
• make existing data more useful
– image and OCR reprocessing, taxonfinder
data sharing
• replicating BHL data globally
– Marine Biological Laboratory (Woods Hole, US)
– Natural History Museum (London, UK)
– Bibliotheca Alexandrina (Alexandria, EG)
– Atlas of Living Australia (Canberra, AU)
– China... Brazil...
• advantages to replication
– redundancy, failover
– load balancing
– geographical distribution
Data sharing and replication
• grabby
– handles initial download from Internet Archive (IA)
• bhl-sync
– open source Dropbox model
– handles syncing remote nodes automatically
– uses inotify, lsyncd, OpenSSH, rsync, unison
– remote server only requires a secure login
Open source code available at http://bit.ly/bhl-bits
Software for data sync
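To make the sync idea concrete, here is a minimal sketch of the kind of rsync-over-SSH invocation a tool like bhl-sync could issue when a file change is detected. It is not the bhl-sync code: the function name, paths and host are hypothetical, and only the standard rsync options --archive, --compress, --delete and -e are used. The command is built and printed, not executed.

```python
import shlex


def build_rsync_command(source_dir, remote_host, remote_dir, delete=False):
    """Return an rsync-over-SSH command as an argument list (illustrative)."""
    cmd = ["rsync", "--archive", "--compress", "-e", "ssh"]
    if delete:
        # Mirror deletions too; off by default in this sketch.
        cmd.append("--delete")
    # A trailing slash makes rsync copy the directory's contents
    # rather than the directory itself.
    cmd.append(source_dir.rstrip("/") + "/")
    cmd.append(f"{remote_host}:{remote_dir}")
    return cmd


# Hypothetical paths and host name.
cmd = build_rsync_command("/data/bhl", "mirror.example.org", "/data/bhl")
print(shlex.join(cmd))
```

Because the transfer runs over SSH, the remote node needs nothing beyond a secure login, which matches the last bullet above. In a setup like the one described, lsyncd would watch the local tree through inotify and trigger such transfers automatically.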
• digital repository platform
– enables storage and management of digital content
– maintains a persistent digital archive
– stores data in a neutral manner
– provides backup, redundancy, disaster recovery
• shares data to remote nodes via OAI-PMH
Fedora-commons integration
future plans
• BHL is a member of CrossRef through The Smithsonian
• will start assigning DOIs to BHL monographs
– easy, non-controversial
• then move on to articles and other publication types
– CrossRef rules make full assignment challenging for crowd-sourced articles
Assigning DOIs (Digital Object Identifiers)
• OCR Correction
– a big problem, no easy solution
• add more content
– partnerships, CiteBank
• sustainability planning and funding
– committed to no fees for users
• more outreach
– conferences, marketing
– Facebook, Twitter and other social media avenues...
Wish list for 2011 and beyond
http://biodiversitylibrary.blogspot.com
http://twitter.com/BioDivLibrary #bhlib
http://facebook.com/pages/Biodiversity-Heritage-Library/63547246565
http://flickr.com/groups/bhl
http://youtube.com/user/BioHeritageLibrary
http://biodiversitylibrary.org/RecentRss.aspx
http://slidesha.re/bhl-slides
BHL is social!
slides: slidesha.re/bhl-slides
contact: [email protected]
Thanks.