Science for the Future: Strategies for distributing and sharing data
Research Data Management
www.globusonline.org
Rachana Ananthakrishnan
University of Chicago & Argonne National Lab

We started with technology proven in many large-scale grids
GridFTP
GRAM
MyProxy
GSI-OpenSSH

Big science has achieved big successes with advanced community services
Community services built on Globus Toolkit software
LIGO: 1 PB data in last science run, distributed worldwide
ESG: 1.2 PB climate data delivered to 23,000 users; 600+ pubs
OSG: 1.4M CPU-hours/day, >90 sites, >3000 users, >260 pubs in 2010
Substantial teams
Sustained effort
Leverage common technology
Application-specific solutions
Production focus

But small and medium science is suffering
Data deluge
Ad-hoc solutions
Inadequate software, hardware & IT staff
DES as an example of medium science

Medium science: Dark Energy Survey
Every night, they receive 100,000 files in Illinois
They transmit files to Texas for analysis, then move results back to Illinois and make them available to users
Process must be reliable, routine, and efficient
The cyberinfrastructure team is not large!
Image credit: Roger Smith/NOAO/AURA/NSF
Blanco 4m on Cerro Tololo
Not just small labs; medium science too. E.g., Dark Energy Survey.

Time-consuming Tasks in Research
Run experiments
Collect data
Manage data
Move data
Acquire computers
Analyze data
Run simulations
Compare experiment with simulation
Search the literature
Communicate with colleagues
Publish papers
Find, configure, install relevant software
Find, access, analyze relevant data
Order supplies
Write proposals
Write reports

Excerpts from ESNet reports
Transfers often take longer than expected based on available network capacities
Lack of an easy to use interface to some of the high-performance tools
Tools [are] too difficult to install and use
Time and interruption to other work required to supervise large data transfers
Need data transfer tools that are easy to use, well-supported, and permitted by site and facility cybersecurity organizations

We envisage a world where data flows rapidly, reliably, and securely among: experimental facilities, online and archival storage, computing facilities, and remote institutions
We envisage a world where data is easily integrated into dynamic datasets that also include metadata and programs necessary to understand and regenerate it
We envisage a world where data is readily discoverable and accessible to collaborators, regardless of their and the data's location

We believe a new approach is needed to deliver data management infrastructure
Frictionless
Affordable
Sustainable

Like but for science!
Focusing on frictionless, we've started to do this with the Globus Online service

Transfer and sharing of large data sets with dropbox-like characteristics directly from your own storage systems

Reliable, secure, high-performance file transfer
Fire-and-forget transfers
Automatic fault recovery
Auto tuning
Seamless security integration
Data Source
Data Destination
1. User initiates transfer request
2. Globus Online moves and syncs files
3. Globus Online notifies user
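
The "fire-and-forget" and "automatic fault recovery" ideas on this slide can be illustrated with a minimal sketch. This is not the Globus Online implementation or its API; the names transfer_with_retries, copy_file, and make_flaky_copier are hypothetical and exist only for this illustration. The sketch assumes a transient fault surfaces as an OSError and that each file is retried a fixed number of times before being reported as failed.

```python
from typing import Callable, Iterable


def transfer_with_retries(
    files: Iterable[str],
    copy_file: Callable[[str], None],
    max_attempts: int = 3,
) -> dict:
    """Try to copy every file, retrying failures, and return a summary.

    `copy_file` is a hypothetical stand-in for whatever moves one file;
    it is assumed to raise OSError on a transient fault.
    """
    succeeded, failed = [], []
    for path in files:
        for attempt in range(1, max_attempts + 1):
            try:
                copy_file(path)
                succeeded.append(path)
                break
            except OSError:
                # Give up on this file only after the last allowed attempt.
                if attempt == max_attempts:
                    failed.append(path)
    # The summary plays the role of the notification in step 3.
    return {"succeeded": succeeded, "failed": failed}


def make_flaky_copier(fail_once: set) -> Callable[[str], None]:
    """Build a fake copier that fails the first attempt for the given paths."""
    pending = set(fail_once)

    def copy_file(path: str) -> None:
        if path in pending:
            pending.discard(path)
            raise OSError(f"simulated network fault for {path}")

    return copy_file


if __name__ == "__main__":
    copier = make_flaky_copier({"b.dat"})
    print(transfer_with_retries(["a.dat", "b.dat", "c.dat"], copier))
```

The user hands over the file list once and only looks at the final summary; the retry loop, not the user, deals with the simulated fault on b.dat.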

Simple, secure sharing off existing storage systems
Data Source
1. User A selects file(s) to share, selects user or group, and sets permissions
2. Globus Online tracks shared files; no need to move files to cloud storage!
3. User B logs in to Globus Online and accesses shared file
Easily share large data with any user or group
No cloud storage required

Globus Online is SaaS
Web, command line, and REST interfaces
Reduced IT operational costs
New features automatically available
Consolidated support & troubleshooting
Easy to add your laptop, server, cluster, supercomputer, etc. with Globus Connect

Globus Connect Multiuser
Create endpoint in minutes; no complex GridFTP install
Enable all users with local accounts to transfer files
Native packages: RPMs and DEBs
Also available as part of the Globus Toolkit
Local Storage System (RCC cluster, campus server, ...)
Globus Connect Multiuser
MyProxy Online CA
GridFTP Server
Local system users

Early adoption is encouraging
~24PB and 1B files moved
10x (or better) performance vs. scp
99.9% availability
B. Winjum (UCLA) moves 900K-file plasma physics datasets between UCLA and NERSC
Dan Kozak (Caltech) replicates 1 PB LIGO astronomy data for resilience
Erin Miller (PNNL) collects data at Advanced Photon Source, renders at PNNL, and views at ANL
This image shows a 3D rendering of a Shewanella biofilm grown on a flat plastic substrate in a Constant Depth bioFilm Fermenter (CDFF). The image was generated using x-ray microtomography at the Advanced Photon Source, Argonne National Laboratory.
Credit: Kerstin Kleese-van Dam

Globus Online as a platform
Globus Nexus (Identity, Group, Profile)
Sharing Service
Transfer Service
Dataset Services
Globus Toolkit
Globus Online APIs
Globus Connect
Early platform adopters

More capabilities underway
Globus Toolkit
Sharing Service
Transfer Service
Dataset Services
Globus Nexus (Identity, Group, Profile)
Globus Online APIs
Globus Connect

Introducing the dataset
Group data based on use, not location
Logical grouping to organize, reorganize, search, and describe usage
Tag with characteristics that reflect content
Capture as much existing information as we can
or to reflect current status in investigation
Stage of processing, provenance, validation, ...
Share data sets for collaboration
Control access to data and metadata
Operate on datasets as units
Copy, export, analyze, tag, archive, ...

Expanding Globus Online services
Ingest and publication
Imagine a DropBox that not only replicates, but also extracts metadata, catalogs, converts
Cataloging
Virtual views of data based on user-defined and/or automatically extracted metadata
Integration with computation
Associate computational procedures, orchestrate application, catalog results, record provenance

mydata42
owner: Francesco
type: 3dtomo
format: HDF5
beamline: 2BM

Tomography
Define dataset
Infer type
Extract metadata
Populate catalog(s)
Locate datasets
Access files
analyze
Catalog derived products
transfer/schedule
Orchestration
Organization
Record provenance
Annotate, share
browse, search
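
To make the dataset idea concrete, here is a minimal sketch of grouping files by use rather than location and finding them again by tag. It is an illustration only, not the Globus Online dataset service: the Dataset and Catalog classes, the build_example_catalog helper, the file locations, and the second dataset (mydata43) are all hypothetical. Only the mydata42 tags come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A logical group of files plus descriptive tags."""
    name: str
    tags: dict = field(default_factory=dict)
    files: list = field(default_factory=list)  # files may live anywhere


class Catalog:
    """In-memory catalog that finds datasets by tag values."""

    def __init__(self):
        self._datasets = {}

    def add(self, dataset: Dataset) -> None:
        self._datasets[dataset.name] = dataset

    def search(self, **criteria) -> list:
        """Return names of datasets whose tags match every criterion."""
        return [
            d.name
            for d in self._datasets.values()
            if all(d.tags.get(key) == value for key, value in criteria.items())
        ]


def build_example_catalog() -> Catalog:
    catalog = Catalog()
    catalog.add(Dataset(
        name="mydata42",
        tags={"owner": "Francesco", "type": "3dtomo",
              "format": "HDF5", "beamline": "2BM"},
        # Hypothetical locations: one dataset, two storage systems.
        files=["siteA:/raw/scan_001.h5", "siteB:/derived/recon_001.h5"],
    ))
    catalog.add(Dataset(  # hypothetical second dataset for contrast
        name="mydata43",
        tags={"owner": "Francesco", "type": "spectra", "format": "CSV"},
    ))
    return catalog


if __name__ == "__main__":
    catalog = build_example_catalog()
    print(catalog.search(type="3dtomo", format="HDF5"))
    print(catalog.search(owner="Francesco"))
```

Because the search works on tags, the two files of mydata42 are found together even though they sit on different storage systems.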
http://www.blyberg.net/card-generator/
http://www.sciencemag.org/content/332/6025/88/F1.large.jpg

We believe a new approach is needed to deliver data management infrastructure
Frictionless
Affordable
Sustainable

We've got a handle on frictionless
Web interface, REST API, command line
InCommon, OAuth, OpenID, X.509, Credential management
Group definition and management
Transfer management and optimization
Reliability via transfer retries
One-click Globus Connect install
5-minute Globus Connect Multiuser install

Affordable and sustainable?
Common expectation is either:
High-priced commercial software (with generally higher levels of quality)
Or:
Free, open source software (with generally lower levels of quality)
We aim to offer the best of all worlds!
We are a non-profit service provider to the non-profit research community

Our challenge: Sustainability

Globus Online Provider Plans
Support ongoing operations
Offer value-added capabilities
Engage more closely with users

Provider Plans offer
Endpoint management console
Usage reporting
MSS optimizations
Globus Plus subscriptions
Branded web sites
Alternate identity provider
Starting at $10k/year

Researchers may use Globus file transfer for free
File transfer and synchronization to/from servers
Personal endpoints with Globus Connect
Access to shared endpoints created by others
Globus Plus: $7/month (or $70/year)
Create and manage shared endpoints
Transfer and sharing between Globus Connect Personal endpoints

We hope you will join us
Provider Plan not required to get started
Use Globus Connect Multiuser to easily connect your resources with Globus Online
Go to: globusonline.org/gcmu
Diagram labels: Registry, Staging Store, Ingest Store, Analysis Store, Community Store, Archive, Mirror

Our research is supported by:
U.S. DEPARTMENT OF ENERGY

Questions
Contact: [email protected]
globusonline.org/provider-plans
Researchers: globusonline.org/plus
www.globusonline.org