Storing, Backing Up and Archiving Data
Jean Aroom, Clinton Heider and Lisa Spiro
Objectives for This Session
● Know options for storing, backing up, sharing and archiving your data.
● Understand best practices for protecting your data.
Data Storage Definition
● The media (optical or magnetic) to which you save your data files and software.
● All storage media are vulnerable to risk and obsolescence.
● Storage media should be evaluated and updated every 2-5 years.
New England Collaborative Data Management Curriculum
Data Storage Considerations
● Location (Internal/External HD, Network, Remote)
● Disk size or storage quota
● Computing performance
● Accessibility
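The disk size or quota consideration can be checked before you copy a dataset somewhere. Below is a minimal Python sketch that uses only the standard library. The function names and the 20% headroom are illustrative assumptions, not Rice guidance. On a network share, `shutil.disk_usage` reports the free space of the whole volume, which may be more than your personal quota allows.

```python
import os
import shutil


def directory_size(path: str) -> int:
    """Total size in bytes of the regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symbolic links so linked files are not counted twice.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def fits_on(dataset_dir: str, target_dir: str, headroom: float = 0.2) -> bool:
    """True if the dataset fits on the target volume with spare headroom.

    headroom=0.2 (20% extra) is an assumed safety margin, not a policy.
    """
    needed = directory_size(dataset_dir) * (1 + headroom)
    free = shutil.disk_usage(target_dir).free
    return needed <= free
```

For example, `fits_on("my_dataset", "/mnt/share")` (both paths hypothetical) compares the dataset's size against the free space on a mounted share.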
Data Backup Definition
● Allows you to restore your data if the original data is lost or damaged due to:
  ○ Hardware or software malfunction
  ○ Environmental disaster (fire, flood)
  ○ Theft
  ○ Unauthorized access
3-2-1 Backup Rule
Save 3 copies of your data.
Use 2 types of storage.
Keep 1 remote copy.
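The 3-2-1 rule can be checked mechanically once each copy of a dataset is described by its storage type and location. This is a minimal sketch. The `Copy` class and `meets_3_2_1` function are hypothetical names for illustration. Following the usual reading of the rule, the working copy counts as one of the three.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    """One copy of a dataset (a hypothetical model for illustration)."""
    media: str     # type of storage, e.g. "internal HD", "network", "cloud"
    offsite: bool  # True if kept away from the primary location


def meets_3_2_1(copies: list[Copy]) -> bool:
    """Check a set of copies (including the working copy) against 3-2-1."""
    three_copies = len(copies) >= 3
    two_media = len({c.media for c in copies}) >= 2
    one_remote = any(c.offsite for c in copies)
    return three_copies and two_media and one_remote


plan = [
    Copy("internal HD", offsite=False),  # working copy on your computer
    Copy("network", offsite=False),      # e.g. a departmental share
    Copy("cloud", offsite=True),         # e.g. an off-site backup service
]
print(meets_3_2_1(plan))      # True
print(meets_3_2_1(plan[:2]))  # False: only two copies, none remote
```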
Data Backup Considerations
● Location (On-site, off-site)
● Procedure (Full, differential, incremental, mirror)
● Frequency (Hourly, daily, weekly, monthly)
● Retention (Months, years)
● Performance
TEST YOUR BACKUP PLAN!
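Restoring a few files is the real test of a backup plan. A checksum comparison is a useful complement because it shows whether every file actually reached the backup intact. The sketch below assumes both the original folder and the backup are reachable as ordinary directories. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_backup(source: Path, backup: Path) -> list[str]:
    """List files that are missing from, or differ in, the backup."""
    problems = []
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = backup / rel
        if not dst.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(src) != sha256_of(dst):
            problems.append(f"differs: {rel}")
    return problems
```

An empty list means every file in the source has an identical copy in the backup. It says nothing about whether the backup can be restored, so a test restore is still worth doing.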
Data Backup Summary
| Backup type | What is backed up | Backup time | Restore time | Storage space |
|---|---|---|---|---|
| Full/snapshot | All data | Slowest | Fast | High |
| Differential | All changes since the last full backup | Moderate | Moderate | Moderate |
| Incremental | Only files new or modified since the last backup | Fast | Slowest | Lowest |
| Mirror | Only new/modified files | Fastest | Fastest | Highest |
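The table's trade-offs follow from which files each run copies and which runs a restore needs. The sketch below models only full, differential and incremental backups; mirroring is not modeled. It is a generic illustration with hypothetical function names, not the logic of any particular backup product.

```python
from datetime import datetime
from typing import Optional


def files_to_back_up(mtimes: dict[str, datetime], kind: str,
                     last_full: Optional[datetime],
                     last_backup: Optional[datetime]) -> list[str]:
    """Select the files a backup run of `kind` would copy.

    mtimes maps file path -> last modification time. last_full and
    last_backup are the times of the most recent full backup and of the
    most recent backup of any kind (None if there has been none yet).
    """
    if kind == "full" or last_full is None:
        return sorted(mtimes)  # a full (or very first) backup copies everything
    if kind == "differential":
        cutoff = last_full  # everything changed since the last full backup
    elif kind == "incremental":
        cutoff = last_backup or last_full  # only what changed since the last run
    else:
        raise ValueError(f"unknown backup kind: {kind!r}")
    return sorted(p for p, t in mtimes.items() if t > cutoff)


def restore_chain(history: list[str]) -> list[int]:
    """Indices of the backups needed to restore the latest state.

    history lists backup kinds in chronological order and must contain
    at least one "full" backup.
    """
    base = max(i for i, k in enumerate(history) if k == "full")
    chain = [base]
    diffs = [i for i in range(base + 1, len(history)) if history[i] == "differential"]
    if diffs:  # the newest differential supersedes older ones
        base = diffs[-1]
        chain.append(base)
    # Every incremental after the base is needed, applied in order.
    chain += [i for i in range(base + 1, len(history)) if history[i] == "incremental"]
    return chain
```

`restore_chain(["full", "incremental", "incremental", "incremental"])` needs all four backups, while `restore_chain(["full", "differential", "differential"])` needs only the full backup and the newest differential. This is why incremental restores are the slowest and differential restores sit in the middle.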
Overview of Data Storage, Backup and Sharing Options at Rice
Options for faculty/staff: https://kb.rice.edu/page.php?id=70762
Options for students: https://kb.rice.edu/page.php?id=65636
Network Storage
● storage.rice.edu (U: drive, departmental shares)
● Research Data Facility (RDF): larger-scale storage for research projects
Backup Options
● storage.rice.edu backups/snapshots
● CrashPlan for Rice workstations
Data Sharing/Collaboration Tools - Google Drive, Rice Box, Globus Connect
Storage: storage.rice.edu
● Location: Networked
● Storage quotas:
  ○ Undergraduates: 2 GB
  ○ Graduate students, staff, faculty: 5 GB
  ○ Colleges, departments, centers, institutes: 40 GB
● Performance: Subject to network conditions
● Accessibility:
  ○ NetID folder: Private, not shared
  ○ Groups: Any Rice NetID holder, by request
\\storage.rice.edu
Storage: Research Data Facility
● Location: On-site (Rice PDC) network data shares
● Storage quotas:
  ○ 500 GB per researcher
  ○ Additional storage available with cost recovery
  ○ Cost below $100/TB/year, prorated monthly by use
● Performance: Subject to network conditions
● Accessibility:
  ○ Based on NetID and ADRICE security groups
  ○ Can be shared with multiple users in a research group
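The cost-recovery arithmetic can be estimated in a few lines. This is a sketch under several assumptions that the slide does not confirm: the 500 GB allocation is free and subtracted before billing, 1 TB = 1000 GB, the rate is taken at the $100/TB/year upper bound, and "prorated monthly" means one twelfth of the annual rate per month. The names `FREE_TB`, `RATE_PER_TB_YEAR` and `monthly_charge` are hypothetical.

```python
FREE_TB = 0.5             # 500 GB per researcher (from the slide; 1 TB = 1000 GB assumed)
RATE_PER_TB_YEAR = 100.0  # assumed upper bound; the slide says "below $100/TB/year"


def monthly_charge(used_tb: float, rate_per_tb_year: float = RATE_PER_TB_YEAR) -> float:
    """Estimated charge for one month of RDF use.

    Assumes the free allocation is subtracted first and that the monthly
    charge is one twelfth of the annual rate for the billable amount.
    """
    billable_tb = max(0.0, used_tb - FREE_TB)
    return billable_tb * rate_per_tb_year / 12


# Hypothetical usage over three months, in TB.
for used in [0.4, 1.5, 3.5]:
    print(f"{used} TB -> ${monthly_charge(used):.2f}")
```

Under these assumptions, a month at 3.5 TB bills 3 TB, or 3 × $100 / 12 = $25.00, and any month under 500 GB costs nothing.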
Backup: storage.rice.edu
● Location: On-site
● Procedure: Full replication
● Frequency: Daily
● Retention:
  ○ Personal access: 2 weeks
  ○ Request IT restoration: 6 months
\\storage.rice.edu\?-home\~snapshot
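Self-service restores within the two-week window come from the snapshot folder shown above. The sketch below assumes a hypothetical layout in which each snapshot is a subfolder of the snapshot root that mirrors your home folder. Check the real layout before relying on it. The helper names are also hypothetical. The sketch lists the available versions of a file and copies one out under a new name, so the live file is never overwritten.

```python
import shutil
from pathlib import Path


def list_versions(snapshot_root: Path, rel_path: str) -> list[tuple[str, float, int]]:
    """(snapshot name, modification time, size) for each snapshot that
    contains rel_path, newest file version first."""
    versions = []
    for snap in snapshot_root.iterdir():
        candidate = snap / rel_path
        if snap.is_dir() and candidate.is_file():
            st = candidate.stat()
            versions.append((snap.name, st.st_mtime, st.st_size))
    return sorted(versions, key=lambda v: v[1], reverse=True)


def restore_copy(snapshot_root: Path, snap_name: str, rel_path: str,
                 dest_dir: Path) -> Path:
    """Copy one version out of a snapshot under a new name."""
    src = snapshot_root / snap_name / rel_path
    rel = Path(rel_path)
    dest = dest_dir / f"{rel.stem}.restored-{snap_name}{rel.suffix}"
    dest_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)  # copy2 also preserves the original timestamps
    return dest
```

In practice, `snapshot_root` would point at your own `~snapshot` folder. You would call `list_versions` with a relative path such as `thesis/chapter1.docx` (hypothetical), then pass the chosen snapshot name to `restore_copy`.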
Backup: CrashPlan
● Availability: Rice-owned computers
● Cost: $82.56/year/person (up to 4 devices)
● Location: Off-site cloud storage
● Procedure: Incremental
● Frequency: Adjustable, up to every minute
● Retention: Adjustable, up to forever
CrashPlan PROe or crashplan.rice.edu
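The adjustable retention setting decides how long old versions of a file survive. The function below is a generic sketch of a simple age-based retention policy. It is not CrashPlan's actual algorithm, and `versions_to_keep` is a hypothetical name. Passing `None` stands for "keep forever".

```python
from datetime import datetime, timedelta
from typing import Optional


def versions_to_keep(versions: list[datetime], now: datetime,
                     retention: Optional[timedelta]) -> list[datetime]:
    """Return the backup versions an age-based retention policy would keep.

    retention=None means "keep forever". The newest version is always
    kept, so the current state of a file stays restorable even if it has
    not changed for longer than the retention period.
    """
    if retention is None or not versions:
        return sorted(versions)
    newest = max(versions)
    return sorted(v for v in versions if now - v <= retention or v == newest)
```

With a 90-day retention on 30 June, a version from 1 January is pruned, while versions from June are kept.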
Sharing: Google Drive
● Unlimited storage for low-risk data
● Can be used for collaboration within Rice
● Integrates with G Suite productivity apps
● Files aren't stored locally, so performance is limited
● No provisions for retention of orphaned data
● Accessibility:
  ○ Log in to G Suite apps with your Rice NetID
Sharing: Rice Box
● Web-based file sharing tool similar to Dropbox
● Approved by Rice for sharing secure data
● Accessibility:
  ○ Rice NetID
  ○ Share folders with Rice colleagues or external collaborators
  ○ Add emails of external collaborators to a folder and send invitations
Sharing: Globus Connect
● Widely used service for large data exchange between participating institutions
● Can be used in our HPC environment or from your desktop with Globus Connect Personal
● Accessibility:
  ○ Contact CRC to be added to the license
  ○ Arrange for access to peer institution endpoints
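| Product | Use | Location | Quota | Performance | Accessibility |
|---|---|---|---|---|---|
| Storage (storage.rice.edu) | S/B | Rice Data Center | 2 / 5 / 40 GB | Network | NetID |
| Google Drive | S/C | Global cloud | Unlimited | Internet | NetID & external |
| RDF | S/B | Rice Data Center | 500 GB free | Network | NetID |
| Rice Box | S/C | US cloud | Unlimited | Internet | NetID & external |
| CrashPlan | B | Off-site cloud | Unlimited | Internet | Your NetID |

Use: S = storage, B = backup, C = collaboration/sharing.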
Data Security
● Confidential (SSN, credit card numbers, driver's license numbers)
  ○ Financial records
  ○ Health records
  ○ Education records
● Sensitive (birth date, address, emergency contact, EID/SID)
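Before storing or sharing a tabular dataset, a quick scan of its column names can flag fields that may fall into the confidential or sensitive categories above. This is a rough heuristic sketch. The keyword lists and function names are hypothetical and will both over-flag and miss things. It only prompts a manual review and does not replace one.

```python
import csv

# Hypothetical keyword lists based on the categories on this slide.
CONFIDENTIAL = ("ssn", "social_security", "credit_card", "card_number", "license")
SENSITIVE = ("birth", "dob", "address", "emergency", "student_id", "employee_id")


def classify_columns(header: list[str]) -> dict[str, str]:
    """Map each suspicious column name to 'confidential' or 'sensitive'."""
    flags = {}
    for col in header:
        # Normalize "Birth Date" / "birth-date" to "birth_date" before matching.
        name = col.lower().replace(" ", "_").replace("-", "_")
        if any(k in name for k in CONFIDENTIAL):
            flags[col] = "confidential"
        elif any(k in name for k in SENSITIVE):
            flags[col] = "sensitive"
    return flags


def scan_csv(path: str) -> dict[str, str]:
    """Classify the header row of a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f), [])
    return classify_columns(header)
```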
Security Classification
| Risk level | Rice On-Site (most secure) | Rice Cloud Contract (semi-secure) |
|---|---|---|
| Low Risk (Public Data) | CampusPress, RDF | Google Drive |
| Moderate Risk (Sensitive Data) | RDF | Rice Box |
| High Risk (Confidential Data) | Storage, Confluence | Rice Box, CrashPlan |
| High Risk (Regulated Data) | Storage | CrashPlan |
Data Archiving Definition
● Provides a final version of your data
● Stored for the long term
Data Archiving Considerations
● Location
● File formats
● Responsibility
● Accessibility
Why Archive Your Data with a Data Repository?
● Conform to publisher or funder requirements
● Get cited
  ○ “studies that made data available in a public repository received 9% … more citations than similar studies for which the data was not made available.” (Piwowar & Vision, 2013)
● Promote future research
Data Archiving Options
Public Repositories:
● Discipline-based repository
● General data repository (e.g., figshare)
● Rice Digital Scholarship Archive
Private Approaches:
● Long-term storage (redundant)
Finding a Repository
Consult lists and directories of data repositories:
● Nature, “Recommended Data Repositories”: https://www.nature.com/sdata/policies/repositories
● PLOS Guide: http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories
● Re3data: http://www.re3data.org/
Share Your Data through a Disciplinary Repository: PANGAEA
https://doi.pangaea.de/10.1594/PANGAEA.743388
Rice Data Sharing Option: Rice Digital Scholarship Archive
https://scholarship.rice.edu/
How to Set Yourself Up to Archive Your Data
● Before sharing, ensure that confidentiality is protected and that there are no copyright concerns.
● Document your data so that others understand it.
● Organize your data.
● Provide the metadata required by the repository.
● Get your data into the appropriate format (ideally a non-proprietary format like CSV or TXT).
● Aim for networked storage rather than device-dependent storage.
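Getting tabular data into a non-proprietary format can be as simple as writing it out with Python's standard `csv` module. In this minimal sketch, `write_table` is a hypothetical helper. It assumes every record shares the keys of the first record, which become the header row. Passing a tab as the delimiter produces tab-delimited text instead of CSV.

```python
import csv


def write_table(rows: list[dict], path: str, delimiter: str = ",") -> None:
    """Write records to a plain-text table (CSV by default).

    Pass delimiter="\t" for tab-delimited text. The keys of the first row
    become the header; DictWriter raises ValueError if a later row has
    keys that are not in the header.
    """
    if not rows:
        raise ValueError("no rows to write")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]), delimiter=delimiter)
        writer.writeheader()
        writer.writerows(rows)
```

For example, `write_table(records, "samples.tsv", delimiter="\t")` writes a tab-delimited file that any text editor or spreadsheet can open.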
Example of submission requirements: PANGAEA
Documentation:
-- explain abbreviations
-- provide units for parameters
Metadata:
-- position (geographic)
-- citation of journal article
Format:
-- Excel or tab-delimited text files for tables
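The PANGAEA requirements above can be turned into a pre-submission self-check. In this sketch, the `Name [unit]` header convention, the glossary of abbreviations, and the metadata keys `latitude`, `longitude` and `citation` are all assumptions chosen for illustration. PANGAEA's own formatting rules may differ. Dimensionless columns such as sample IDs will be flagged for a manual look.

```python
import re

UNIT_PATTERN = re.compile(r"\[[^\]]+\]\s*$")  # assumed convention, e.g. "Temp [°C]"
REQUIRED_METADATA = ("latitude", "longitude", "citation")  # hypothetical keys


def check_submission(header: list[str], glossary: dict[str, str],
                     metadata: dict[str, object]) -> list[str]:
    """List problems: missing units, unexplained abbreviations, missing metadata."""
    problems = []
    for col in header:
        if not UNIT_PATTERN.search(col):
            problems.append(f"no unit: {col}")
        base = UNIT_PATTERN.sub("", col).strip()
        if base not in glossary:
            problems.append(f"undocumented: {base}")
    for key in REQUIRED_METADATA:
        # Compare against None/"" so a latitude of 0.0 (the equator) still counts.
        if metadata.get(key) in (None, ""):
            problems.append(f"missing metadata: {key}")
    return problems
```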
Questions to Ask in Evaluating a Data Repository
1. How well will the data be preserved? How stable is the repository?
2. What kind of reputation does the archive have in the community?
3. Does the repository facilitate citation of the data?
4. Does the repository allow you to describe the data fully and make it discoverable?
5. Are there curators who can help you deposit the data?
6. What are the costs of deposit, if any?
Data Archiving Caveats
● Do not share confidential data (unless it has been de-identified and approved through the IRB).
● Consult with your collaborators before publishing data.
● It may be possible to embargo data so that it is not available until the related publication is released.
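One common step toward de-identification is to drop direct identifiers and replace participant IDs with pseudonymous tokens. This sketch uses a keyed hash (HMAC-SHA256) from Python's standard library. The identifier list, the `pseudonymize` function and the `pid` column name are hypothetical. Pseudonymization alone is not full de-identification: quasi-identifiers such as birth dates or addresses can still allow re-identification, and the IRB requirement above still applies.

```python
import hashlib
import hmac

# Hypothetical list of direct identifiers to drop; adapt it to your data.
DIRECT_IDENTIFIERS = {"name", "ssn", "email", "address", "phone"}


def pseudonymize(rows: list[dict], id_field: str, secret: bytes) -> list[dict]:
    """Drop direct identifiers and replace id_field with a keyed hash.

    The same ID always maps to the same token, so records can still be
    linked, but without the secret key nobody can recompute tokens from
    guessed IDs. Store the key separately and never publish it.
    """
    out = []
    for row in rows:
        token = hmac.new(secret, str(row[id_field]).encode("utf-8"),
                         hashlib.sha256).hexdigest()[:16]
        clean = {k: v for k, v in row.items()
                 if k not in DIRECT_IDENTIFIERS and k != id_field}
        clean["pid"] = token  # hypothetical name for the pseudonymous ID column
        out.append(clean)
    return out
```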
Resources
● DataONE, Primer on Data Management, https://www.dataone.org/sites/all/documents/DataONE_BP_Primer_020212.pdf
● Dataverse, Data Management Plans, http://best-practices.dataverse.org/data-management/
● ICPSR, Guide to Social Science Data Preparation and Archiving, http://www.icpsr.umich.edu/icpsrweb/content/deposit/guide/
● Svend Juul et al., “Take good care of your data,” http://www.epidata.dk/downloads/takecare.pdf
● UK Data Archive, Managing and Sharing Data: Best Practices for Researchers, http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
Thanks!
Please contact [email protected] or with any questions.
Visit us online at http://researchdata.rice.edu/.
Help us shape future workshops! Please complete this evaluation form: https://goo.gl/forms/4kOO9G7Hqrdi79hU2