File and Data Management
Jing Su 20180223
What You Will Learn
• Why file management of your research data is important
• Specific techniques for organizing your research data:
– File structures
– File naming
– Version control
– Storage & Backup
• Including:
– Small group discussion
– Exercise for organizing your own data
• Focuses on research data, also applies to other types of files
Small Group Discussion
• What kind of data do you work with?
• What organizational challenges have you faced?
• What tools or techniques work for you?
Research Data Lifecycle
Lifecycle: https://www.youtube.com/watch?v=-wjFMMQD3UA&feature=youtu.be
Data Management Checklist 1/2
Data Management Checklist 2/2
Data Management Checklist
• What types of data, and for how long? Five steps to decide what data to keep
• Who will be responsible for collecting and documenting the data? Roles and responsibilities. Legal and ethical obligations and rights. Plan and consent to share.
• How to document different types of data? Study-level, Data-level, and Metadata
– Wet lab: Electronic Lab Notebook (ELN)
– Computational: large sequencing data, consortium data (TCGA, ICGC)
Data Management Checklist
Formats: Data type and sources
File formats currently recommended by the UK Data Archive for long-term preservation of research data
File Naming Conventions
• Make file names unique
• Include the most important identifying information of the project:
✓ project name
✓ acronym, or research data name
✓ study title
✓ location information
✓ researcher initials
✓ date (consistently formatted, e.g. YYYYMMDD)
✓ version
• Use underscores to separate elements; avoid special characters, spaces and periods.
• Use leading zeros when incorporating numbers to enable sorting (a sequence of 1-100 should be numbered 001-100).
• File names should be short enough to be readable, while still conveying enough pertinent information (limit: 255 characters)
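The naming rules above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `build_filename` and `numbered`, the element order, and the two-digit version suffix are hypothetical choices, not something the slides prescribe.

```python
import re
from datetime import date


def build_filename(project, descriptor, created, initials, version, ext):
    """Join the name elements with underscores: project_descriptor_YYYYMMDD_initials_vNN.ext"""
    parts = [project, descriptor, created.strftime("%Y%m%d"), initials, f"v{version:02d}"]
    # Drop spaces, periods and special characters from each element
    cleaned = [re.sub(r"[^A-Za-z0-9-]", "", part) for part in parts]
    name = "_".join(cleaned) + "." + ext.lstrip(".")
    if len(name) > 255:
        raise ValueError("file name exceeds 255 characters")
    return name


def numbered(stem, index, total, ext):
    """Zero-pad a sequence number so that names sort correctly (001 ... 100)."""
    width = len(str(total))
    return f"{stem}_{index:0{width}d}.{ext}"


print(build_filename("Dry Valley Soil", "ICPOES", date(2010, 11, 15), "JDS", 2, "dat"))
print(numbered("sample", 7, 100, "csv"))
```

Generating names from one function, rather than typing them by hand, is one way to keep the date format and separators consistent across a project.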
File Naming Conventions Examples
• The Good: DryValleySoil_ICPOES_20101115_JDSv2.dat
– DryValleySoil, project name
– ICPOES, instrument name
– 20101115, date the sample was created
– JDS, initials of the scientist
– v2, second version
• The Bad: myData@DryValleyNovember152010.v2.dat
• The Ugly: Can you understand/use these data files? Would anyone 5 years from now?
– SrvMthdDraft.doc
– SrvMthdFinal.doc
– SrvMthdLastOne.doc
– SrvMthdRealVersion.doc
Use content or descriptive information in file names
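A convention is easier to keep when file names can be checked automatically. The sketch below assumes the layout of the "Good" example (project, instrument, YYYYMMDD date, initials followed by vN); the pattern and the name `parse_name` are hypothetical and would need adapting to a real project's convention.

```python
import re

# Assumed layout: Project_Instrument_YYYYMMDD_InitialsvN.ext
NAME_PATTERN = re.compile(
    r"^(?P<project>[A-Za-z0-9]+)_"
    r"(?P<instrument>[A-Za-z0-9]+)_"
    r"(?P<date>\d{8})_"
    r"(?P<initials>[A-Z]+)v(?P<version>\d+)"
    r"\.(?P<ext>[A-Za-z0-9]+)$"
)


def parse_name(filename):
    """Return the name elements as a dict, or None if the name breaks the convention."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


for name in ["DryValleySoil_ICPOES_20101115_JDSv2.dat",
             "myData@DryValleyNovember152010.v2.dat",
             "SrvMthdFinal.doc"]:
    print(name, "->", parse_name(name))
```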
Batch Renaming Tools
• Windows:
– Adobe Bridge (via any Creative Cloud products): http://ist.mit.edu/adobe-creative-cloud
– Ant Renamer: http://www.antp.be/software/renamer
– Bulk Rename Utility: http://www.bulkrenameutility.co.uk/
– ImageMagick: http://www.imagemagick.org/
– PSRenamer: http://www.powersurgepub.com/products/psrenamer.html
– RenameIT: http://sourceforge.net/projects/renameit
• Mac:
– Adobe Bridge (via any Creative Cloud products): http://ist.mit.edu/adobe-creative-cloud
– ImageMagick: http://www.imagemagick.org/
– NameChanger: http://web.mac.com/mickeyroberson/MRR_Software/NameChanger.html
– PSRenamer: http://www.powersurgepub.com/products/psrenamer.html
– Renamer4Mac: http://renamer4mac.com/
– NameMangler: http://manytricks.com/namemangler/
• Linux:
– GNOME Commander: http://www.nongnu.org/gcmd/
– GPRename: http://gprename.sourceforge.net/
– ImageMagick: http://www.imagemagick.org/
– PSRenamer: http://www.powersurgepub.com/products/psrenamer.html
• Unix:
– The use of the grep command to search for regular expressions
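Where none of the listed tools is available, a regular-expression rename can also be scripted. The following is a minimal sketch using only the Python standard library, not a replacement for the tools above; `batch_rename` is a hypothetical helper, and it defaults to a dry run so the planned changes can be reviewed before any file is touched.

```python
import re
from pathlib import Path


def batch_rename(folder, pattern, replacement, dry_run=True):
    """Apply a regex substitution to every file name in `folder`.

    Returns the list of (old_name, new_name) pairs. Nothing is renamed
    unless dry_run is False.
    """
    regex = re.compile(pattern)
    changes = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        new_name = regex.sub(replacement, path.name)
        if new_name == path.name:
            continue
        target = path.with_name(new_name)
        if target.exists():
            # Never overwrite an existing file
            raise FileExistsError(f"{target} already exists")
        changes.append((path.name, new_name))
        if not dry_run:
            path.rename(target)
    return changes
```

For example, `batch_rename("data", r"\s+", "_")` lists the files in a folder called `data` whose names contain spaces, together with the underscore-separated names they would receive; passing `dry_run=False` performs the renames.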
Version Control
Aim: Keep raw data untouched and be able to revert to an earlier version
• Save an untouched copy of the raw data; work only on a copy
• Use a file naming convention (like v001, v002 or v1_0, v1_2, v2_0)
• Use a directory structure naming convention that includes version information
• Date can be part of the file name, e.g. 2012-02-27_Template_soil_testing.xlsx
• Append the author's name to the file name, e.g. Template_soil_testing_modified_by_AH.xlsx
• Add a version number after each major edit, e.g. Template_soil_testing_v03.xlsx
• Directory top-level folders should include the project title, unique identifier, and date (year), but the files themselves should be well-described independent of the directory structure.
• Version control tools:
– Wet lab: Electronic Lab Notebooks / Box / LIMS
– Dry lab: SVN / GitHub
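For files that are versioned by name rather than with SVN or Git, the "add a version number after each major edit" rule can be automated. The sketch below assumes a `_vNN` suffix just before the extension; `next_version_name` and `save_new_version` are hypothetical helpers, and the earlier version is copied rather than overwritten so it stays available.

```python
import re
import shutil
from pathlib import Path

# Assumed suffix: <stem>_v<digits>, e.g. Template_soil_testing_v03
VERSION_RE = re.compile(r"^(?P<stem>.+)_v(?P<num>\d+)$")


def next_version_name(filename):
    """Template_soil_testing_v03.xlsx -> Template_soil_testing_v04.xlsx"""
    path = Path(filename)
    match = VERSION_RE.match(path.stem)
    if match is None:
        # No version suffix yet: the first working copy becomes v01
        return f"{path.stem}_v01{path.suffix}"
    width = len(match["num"])  # keep the existing zero padding
    number = int(match["num"]) + 1
    return f"{match['stem']}_v{number:0{width}d}{path.suffix}"


def save_new_version(path):
    """Copy `path` to the next version number; the source file is left untouched."""
    source = Path(path)
    target = source.with_name(next_version_name(source.name))
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    shutil.copy2(source, target)
    return target
```

Edits are then made in the new copy only, so every earlier version, including the raw data, can still be opened unchanged.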
Version Control Example
Folder Structure
• Methods of organising electronic material
– Hierarchical: Items organised in folders and sub-folders
– Tag-based: Each item assigned one or more tags
– Hybrid: combination of hierarchical and tag-based
Folder Structure Examples – Hierarchical
Folder Structure Examples – Tag-based
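To make the tag-based idea concrete, here is a small sketch of a tag index kept alongside an ordinary folder hierarchy (the hybrid approach). The file names, tags, and the helpers `build_tag_index` and `files_with_tags` are invented for illustration only.

```python
from collections import defaultdict


def build_tag_index(tagged_files):
    """Invert {file: [tags]} into {tag: {files}} so lookups by tag are direct."""
    index = defaultdict(set)
    for filename, tags in tagged_files.items():
        for tag in tags:
            index[tag].add(filename)
    return index


def files_with_tags(index, *tags):
    """Return the files that carry every one of the given tags."""
    sets = [index.get(tag, set()) for tag in tags]
    return set.intersection(*sets) if sets else set()


# Hypothetical files and tags
index = build_tag_index({
    "soil/2010/DryValleySoil_ICPOES_20101115_JDSv2.dat": ["soil", "ICPOES", "2010"],
    "soil/2011/DryValleySoil_ICPOES_20110304_JDSv1.dat": ["soil", "ICPOES", "2011"],
    "water/2010/LakeWater_ICPOES_20101201_JDSv1.dat": ["water", "ICPOES", "2010"],
})
print(files_with_tags(index, "ICPOES", "2010"))
```

A file lives in exactly one folder but can carry many tags, which is why a query such as "all ICPOES runs from 2010" is easy with tags and awkward with folders alone.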
Small Group Discussion
• What sort of structure(s) do you currently use?
• What do you see as the key advantages and disadvantages of the different types of system?
• Are there specific tasks one sort of system seems particularly suitable for? How does this apply to your research project?
Data Storage
The everlasting external disks
Are they really permanent? What if…
What if your data is lost
Cancer Research UK – University of Manchester – 27 April 2017
• Your laptop got stolen
• Your office/house burnt
• Your USB stick is lost
• Your portable hard disk is damaged
• Data copied to Dropbox disappeared
https://en.wikipedia.org/wiki/The_Scream
Storage + Security + Encryption + Backup + Sharing
• University Storage Service
• CRUK CI IT
• Lab
• Individual (Time Machine)
• CLOUD?
At least 2 backups at 2 different locations
• External disks: cheap, £10-15 / TB (1024 GB); failure rate 1.5%/year
• Online backup: accessibility; free (limit); personal data; hacking
• Department / College IT servers: moving between institutions; managed by experts
Data Backup
Manual Automated
Copying files to relevant folders
- Install software e.g. Time machine (Mac users)
Automatically upload files to the cloud when any changes are saved
Copying files to relevant folders
- RAID technology - Checksums
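Checksums give a simple way to confirm that a backup copy is identical to the original. The sketch below copies one file to several backup folders and verifies each copy with SHA-256. The helper names and the choice of SHA-256 are assumptions for illustration, and in practice the destination folders would sit on different devices or at different sites.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_with_check(source, destinations):
    """Copy `source` into each destination folder and verify every copy by checksum."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)
        if sha256_of(target) != expected:
            raise IOError(f"checksum mismatch for {target}")
        copies.append(target)
    return copies
```

Storing the checksum next to the backup also makes it possible to detect later corruption, for instance on an ageing external disk, by recomputing and comparing.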
Data Backup
Data backup and file sharing
Space/price 15 GB (free) 1 TB (~£80/year)
1 TB (free)
File history and recovery
Support
File size limit
Yes, unlimited
OS
2 GB (free) Unlimited (£55/year)
Windows, Mac, Linux, Android, iOS
UIS Unsupported UIS
Accessibility Sync anywhere on any devices
None
Yes
5 GB
Windows, Mac, Android, iOS
Live editing
Last 90 days
Integration with Microsoft Office
15 GB
Windows, Mac, Android, iOS
• Q: If manual... how often? A: How much would you be willing to lose?
• Software allows you to set up the backup time automatically
1 day 1 week 1 month-year
Data Backup
More … file sharing
Email Website FTP
Recommended