
Data submission services of EMBL Australia Bioinformatics Resource (EMBL-ABR) · 2019-04-30 · Activity Report - June 2018 Data submission services of EMBL Australia Bioinformatics


Activity Report - June 2018

Data submission services of EMBL Australia Bioinformatics Resource (EMBL-ABR)

Overview

Since January 2016, the QFAB@QCIF team has provided data submission services on behalf of EMBL-ABR. These services comprise the guidance and support provided to help Bioplatforms Australia (BPA) and Australian researchers curate, format and manage research data for transfer to existing international data repositories, where it will be publicly accessible for reuse.


EMBL-ABR’s Data Submission Service

The QFAB team from the EMBL-ABR: QCIF Node uses a range of scripts and standard operating procedures to support the submission of Australian sequence-based data to the EBI European Nucleotide Archive (ENA) and the NCBI Sequence Read Archive (SRA). The service includes:

• management of ENA and SRA data submission accounts accessible by researchers
• automation of upload processes, saving researchers’ time
• optimisation of data transfer processes to ensure data integrity and reduce transfer failure
• provision of staging infrastructure to facilitate submissions from the researcher’s perspective
• verification and collation of required metadata prior to submission
• submission of researcher data to ENA and SRA
• submission of selected Bioplatforms Australia data
• maintenance of boutique data submission tools such as Tox|Note (for venom-gland transcriptome data submission and toxin card creation on ArachnoServer) and the system developed for the Beatson Group and collaborators
• a helpdesk for support.
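The metadata verification step in the list above can be illustrated with a small check run over a sample sheet before submission. This is a minimal sketch under stated assumptions, not the team's actual scripts: the CSV layout and the column names in `REQUIRED_FIELDS` are hypothetical placeholders, and the real ENA and SRA checklists define their own mandatory fields.

```python
import csv
import io

# HYPOTHETICAL required columns for illustration only;
# the real ENA/SRA sample checklists define their own mandatory fields.
REQUIRED_FIELDS = ("sample_alias", "tax_id", "scientific_name", "collection_date")


def find_missing_metadata(sample_sheet_text):
    """Return (row_number, field) pairs for required metadata that is absent or empty.

    Data rows are numbered from 1 (the header row is not counted).
    A required column missing from the header entirely is reported as row 0.
    """
    reader = csv.DictReader(io.StringIO(sample_sheet_text))
    header = reader.fieldnames or []
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in header:
            problems.append((0, field))
    for row_number, row in enumerate(reader, start=1):
        for field in REQUIRED_FIELDS:
            # Short rows give None for missing cells, so treat None as empty.
            if field in header and not (row.get(field) or "").strip():
                problems.append((row_number, field))
    return problems


if __name__ == "__main__":
    sheet = (
        "sample_alias,tax_id,scientific_name,collection_date\n"
        "S1,9606,Homo sapiens,2017-05-01\n"
        "S2,,Homo sapiens,2017-05-02\n"
    )
    print(find_missing_metadata(sheet))  # [(2, 'tax_id')]
```

Reporting every gap at once, rather than stopping at the first, suits a workflow where the collated list is sent back to the researcher in a single round of queries.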

Testimonials – 2018

EMBL-ABR sequence submission service was an immense help for the submission of data for our manuscript. Given the number of samples that we needed to submit, having some help on getting it up to EBI saved us valuable time. Given the shared access to NeCTAR computing resources we were able to transfer data between institutes easily as well. Both myself and other members of the University of Adelaide Bioinformatics Hub have already recommended students and PIs to make use of the service when submitting data for publication.
Jimmy Breen, Robinson Research Institute, Core Facility Leader (Bioinformatics) at University of Adelaide

Dear Dominique and team, Thank you SO much for your work on this to date – it has been brilliant.
Rebecca Johnson, Australian Museum Research Institute (AMRI)

I had a very good experience with using the EMBL-ABR sequence submission service. Nick and Gareth made it very easy to collate and submit our datasets to the ENA database ahead of our publication in Genetics. I would highly recommend them to any researcher that deals with archiving and submitting large datasets.
David Schlipalius, School of Biological Sciences, The University of Queensland


Recent Representative Publications

Xie H, Konate M, Sai N, Tesfamicael KG, Cavagnaro T, … (2017). Global DNA methylation patterns can play a role in defining terroir in grapevine (Vitis vinifera cv. Shiraz). Frontiers in Plant Science.

Schlipalius DI, Tuck AG, Jagadeesan R, Nguyen T, Kaur R, Subramanian S, Barrero R, Nayak M, Ebert PR (2018). Variant linkage analysis using de novo transcriptome sequencing identifies a conserved phosphine resistance gene in insects. Genetics 209(1): 281-290.

Pineda SS, Chaumeil PA, Kunert A, Kaas Q, Thang MWC, Le L, Nuhn M, Herzig V, Saez NJ, Cristofori-Armstrong B, Anangi R, Senff S, Gorse D, King GF (2018). ArachnoServer 3.0: an online resource for automated discovery, analysis and annotation of spider toxins. Bioinformatics 34(6): 1074-1076.

Submission statistics


QFAB@QCIF team

Various members of the QFAB team are involved in the provision of the data submission services:

Nick Rhodes

• Contact person for all users and stakeholders, including ENA
• Process design and improvement
• Management of user accounts
• Quality control of metadata prior to submission
• Submission of data
• Technical support

Mike Thang and Thom Cuddihy

• Manipulation of BAM files using SAMtools
• Process implementation
• Submission of data

Jeff Christiansen

• Broadening of the range of supported submissions
• Identification of metadata requirements
• Investigation of metadata management systems

Development and improvement of processes for data submission

The QFAB@QCIF team has improved the efficiency and ease of use of the data submission process by:

• Automating some manual steps
• Deploying and maintaining a dedicated VM on QRIScloud for data submission, with:
  o Linux accounts for users to upload data
  o User “handholding”, as required
  o Volume storage allocated as required
  o NFS access to the BPA collections on QRIScloud RDS storage
  o Aspera Secure Copy client
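Checksum generation is one manual step that lends itself to automation and also serves the service's data integrity goal: ENA asks for an MD5 checksum for each uploaded read file, and a checksum manifest lets the receiving side confirm that nothing was corrupted in transit. The sketch below assumes files are staged under a single upload directory on the submission VM; that layout and the manifest name `checksums.md5` are illustrative assumptions, not details of the QRIScloud deployment.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading 1 MiB chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(upload_dir, manifest_name="checksums.md5"):
    """Write an md5sum-format manifest ("<md5>  <relative path>") for every file under upload_dir.

    ASSUMPTION: the manifest lives at the top of upload_dir and is excluded from
    itself, so re-running the function gives the same result.
    Returns the checksums as {relative_path: md5}.
    """
    root = Path(upload_dir)
    manifest_path = root / manifest_name
    entries = {}
    # rglob("*") walks all subdirectories recursively; sorting keeps output stable.
    for path in sorted(root.rglob("*")):
        if path.is_file() and path != manifest_path:
            entries[path.relative_to(root).as_posix()] = md5_of_file(path)
    manifest_path.write_text("".join(f"{md5}  {rel}\n" for rel, md5 in entries.items()))
    return entries
```

Because the manifest uses the two-space md5sum format, running `md5sum -c checksums.md5` from the same directory after transfer would flag any file whose contents changed.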


Supporting BPA with the submission of data to ENA

The QFAB@QCIF team is supporting submissions of data for the following BPA projects:

• BASE project
• Great Barrier Reef project
• Marine Microbes project (not yet started)

Supporting Australian researchers with the submission of data to ENA and SRA

The QFAB@QCIF team is supporting data submission activities for the Australian community:

• Bacterial genomes – Scott Beatson, UQ
• Spider and other toxins – Glenn King, UQ
• Tasmanian Devil – Belinda Wright, University of Sydney
• Sponge – Degnan Lab, UQ
• Porphyromonas gingivalis – Helen Mitchell, UoM
• Streptococcus pneumoniae – Bioinformatics Hub, University of Adelaide
• Indian Myna genome assembly assessment – Australian Museum Research Institute (AMRI)
• MSGBS samples from Barossa grapes – Bioinformatics Hub, University of Adelaide
• MSGBS samples, salt-induced alterations of DNA methylation in barley – Stephen Pederson, Bioinformatics Hub, University of Adelaide

We have recently adopted a more proactive approach to promoting the service, including high-profile links on the EMBL-ABR web page. We anticipate that more researchers across Australia will take an interest in the data submission services as the service's visibility increases.

Maintaining boutique data submission tools

Tox|Note

In 2014/2015, Tox|Note, a toxin analysis workflow, was developed in collaboration with Glenn King’s group (UQ), EMBL-ABR (formerly BRAEMBL) and QFAB Bioinformatics to significantly fast-track the analysis of venom-gland transcriptomes generated by large Next Generation Sequencing (NGS) projects, and to allow simple submission of the findings to ENA/UniProt via EMBL-ABR acting as data broker.

For this purpose, EMBL-ABR and QFAB Bioinformatics worked closely together to integrate a data submission module into Tox|Note, allowing researchers to submit their sequences with the required metadata, obtain accession numbers and automatically create toxin cards on ArachnoServer, a global, public repository for spider toxin sequence and structure research, available at http://www.arachnoserver.org.


SRA upload workflow

An SRA upload workflow was created for the Beatson Group’s submissions of bacterial genomes to the SRA. The tool integrates authentication, proxy handling, messaging (Slack) and recursive file handling, and is built on vsftpd (Very Secure FTP Daemon), a standard Linux FTP server.
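Two of the ingredients named in the SRA upload workflow, recursive file handling and Slack messaging, can be sketched with the Python standard library alone. This is not the Beatson Group tool: the file suffixes in `UPLOAD_SUFFIXES` and the message wording are hypothetical, the FTP transfer itself is not shown, and the webhook URL would be one issued by Slack for an incoming webhook, which accepts a JSON body with a `text` field.

```python
import json
import urllib.request
from pathlib import Path

# HYPOTHETICAL suffixes treated as uploadable sequence data; adjust per project.
UPLOAD_SUFFIXES = (".fastq.gz", ".fq.gz", ".bam")


def collect_upload_files(root_dir):
    """Recursively list files under root_dir whose names end in one of UPLOAD_SUFFIXES.

    Paths are returned relative to root_dir, in sorted order.
    """
    root = Path(root_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.name.endswith(UPLOAD_SUFFIXES)
    )


def build_slack_payload(batch_name, files):
    """Build the JSON body for a Slack incoming-webhook message summarising a batch."""
    return {"text": f"Upload batch '{batch_name}': {len(files)} file(s) queued for SRA submission"}


def post_to_slack(webhook_url, payload, timeout=10):
    """POST the payload to a Slack incoming webhook and return the HTTP status code.

    urllib's default opener reads the http_proxy/https_proxy environment
    variables, so the request is routed through a proxy when one is configured.
    """
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping payload construction separate from the network call means the message logic can be checked without contacting Slack, and relying on the environment proxy variables is one simple way to work behind an institutional proxy.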

The QFAB team has continued to support these boutique data submission services:

• Maintenance of the Tox|Note workflow
• Submission of toxins newly identified by Tox|Note to ENA and UniProt
• Submission of bacterial genomes from the Beatson Group and maintenance of the bespoke upload tool deployed for this purpose

Submission request tracking spreadsheet

Data submission requests are tracked and shared with the EMBL-ABR Hub through a Google spreadsheet available at: https://docs.google.com/spreadsheets/d/1WtGL7IQf-a09kEVH79yqvC09HnTGT4KuC74_hZEiF3w/edit?usp=sharing

Each submission request is different: some requests are for one sample only, whilst others can be for hundreds or even thousands of samples. As such, the amount of support required for each request in the tracking spreadsheet varies vastly. We believe that our interactive, personal approach to client requirements is fundamental to the service's appeal to users. Experienced wet-lab scientists may lack the time or skills to negotiate the submission process; indeed, the delays observed between the dates of sequencing runs and when we are first approached indicate an accumulated backlog of submissions.

Nick Rhodes & Dominique Gorse

QFAB Bioinformatics, QCIF

BIOINFORMATICS | BIOSTATISTICS | BIODATA