BioSharing.org – mapping the landscape of standards in the life sciences
Peter McQuilton, PhD (@drosophilic), BioSharing content lead
On behalf of the BioSharing team (@biosharing)
Outline
• Standards, databases and policies in the life sciences
• BioSharing – an informative and educational resource
• What it is
• How to use it
• How it can help you
A growth in data, a growth in databases
Number of databases in the NAR database issue, up to 2015 (from @AlexBateman1)
Credit: https://projects.ac/blog/five-top-reasons-to-protect-your-data-and-practise-safe-science/ (2014)
Better data = better science
      A            B        C        D    E
 1                 Group1   Group2
 2    Day 0
 3    Sodium       139      142
 4    Potassium    3.3      4.8
 5    Chloride     100      108
 6    BUN          18       18
 7    Creatine     1.2      1.2
 8    Uric acid    5.5*     6.2*
 9    Day 7
10    Sodium       140      146
11    Potassium    3.4      5.1
12    Chloride     97       108
S1Sh.cuo
Credit to: Iain Hrynaszkiewicz
Sharing starts with good metadata…
Problems with the spreadsheet (S1Sh.cuo) shown on the previous slide:
• Meaningless column titles
• Special characters can cause text-mining errors
• No units
• Unhelpful document name
• Undefined abbreviation
• Formatting used for information that should be in metadata
…which this isn’t…
      A            B     C         D         E       F
 1    Parameter    Day   Control   Treated   Units   P
 2    Sodium       0     139       142       mEq/l   0.82
 3    Sodium       7     140       146       mEq/l   0.70
 4    Sodium       14    140       158       mEq/l   0.03
 5    Sodium       21    143       160       mEq/l   0.02
 6    Potassium    0     3.3       4.8       mEq/l   0.06
 7    Potassium    7     3.4       5.1       mEq/l   0.07
 8    Potassium    14    3.7       4.7       mEq/l   0.10
 9    Potassium    21    3.1       3.6       mEq/l   0.52
10    Chloride     0     100       108       mEq/l   0.56
11    Chloride     7     97        108       mEq/l   0.68
12    Chloride     14    101       106       mEq/l   0.79
Table_S1_Shanghai_blood.xls
…This is much clearer!
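One practical payoff of the clearer layout is that software can read it without special handling. The sketch below is only an illustration: it copies the first five data rows of the table into CSV text and loads them with Python's standard `csv` module. The function name `load_rows`, the lower-case field names and the 0.05 cut-off for P are assumptions made for this example, not part of the original slides.

```python
import csv
import io

# First five data rows of the tidy table from the slide, written as CSV text.
TIDY_CSV = """Parameter,Day,Control,Treated,Units,P
Sodium,0,139,142,mEq/l,0.82
Sodium,7,140,146,mEq/l,0.70
Sodium,14,140,158,mEq/l,0.03
Sodium,21,143,160,mEq/l,0.02
Potassium,0,3.3,4.8,mEq/l,0.06
"""


def load_rows(text):
    """Parse the CSV text into a list of typed dictionaries (hypothetical helper)."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "parameter": rec["Parameter"],
            "day": int(rec["Day"]),
            "control": float(rec["Control"]),
            "treated": float(rec["Treated"]),
            "units": rec["Units"],
            "p": float(rec["P"]),
        })
    return rows


rows = load_rows(TIDY_CSV)

# Because every row carries its own parameter, day and units, filtering is trivial.
# The 0.05 threshold is an assumption chosen for illustration.
significant = [(r["parameter"], r["day"]) for r in rows if r["p"] < 0.05]
print(significant)  # [('Sodium', 14), ('Sodium', 21)]
```

The first spreadsheet would need hand-written logic to recover which day each row belongs to, and there would be no way to recover the units at all.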
Seven week old C57BL/6N mice were treated with low-fat diet. Liver was dissected out, hepatocytes prepared…
From natural language to structured data
The sentence can be broken down into structured elements:
• Age value: Seven
• Unit: week
• Strain name: C57BL/6N
• Subject of the experiment: mice
• Type of diet and experimental condition: low-fat diet
• Anatomy part: Liver
• Type of protocol – sample treatment: treated with low-fat diet
• Type of protocol – liver preparation: Liver was dissected out
• Type of protocol – cell preparation: hepatocytes prepared
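As a rough illustration of what "structured" means here, the example sentence about the mice can be captured as a typed record. This is a minimal sketch: the class `SampleAnnotation`, its field names and the helper `matches` are hypothetical and chosen for this example; a real project would take its fields and allowed values from community standards rather than inventing them.

```python
from dataclasses import dataclass, asdict


@dataclass
class SampleAnnotation:
    """Hypothetical structured form of the free-text sample description."""
    age_value: int
    age_unit: str
    strain_name: str
    subject: str
    diet: str
    anatomy_part: str
    protocols: tuple


# Values taken from the sentence on the slide.
annotation = SampleAnnotation(
    age_value=7,
    age_unit="week",
    strain_name="C57BL/6N",
    subject="mouse",
    diet="low-fat diet",
    anatomy_part="liver",
    protocols=("sample treatment", "liver preparation", "cell preparation"),
)


def matches(record, **criteria):
    """Return True if every given field equals the requested value."""
    data = asdict(record)
    return all(data.get(key) == value for key, value in criteria.items())


# A structured record can be queried field by field; free text cannot.
print(matches(annotation, strain_name="C57BL/6N", anatomy_part="liver"))  # True
```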
• Data/content standards:
• Structure, enrich and report the description of the
datasets and the experimental context under which they
were produced
• Facilitate the discovery, sharing, understanding and
reuse of datasets
Data has to be structured for sharing – we need standards
de jure: standards organizations
de facto: grass-roots groups
Nanotechnology Working Group
Community mobilisation to develop content standards
Formats Terminologies Guidelines
Guidelines = Minimum information reporting requirements, checklists
o Report the same core, essential information
o e.g. ARRIVE guidelines
Terminologies = Controlled vocabularies, taxonomies, thesauri, ontologies etc.
o Use the same word and refer to the same ‘thing’
o e.g. Gene Ontology
Models/Formats = Conceptual model, conceptual schema, exchange formats
o Allow data to flow from one system to another
o e.g. FASTA
Enablers: to better describe, share and query data
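To make "allow data to flow from one system to another" concrete, here is a minimal sketch of reading and writing FASTA, the exchange format named above. It assumes the common convention of a ">" header line followed by one or more sequence lines. The function names and the example sequences are made up for illustration, and the parser keeps only the first word of each header as the identifier, so header descriptions are dropped.

```python
def parse_fasta(text):
    """Return {identifier: sequence} from FASTA-formatted text."""
    records = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            # Use the first word of the header as the identifier.
            current = header.split()[0] if header else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def write_fasta(records, width=60):
    """Serialise {identifier: sequence} back to FASTA, wrapping at `width`."""
    lines = []
    for name, seq in records.items():
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Made-up sequences, for illustration only.
example = ">seq1 made-up example\nACGTAC\nGTTA\n>seq2\nMKV\n"
parsed = parse_fasta(example)
print(parsed)  # {'seq1': 'ACGTACGTTA', 'seq2': 'MKV'}

# Any tool that agrees on the format can read what another tool wrote.
assert parse_fasta(write_fasta(parsed, width=4)) == parsed
```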
Formats Terminologies Guidelines
19385
346
Guidelines: MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, ARRIVE, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT, …
Formats: MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML, GELML, ISA-Tab, CML, MITAB, …
Terminologies: AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO, …
There are over 600 standards in the life sciences
Data policies (30)
Databases (763)
Data/metadata standards (652)
A complex and evolving landscape
• Is there a database, implementing standards, where I can deposit my metagenomics dataset?
• My funder’s data sharing policy recommends the use of established standards, but which ones are widely endorsed and applicable to my toxicological and clinical data?
• Am I using the most up-to-date version of this terminology to annotate cell-based assays?
• I understand this format has been deprecated; what has it been replaced by, and who is leading the work?
• Are there databases implementing this exchange format, whose development we have funded?
• What are the mature standards and standards-compliant databases we should recommend to our authors?
Helping people make the right decision
Mapping the landscape of ‘standards’ in the life, environmental and biomedical sciences
1,400 records and growing
What is BioSharing?
A web-based, curated and searchable portal that monitors the development and evolution of standards, their use in databases and the adoption of both in data policies, to inform and educate the user community.
What is BioSharing?
• Launched in 2011, as an evolution of the MIBBI portal (2008-2011)
• Manually curated
• Community driven
• Growing user base and visibility
Model/format formalizing reporting guideline -->
<-- Reporting guideline used by model/format
Cross-linking standards to standards and databases
Indicators of life cycle status
• Ready for use, implementation, or recommendation
• In development
• Status uncertain
• Deprecated as subsumed or superseded
Manually curated, approved by the community
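The cross-links and status indicators described above can be pictured as a small graph of records. The sketch below is an assumption-laden illustration, not BioSharing's actual data model or API. The `Record` class, the relation names ("formalizes", "implements", "superseded_by") and the entries `OldFormat` and `ExampleDB` are hypothetical, and the statuses assigned are for demonstration only.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    """The four life cycle indicators listed on the slide."""
    READY = "Ready for use, implementation, or recommendation"
    IN_DEVELOPMENT = "In development"
    UNCERTAIN = "Status uncertain"
    DEPRECATED = "Deprecated as subsumed or superseded"


@dataclass
class Record:
    name: str
    kind: str  # "guideline", "format", "terminology" or "database"
    status: Status = Status.READY
    links: list = field(default_factory=list)  # (relation, target name) pairs


def related(registry, name, relation):
    """Names that `name` points to through the given relation."""
    return [target for rel, target in registry[name].links if rel == relation]


def implementers(registry, standard):
    """Records that declare an 'implements' link to the standard."""
    return sorted(r.name for r in registry.values()
                  if ("implements", standard) in r.links)


def replacement_for(registry, name):
    """For a deprecated record, return what superseded it (or None)."""
    if registry[name].status is not Status.DEPRECATED:
        return None
    targets = related(registry, name, "superseded_by")
    return targets[0] if targets else None


# Illustrative entries only; OldFormat and ExampleDB are made up.
records = [
    Record("MIAME", "guideline"),
    Record("MAGE-Tab", "format", links=[("formalizes", "MIAME")]),
    Record("OldFormat", "format", Status.DEPRECATED,
           links=[("superseded_by", "MAGE-Tab")]),
    Record("ExampleDB", "database", links=[("implements", "MAGE-Tab")]),
]
registry = {r.name: r for r in records}

print(related(registry, "MAGE-Tab", "formalizes"))  # ['MIAME']
print(implementers(registry, "MAGE-Tab"))           # ['ExampleDB']
print(replacement_for(registry, "OldFormat"))       # MAGE-Tab
```

With links stored this way, questions such as "what has this deprecated format been replaced by?" or "which databases implement this format?" become simple look-ups.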
An informative and educational resource
Simple and advanced searches, ask our wizard or view journal recommendations
The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project
Search, filter, and refine using our faceted search
Collections group together one or more types of resource by domain, project or organization.
Recommendations are a core set of resources that are selected and recommended by a funder or journal data policy.
Standards and databases recommended by journal data policies
The wizard:
• Guides users through the data
• Will grow in functionality and complexity, based on user feedback
• Powered by curated descriptions of each standard and database, and their relations
BioSharing – what we do
Inform – what’s out there, which databases use which standards. Map the landscape.
Educate – what databases are recommended by your funder, or journal of choice, which standards should you be using, which standards and databases should you recommend? Explore the landscape.
Acknowledgements
Eamonn Maguire, DPhil, Software Engineer (contractor)
Philippe Rocca-Serra, PhD, Senior Research Lecturer
Alejandra Gonzalez-Beltran, PhD, Research Lecturer
Milo Thurston, DPhil, Research SW Engineer
Massimiliano Izzo, PhD, Research SW Engineer
Peter McQuilton, PhD, Knowledge Engineer
Allyson Lister, PhD, Knowledge Engineer
David Johnson, PhD, Research SW Engineer
Susanna-Assunta Sansone, PhD, Centre’s Associate Director, Principal Investigator and Springer Nature’s Consultant for Scientific Data