Scott Edmunds & Rob Davidson's talk at the Metabolomics Society 2014 Meeting on Beyond Dead...

DESCRIPTION

Scott Edmunds & Rob Davidson's talk at the Metabolomics Society 2014 Meeting on Beyond Dead Trees: data & workflow publishing with GigaScience, Tsuruoka 23rd June 2014

Beyond Dead Trees: data & workflow publishing with GigaScience

Scott Edmunds, Rob Davidson

The problems with publishing

• Scholarly articles are merely advertisement of scholarship. The actual scholarly artefacts, i.e. the data and computational methods, which support the scholarship, remain largely inaccessible (Jon B. Buckheit and David L. Donoho, WaveLab and reproducible research, 1995)

• Lack of transparency, lack of credit for anything other than “regular” dead tree publication.

• Traditional publishing models, policies and practices holding things back

Why is this important?

Consequences: increasing number of retractions (>15X increase in last decade)

At the current rate of increase, by 2045 as many papers will be retracted as published

Need:

• to publish protocols BEFORE analysis

• better access to supporting data/code

• more transparent & accountable review

• to publish replication studies

1. Ioannidis et al., 2009. Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14

2. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html

3. Bjorn Brembs: Open Access and the looming crisis in science https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950

• Data

• Software

• Review

• Re-use…

= Credit

Credit where credit is overdue:

“One option would be to provide researchers who release data to public repositories with a means of accreditation.”

“An ability to search the literature for all online papers that used a particular data set would enable appropriate attribution for those who share.”

Nature Biotechnology 27, 579 (2009)

New incentives/credit

GigaSolution: deconstructing the paper

www.gigadb.org

www.gigasciencejournal.com

Utilizes big-data infrastructure and expertise from:

Combines and integrates:

• Open-access journal

• Data Publishing Platform

• Data Analysis Platform

On top of regular papers…

Rewarding open data

http://gigadb.org/

• Multi-omics focus (not just genomics)

• 10-100x faster download than FTP

• Provide (ISA) curation & integration with other DBs (e.g. MetaboLights, SRA, etc.)

For more see: http://database.oxfordjournals.org/content/2014/bau018.abstract

IRRI GALAXY

Democratization through data publishing

Rice 3K project: 3,000 rice genomes, 13.4 TB public data

Two tools for reproducible research

Rob Davidson

RO:and

Visualizations & DOIs for workflows

galaxy.cbiit.cuhk.edu.hk

Implement workflows in a community-accepted format

http://galaxyproject.org

Over 36,000 main Galaxy server users

Over 1,000 papers citing Galaxy use

Over 55 Galaxy servers deployed

Open source

Rewarding and aiding reproducibility

Galaxy interface screenshot (Copyright NBAF-B 2013): tool list, tool parameterisation, results panel

Birmingham Metabo-Galaxy Workflow

Birmingham Metabo-Galaxy

• Tools wrapped in Python and XML

• User sees web form (easy!)

• Data stored centrally (secure!)

• Work done centrally (easy update)
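
As a rough illustration of what "wrapped in Python and XML" can look like, the sketch below pairs a small Python function with a Galaxy-style tool description. It is not taken from the Birmingham Metabo-Galaxy code. The element names (tool, command, inputs/param, outputs/data) follow the general shape of Galaxy tool XML files; the tool id, the script name scale.py, and the scaling step itself are hypothetical and chosen only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET


def scale_to_unit_sum(values):
    """Hypothetical analysis step: scale intensities so they sum to 1."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot scale values that sum to zero")
    return [v / total for v in values]


def build_tool_xml():
    """Build a minimal Galaxy-style wrapper describing the web form."""
    tool = ET.Element(
        "tool", id="scale_to_unit_sum", name="Scale to unit sum", version="0.1.0"
    )
    ET.SubElement(tool, "description").text = "scale each intensity by the total"
    # Galaxy fills in $input and $output with dataset paths at run time.
    # scale.py is an assumed script name that would call scale_to_unit_sum.
    ET.SubElement(tool, "command").text = "python scale.py '$input' '$output'"
    # Each <param> becomes one field in the web form the user sees.
    inputs = ET.SubElement(tool, "inputs")
    ET.SubElement(
        inputs, "param", name="input", type="data",
        format="tabular", label="Intensity table",
    )
    # Each <data> element declares a dataset written back to the history.
    outputs = ET.SubElement(tool, "outputs")
    ET.SubElement(outputs, "data", name="output", format="tabular")
    return ET.tostring(tool, encoding="unicode")


if __name__ == "__main__":
    print(build_tool_xml())
```

In practice the XML is usually written by hand and kept next to the script. It is generated from Python here only so that the wrapper and the wrapped function can be read and checked in one place.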

First RAW -> stats Galaxy Pipe

SOAPdenovo2 S. aureus pipeline

Handling of imaging (phenotype) data

Cyber-centipedes & virtual worms

Aiding reproducibility

OMERO: providing access to imaging data

View, filter, measure raw images with direct links from journal article.

See all image data, not just cherry picked examples.

Download and reprocess.

JCB: Aiding reproducibility, adding value

The alternative...

...look but don't touch

In Summary

• Reproducibility is important!!

  – Currently not very common!

• Many tools appearing for data publishing and sharing (images, tools, workflows).

• Data publishing → more publications, more citations, more impact!

• Are you convinced?

• What barriers? Code standards? Data standards? Too much work?

Give us data, papers & pipelines*

Help us make it happen!

Contact us:

scott@gigasciencejournal.com

rob@gigasciencejournal.com

editorial@gigasciencejournal.com

database@gigasciencejournal.com

* APCs currently generously covered by BGI until 2015

www.gigasciencejournal.com

Thanks to:

Ruibang Luo (BGI/HKU), Shaoguang Liang (BGI-SZ), Tin-Lap Lee (CUHK), Qiong Luo (HKUST), Senghong Wang (HKUST), Yan Zhou (HKUST)

Peter Li, Huayan Gao, Chris Hunter, Jesse Si Zhe, Nicole Nogoy, Laurie Goodman, Amye Kenall (BMC)

Marco Roos (LUMC), Mark Thompson (LUMC), Jun Zhao (Lancaster), Susanna Sansone (Oxford), Philippe Rocca-Serra (Oxford), Alejandra Gonzalez-Beltran (Oxford)

@gigascience

facebook.com/GigaScience

blogs.biomedcentral.com/gigablog/

www.gigadb.org

galaxy.cbiit.cuhk.edu.hk

www.gigasciencejournal.com

CBIIT

Funding from:

Our collaborators:

Team:
