29
FAIR bioinfo for bioinformaticians Introduction to the tools of reproducibility in bioinformatics C. Hernandez 1 T. Denecker 1 J.Sellier 2 C. Toffano-Nioche 1 1 Institute for Integrative Biology of the Cell (I2BC) UMR 9198, Universit´ e Paris-Sud, CNRS, CEA 91190 - Gif-sur-Yvette, France 2 Institut de G´ en´ etique et de Biologie Mol´ eculaire et Cellulaire (IGBMC) CNRS UMR 7104 - Inserm U 1258 67404 - Illkirch cedex, France Sept. 2020 eline, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 1 / 28

FAIR bioinfo for bioinformaticians

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: FAIR bioinfo for bioinformaticians

FAIR bioinfo for bioinformaticiansIntroduction to the tools of reproducibility in bioinformatics

C. Hernandez1 T. Denecker1 J.Sellier2 C. Toffano-Nioche1

1Institute for Integrative Biology of the Cell (I2BC)UMR 9198, Universite Paris-Sud, CNRS, CEA

91190 - Gif-sur-Yvette, France

2Institut de Genetique et de Biologie Moleculaire et Cellulaire (IGBMC)CNRS UMR 7104 - Inserm U 1258

67404 - Illkirch cedex, France

Sept. 2020

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 1 / 28

Page 2: FAIR bioinfo for bioinformaticians

Introduction

A (not-so-uncommon) nightmare

Runanalysis

Submittoajournal

Requestfromareviewer

Re-installsoftware

Re-runanalysis

Resultsaredifferent?!

What changed?

Software version

Libraries version

OS version

..?

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 2 / 28

Page 3: FAIR bioinfo for bioinformaticians

Introduction

A (not-so-uncommon) nightmare

Runanalysis

Submittoajournal

Requestfromareviewer

Re-installsoftware

Re-runanalysis

Resultsaredifferent?!

What changed?

Software version

Libraries version

OS version

..?

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 2 / 28

Page 4: FAIR bioinfo for bioinformaticians

Different levels of encapsulation

Goal : capture the system environment of applications (OS, packages,libraries,. . . ) to control their execution.

Hardware virtualisation (virtual machines)

OS virtualisation (images and containers)

Environment management

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 3 / 28

Page 5: FAIR bioinfo for bioinformaticians

Encapsulation

Let’s say we want to install Firefox...

Windows MacOS Unix-based

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 4 / 28

Page 6: FAIR bioinfo for bioinformaticians

Encapsulation

Computer

Host OS

We started with a computer using aspecific OS...

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 5 / 28

Page 7: FAIR bioinfo for bioinformaticians

Encapsulation

Computer

Host OS

Application

We started with a computer using aspecific OS...And inside this environment, weinstalled a new application.

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 6 / 28

Page 8: FAIR bioinfo for bioinformaticians

Encapsulation

Computer

Host OS

Libraries

Application

We started with a computer using aspecific OS...And inside this environment, weinstalled a new application.Applications rely on dependencies,e.g. external libraries.

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 7 / 28

Page 9: FAIR bioinfo for bioinformaticians

Encapsulation

Computer

Host OS

Libraries

Application v1

Libraries

Application v1.2

Usually dependencies of differentapplications don’t interfere.But what if we want to test thelatest version of our favourite tool?There might be conflicts. . .

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 8 / 28

Page 10: FAIR bioinfo for bioinformaticians

Encapsulation : hardware virtualisation

Computer

Host OS

VM manager

Guest OS 1

Libraries

Application v1

Guest OS 1

Libraries

Application v2 Idea: use virtual machinesPros:

Each application gets acompletely different andindependent environment

Virtual machines can betransferred to another computer(using the same manager)

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 9 / 28

Page 11: FAIR bioinfo for bioinformaticians

Encapsulation : hardware virtualisation

MacOS

Ubuntu

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 10 / 28

Page 12: FAIR bioinfo for bioinformaticians

Encapsulation : hardware virtualisation

Computer

Host OS

VM manager

Guest OS 1

Libraries

Application v1

Guest OS 1

Libraries

Application v2 Idea: use virtual machinesPros: transferable independentenvironmentsCons:

Redundancy between VMs

Heavy to set up

No automation

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 11 / 28

Page 13: FAIR bioinfo for bioinformaticians

Encapsulation : OS virtualisation

Computer

Host OS

???

Guest OS 1

Libraries

Application v1

Libraries

Application v2

Libraries Idea: ”trick” applications intobelieving that they are in a differentOS than the host’sAvoid redundancy.

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 12 / 28

Page 14: FAIR bioinfo for bioinformaticians

Encapsulation : OS virtualisation

Computer

Host OS Container engine

Minimal guest OS

Libraries

Application v1

Libraries

Application v2

Libraries Idea: ”trick” applications intobelieving that they are in a differentOS than the host’sAvoid redundancy.

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 13 / 28

Page 15: FAIR bioinfo for bioinformaticians

Encapsulation : OS virtualisation

Practical Computational Reproducibility in the Life Sciences - BjornGruning et al (2018)

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 14 / 28

Page 16: FAIR bioinfo for bioinformaticians

What is Docker?

Docker is not very “old”

First commit January 2013

First version March 2013

Version 1.0 in June 2014

But its adoption was fast

Officially packaged in Ubuntu since 2014 (v14.04)

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 15 / 28

Page 17: FAIR bioinfo for bioinformaticians

What is Docker?

Image

Set of libraries and functions

Fixed. Cannot be modified

Can be stored/shared online

Can be automatically built

Container

”Active image”

Can be modified (interactive)

Can be turned into an image

One image, many containers

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 16 / 28

Page 18: FAIR bioinfo for bioinformaticians

What is Docker?

(https://docs.docker.com/get-started/overview/)

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 17 / 28

Page 19: FAIR bioinfo for bioinformaticians

What is Docker?

DockerHub

(https://hub.docker.com/)

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 18 / 28

Page 20: FAIR bioinfo for bioinformaticians

What is Docker?

Usermade images (1/2)

(urlhttps://hub.docker.com/u/genomicpariscentre/)

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 19 / 28

Page 21: FAIR bioinfo for bioinformaticians

What is Docker?

Usermade images (2/2)Be critical!

(https://hub.docker.com/r/genomicpariscentre/samtools/)

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 20 / 28

Page 22: FAIR bioinfo for bioinformaticians

What is Docker?

(https://docs.docker.com/get-started/overview/)

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 21 / 28

Page 23: FAIR bioinfo for bioinformaticians

What is Docker?

Other commands :

docker images : list images available locally

docker ps : status of containers

docker rm : delete a container

docker rmi : delete an image

...

(More details during the practical session.)

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 22 / 28

Page 24: FAIR bioinfo for bioinformaticians

Encapsulation : OS virtualisation

Computer

Host OS Container engine

Minimal guest OS

Libraries

Application v1

Libraries

Application v2

Libraries

OS virtualisation vs hardwarevirtualisationPros:

SpeedI Installation is fasterI No boot time

LightweightI Minimal base OSI Minimal libraries and

application set

Easy sharing of applications

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 23 / 28

Page 25: FAIR bioinfo for bioinformaticians

Encapsulation : OS virtualisation

Computer

Host OS Container engine

Minimal guest OS

Libraries

Application v1

Libraries

Application v2

Libraries Cons:

Needs root access (Singularity)

Changes of policies of theDocker company

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 24 / 28

Page 26: FAIR bioinfo for bioinformaticians

Docker policy

Update of the Docker Image retention policy (13/08/2020)

https://www.docker.com/pricing/retentionfaq

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 25 / 28

Page 27: FAIR bioinfo for bioinformaticians

Practical session

Practical session : Docker and Samtools.See companion document.

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 26 / 28

Page 28: FAIR bioinfo for bioinformaticians

Practical session

Analysis workflow

green=input, blue=tool

fastqc control quality of the input reads

bowtie2 reads mapping on the genome sequence

samtools mapped reads selection & formatting

HTseq count table of mapped reads on genes (annotations)

DEseq2 statistical analysis: genes list having differential expression

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 27 / 28

Page 29: FAIR bioinfo for bioinformaticians

Practical session

Savoir FAIRe

(Installation de Docker)

Learn the structure of a Docker command

Pull a pre-defined image available on the DockerHub

Start a container

Bonus: build a Dockerfile

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 28 / 28