FAIR bioinfo for bioinformaticians


Introduction to the tools of reproducibility in bioinformatics

C. Hernandez¹, T. Denecker¹, J. Sellier², C. Toffano-Nioche¹

¹ Institute for Integrative Biology of the Cell (I2BC), UMR 9198, Université Paris-Sud, CNRS, CEA

91190 Gif-sur-Yvette, France

² Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), CNRS UMR 7104 - Inserm U 1258

67404 Illkirch cedex, France

Sept. 2020

Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 1 / 28

Introduction

A (not-so-uncommon) nightmare

Run analysis

Submit to a journal

Request from a reviewer

Re-install software

Re-run analysis

Results are different?!

What changed?

Software version

Library versions

OS version

...?
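The "What changed?" question is much easier to answer if the environment was recorded at the time of the first run. A minimal sketch, assuming the analysis depends on Python packages (the package names `numpy` and `pandas` below are only examples): it records the OS, the Python version and the library versions, then lists the differences between two such records. Containers go one step further: they freeze the environment instead of only describing it.

```python
import json
import platform
from importlib import metadata


def snapshot(packages):
    """Record the OS, the Python version and the versions of some packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "os": f"{platform.system()} {platform.release()}",
        "python": platform.python_version(),
        "packages": versions,
    }


def what_changed(old, new):
    """Return {item: (old_value, new_value)} for every entry that differs."""
    changes = {}
    for key in ("os", "python"):
        if old[key] != new[key]:
            changes[key] = (old[key], new[key])
    for name in sorted(set(old["packages"]) | set(new["packages"])):
        before = old["packages"].get(name)
        after = new["packages"].get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


if __name__ == "__main__":
    # Example package names; save this JSON next to the results of the first run.
    print(json.dumps(snapshot(["numpy", "pandas"]), indent=2))
```

Comparing the saved record with a new `snapshot(...)` taken after re-installing the software points directly at the software, library or OS versions that moved.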


Different levels of encapsulation

Goal: capture the system environment of applications (OS, packages, libraries, ...) to control their execution.

Hardware virtualisation (virtual machines)

OS virtualisation (images and containers)

Environment management


Encapsulation

Let’s say we want to install Firefox...

[Figure: installing Firefox on Windows, MacOS and Unix-based systems]



Encapsulation

[Diagram: Computer > Host OS > Libraries > Application]

We started with a computer running a specific OS... and inside this environment, we installed a new application. Applications rely on dependencies, e.g. external libraries.


Encapsulation

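[Diagram: Computer > Host OS, hosting two stacks: Libraries > Application v1, and Libraries > Application v1.2]

Usually, the dependencies of different applications don't interfere. But what if we want to test the latest version of our favourite tool? There might be conflicts, for instance if the new version needs a different version of a library that the old one also uses...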
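The conflict can be stated precisely: in a single shared environment (for example one system library directory, or one Python `site-packages`), a given library is usually installed in only one version. A toy sketch under that assumption, with hypothetical application and library names, that lists the libraries on which two applications disagree:

```python
from collections import defaultdict


def find_conflicts(requirements):
    """requirements: {application: {library: required_version}}.

    Returns {library: sorted list of versions} for every library that would
    need more than one version in a single shared environment.
    """
    wanted = defaultdict(set)
    for app, libs in requirements.items():
        for lib, version in libs.items():
            wanted[lib].add(version)
    return {lib: sorted(v) for lib, v in wanted.items() if len(v) > 1}


# Hypothetical example: the new release of a tool needs a newer libfoo.
requirements = {
    "application v1": {"libfoo": "1.0", "libbar": "2.3"},
    "application v1.2": {"libfoo": "2.0", "libbar": "2.3"},
}
print(find_conflicts(requirements))  # {'libfoo': ['1.0', '2.0']}
```

Each encapsulation level below removes this constraint by giving each application its own copy of the libraries it needs.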


Encapsulation: hardware virtualisation

[Diagram: Computer > Host OS > VM manager, running two virtual machines, each with its own Guest OS > Libraries > Application (v1 and v2)]

Idea: use virtual machines.

Pros:

Each application gets a completely separate and independent environment

Virtual machines can be transferred to another computer (running the same VM manager)


Encapsulation: hardware virtualisation

[Screenshot: MacOS and Ubuntu]


Encapsulation: hardware virtualisation

Idea: use virtual machines.

Pros: transferable, independent environments.

Cons:

Redundancy between VMs (each one carries a complete guest OS)

Heavy to set up

No automation



Encapsulation: OS virtualisation

[Diagram: Computer > Host OS + Container engine > Minimal guest OS > separate Libraries stacks for Application v1 and Application v2]

Idea: “trick” applications into believing that they are running in a different OS than the host's, while avoiding the redundancy of full virtual machines.


Encapsulation: OS virtualisation

[Figure from: Practical Computational Reproducibility in the Life Sciences, Björn Grüning et al. (2018)]


What is Docker?

Docker is not very “old”

First commit January 2013

First version March 2013

Version 1.0 in June 2014

But its adoption was fast

Officially packaged in Ubuntu since 2014 (v14.04)


What is Docker?

Image

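Set of libraries and programs (a read-only snapshot of a file system)

Fixed: cannot be modified

Can be stored/shared online

Can be built automatically (from a Dockerfile)

Container

“Active image”: a running instance of an image

Can be modified (interactive)

Can be turned into a new image (docker commit)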

One image, many containers
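The image/container distinction can be pictured with a toy model of layered, copy-on-write storage: the image is a read-only stack of layers, each container adds its own writable layer on top, and committing a container freezes that layer into a new image. This is a simplified Python sketch of the idea, not Docker's implementation (real layers are file-system diffs handled by a storage driver, and file deletions are not modelled here); the file path and contents are hypothetical.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Optional


@dataclass(frozen=True)
class Image:
    """Read-only stack of layers (path -> content); later layers win."""
    layers: tuple

    @staticmethod
    def from_files(files):
        return Image(layers=(MappingProxyType(dict(files)),))

    def read(self, path) -> Optional[str]:
        for layer in reversed(self.layers):
            if path in layer:
                return layer[path]
        return None


@dataclass
class Container:
    """A running instance: the shared image plus a private writable layer."""
    image: Image
    writable: dict = field(default_factory=dict)

    def read(self, path):
        if path in self.writable:
            return self.writable[path]
        return self.image.read(path)

    def write(self, path, content):
        # Copy-on-write: changes land in this container's layer only,
        # the image and the other containers are untouched.
        self.writable[path] = content

    def commit(self):
        # Freeze a copy of the writable layer on top of the image's layers.
        frozen = MappingProxyType(dict(self.writable))
        return Image(layers=self.image.layers + (frozen,))


base = Image.from_files({"/opt/tool/VERSION": "1.0"})  # hypothetical file
c1, c2 = Container(base), Container(base)  # one image, many containers
c1.write("/opt/tool/VERSION", "1.2")
print(c1.read("/opt/tool/VERSION"), c2.read("/opt/tool/VERSION"))  # 1.2 1.0
```

Because containers never write into the image, many containers can share it safely, and `commit` is the toy counterpart of turning a container into a new image.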


What is Docker?

(https://docs.docker.com/get-started/overview/)


What is Docker?

DockerHub

(https://hub.docker.com/)


What is Docker?

User-made images (1/2)

(https://hub.docker.com/u/genomicpariscentre/)


What is Docker?

User-made images (2/2)

Be critical!

(https://hub.docker.com/r/genomicpariscentre/samtools/)
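One concrete way to stay critical is to check which version an image reference actually designates. Omitting the tag means `latest`, and a tag can later be moved to a different image, whereas a digest (`@sha256:...`) always designates the same content. A minimal Python sketch that classifies a reference string; it only parses the text and does not contact DockerHub, and the tag `1.9` in the examples is hypothetical.

```python
def parse_reference(ref):
    """Split 'name[:tag][@digest]' into (name, tag, digest)."""
    name, _, digest = ref.partition("@")
    tag = None
    # A ':' separates a tag only if it comes after the last '/';
    # otherwise it belongs to a registry 'host:port/' prefix.
    colon, slash = name.rfind(":"), name.rfind("/")
    if colon > slash:
        name, tag = name[:colon], name[colon + 1:]
    return name, tag, digest or None


def pinning_level(ref):
    """'digest' (immutable), 'tag' (explicit but movable) or 'unpinned'."""
    _, tag, digest = parse_reference(ref)
    if digest:
        return "digest"
    if tag is None or tag == "latest":
        return "unpinned"
    return "tag"


for ref in ["genomicpariscentre/samtools",
            "genomicpariscentre/samtools:1.9",
            "ubuntu@sha256:..."]:
    print(ref, "->", pinning_level(ref))
```

For an image that is already pulled, `docker image inspect` shows its `RepoDigests`, which can be used to pin it in scripts and documentation.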


What is Docker?

(https://docs.docker.com/get-started/overview/)


What is Docker?

Other commands:

docker images: list the images available locally

docker ps: list running containers (docker ps -a also shows stopped ones)

docker rm: delete a container

docker rmi: delete an image

...

(More details during the practical session.)
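These commands are also easy to script. A minimal Python sketch, assuming the `docker` CLI is installed and on the PATH: it asks `docker images` for a tab-separated listing through the `--format` option (Go template fields `.Repository`, `.Tag` and `.ID`), parses it, and builds, without running it, a `docker rmi` command for untagged images, which Docker lists with the tag `<none>`. The helper names are hypothetical.

```python
import subprocess

# Go template understood by `docker images --format`; tab-separated fields.
IMAGE_FORMAT = "{{.Repository}}\t{{.Tag}}\t{{.ID}}"


def parse_images(text):
    """Turn the formatted output of `docker images` into a list of dicts."""
    images = []
    for line in text.splitlines():
        if not line.strip():
            continue
        repository, tag, image_id = line.split("\t")
        images.append({"repository": repository, "tag": tag, "id": image_id})
    return images


def list_local_images():
    """Equivalent of `docker images`, returned as Python data."""
    result = subprocess.run(
        ["docker", "images", "--format", IMAGE_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return parse_images(result.stdout)


def rmi_untagged_argv(images):
    """Build (but do not run) `docker rmi` for images whose tag is <none>."""
    ids = [img["id"] for img in images if img["tag"] == "<none>"]
    return ["docker", "rmi", *ids] if ids else []
```

Note that `docker rmi` refuses to delete an image that is still used by a container, even a stopped one, until that container has been removed with `docker rm`.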


Encapsulation: OS virtualisation

OS virtualisation vs hardware virtualisation

Pros:

Speed: installation is faster, and there is no boot time

Lightweight: minimal base OS, minimal set of libraries and applications

Easy sharing of applications


Encapsulation: OS virtualisation

Cons:

Docker needs root access (Singularity is an alternative that does not)

Changes in the policies of the Docker company


Docker policy

Update of the Docker Image retention policy (13/08/2020)

https://www.docker.com/pricing/retentionfaq


Practical session

Practical session: Docker and Samtools. See the companion document.


Practical session

Analysis workflow

[Figure legend: green = input, blue = tool]

fastqc: quality control of the input reads

bowtie2: mapping of the reads onto the genome sequence

samtools: selection and formatting of the mapped reads

HTSeq: table of mapped read counts per gene (using the annotations)

DESeq2: statistical analysis, giving the list of differentially expressed genes
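The workflow can be written down as data: each step lists the files it reads and the files it writes, so the order of the steps can be checked before anything is run. A minimal Python sketch; every file name is a hypothetical placeholder, and in practice each step could run in its own container.

```python
# (tool, inputs, outputs); all file names are hypothetical placeholders.
WORKFLOW = [
    ("fastqc", ["reads.fastq"], ["reads_fastqc.html"]),
    # bowtie2 maps against an index built beforehand from the genome sequence.
    ("bowtie2", ["reads.fastq", "genome_index"], ["mapped.sam"]),
    ("samtools", ["mapped.sam"], ["mapped.sorted.bam"]),
    ("htseq-count", ["mapped.sorted.bam", "annotations.gff"], ["counts.tsv"]),
    # In a real study DESeq2 compares count tables from several samples.
    ("DESeq2", ["counts.tsv"], ["differential_genes.tsv"]),
]

# Inputs (green boxes) that exist before the workflow starts.
RAW_INPUTS = {"reads.fastq", "genome_index", "annotations.gff"}


def check_order(workflow, raw_inputs):
    """Raise ValueError if a step reads a file that no earlier step produced.

    Returns the set of all files available once every step has run.
    """
    available = set(raw_inputs)
    for tool, inputs, outputs in workflow:
        missing = [f for f in inputs if f not in available]
        if missing:
            raise ValueError(f"{tool} needs {missing}, not produced yet")
        available.update(outputs)
    return available
```

Workflow managers build on the same idea of declared inputs and outputs to decide what to run and in which order.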


Practical session

Savoir FAIRe

(Docker installation)

Learn the structure of a Docker command

Pull a pre-defined image available on DockerHub

Start a container

Bonus: write a Dockerfile and build an image from it
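A Docker command follows the pattern `docker [OPTIONS] COMMAND [ARG...]`; for `docker run`, the arguments are `[OPTIONS] IMAGE [COMMAND] [ARG...]`, and everything after the image name is executed inside the container. A minimal Python sketch that assembles the pull and run commands as argument lists. The chosen options (`--rm`, a bind mount with `-v`, a working directory with `-w`), the mount point `/data` and the helper names are assumptions for illustration, not the exact commands of the practical session.

```python
import os
import shlex


def docker_pull_argv(image):
    """docker pull IMAGE[:TAG]: download an image (from DockerHub by default)."""
    return ["docker", "pull", image]


def docker_run_argv(image, command, host_dir=".", mount_point="/data"):
    """docker run [OPTIONS] IMAGE [COMMAND] [ARG...]

    Assumed options: --rm deletes the container when the command exits,
    -v shares host_dir with the container at mount_point, and -w makes
    mount_point the working directory inside the container.
    """
    return [
        "docker", "run",
        "--rm",
        "-v", f"{os.path.abspath(host_dir)}:{mount_point}",
        "-w", mount_point,
        image,     # options go before the image name...
        *command,  # ...and what follows it runs inside the container
    ]


if __name__ == "__main__":
    image = "genomicpariscentre/samtools"
    # Assumes the image provides samtools on its PATH.
    print(shlex.join(docker_pull_argv(image)))
    print(shlex.join(docker_run_argv(image, ["samtools", "--version"])))
    # With Docker installed, each list can be run with subprocess.run(..., check=True).
```

Keeping the command as a list (rather than one string) avoids quoting problems when paths contain spaces; `shlex.join` is only used to display it as it would be typed in a terminal.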
