NEXT-GENERATION SEQUENCING AND BIOINFORMATICS

NEXT-GENERATION SEQUENCING AND …mbg.unipv.it/attach/1_next_gen_bioinformatics_1718.pdf · NEXT-GEN SEQUENCING TECHNOLOGIES • Roche/454 FLX • Applied Biosystems SOLiD System


Moore's law: the number of transistors in a dense integrated circuit doubles every two years
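As a rough illustration of what "doubling every two years" implies, the growth factor after a given number of years can be computed directly. This is a minimal sketch: the function name and the chosen time spans are illustrative and do not come from the slides.

```python
def moore_growth(years, doubling_period=2.0):
    """Growth factor after `years` if a quantity doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)


# Illustrative time spans (assumed, not from the slides)
for years in (2, 10, 20):
    print(f"after {years:>2} years: x{moore_growth(years):,.0f}")
```

Twenty years of doubling every two years gives roughly a thousandfold increase.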

Moore's law describes and predicts the pace of improvement of one of the fastest-improving technologies: computers

In the last 15 years the pace of improvement of DNA sequencing technologies has been much faster than that of computers


Frederick Sanger

Nobel Prize in Chemistry in 1958 for sequencing insulin (and proteins in general)

Nobel Prize in Chemistry in 1980 for sequencing nucleic acids

One of only four people to win two Nobel Prizes in science


SANGER SEQUENCING

The most modern Sanger sequencers can process up to 96 samples in parallel

Before sequencing, a PCR and purification step is necessary; if you do not know the sequence in advance, you also need a cloning step

OUTPUT: about 1,000 bases per run (96,000 if you run 96 samples in parallel)

NEXT-GEN SEQUENCING TECHNOLOGIES

• Roche/454 FLX

• Applied Biosystems SOLiD System

• Illumina/Solexa sequencing by synthesis

• Ion Torrent

NEXT-GENERATION DNA SEQUENCING: MAIN CHARACTERISTICS

EXTREME MINIATURIZATION

Reactions are carried out in tiny volumes (down to picoliters) thanks to specific technological advances

This in turn allows

MASSIVE PARALLELIZATION

Thousands to millions of reactions are performed in parallel, reducing costs and increasing the output volume by orders of magnitude


454 pyrosequencing

SAMPLE PREPARATION

Nebulization of genomic DNA into fragments of 400-1000 base pairs

Ligation of fragments to two adapters (type A and type B)

Selection of single-stranded fragments carrying both adapters

EMULSION PCR

Fragments are mixed with agarose beads, 28 microns in diameter, bearing oligonucleotides complementary to the adapters

Each bead-fragment complex is isolated in an individual micelle of a water-in-oil emulsion

The emulsion PCR reaction yields about 1 million copies of the amplified fragment on the surface of each bead

SAMPLE LOAD

Each bead is placed in a well of a picotiter slide (a 7x7 cm fiber optic slide); each slide has several million wells, 44 microns in diameter

Multiple enzymes and reagents are added in the form of even smaller beads

PYROSEQUENCING REACTION

A single nucleotide species is added in each cycle

Nucleotide incorporation → light generation

Rothberg, Nat. Biotechnol. 2008


ROCHE/454 FLX Pyrosequencer

1 EMULSION PCR takes the place of thousands of cloning experiments

1 SEQUENCING RUN takes the place of thousands of SANGER sequencing runs

EXTREME MINIATURIZATION

MASSIVE PARALLELIZATION

ROCHE/454 GS FLX+

BASE CALLING ACCURACY: 99.9% or more (lower in the final part of the reads)

OUTPUT: Generates reads up to 1,000 nucleotides long

Generates about 500,000-1,000,000 reads

For a total output of 700 megabases per run (8 hours)
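To put these figures next to the Sanger output quoted earlier (1,000 bases per run, 96 samples in parallel), the arithmetic can be sketched as follows. All numbers come from the slides; the read count uses the upper end of the quoted range (1,000,000), which is a choice made here for illustration.

```python
# Figures quoted in the slides
SANGER_BASES_PER_SAMPLE = 1_000    # bases per Sanger run
SANGER_PARALLEL_SAMPLES = 96       # samples processed in parallel
FLX_BASES_PER_RUN = 700_000_000    # 700 megabases per GS FLX+ run
FLX_READS_PER_RUN = 1_000_000      # upper end of the quoted read count (assumption)

sanger_bases_per_run = SANGER_BASES_PER_SAMPLE * SANGER_PARALLEL_SAMPLES
equivalent_sanger_runs = FLX_BASES_PER_RUN / sanger_bases_per_run
mean_read_length = FLX_BASES_PER_RUN / FLX_READS_PER_RUN

print(f"Sanger, 96 samples in parallel: {sanger_bases_per_run:,} bases per run")
print(f"One 454 run is worth about {equivalent_sanger_runs:,.0f} parallel Sanger runs")
print(f"Implied mean 454 read length: {mean_read_length:.0f} nucleotides")
```

Under these assumptions a single 454 run corresponds to several thousand fully parallelized Sanger runs.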

454 MAIN ISSUE

Homopolymers: stretches of a single nucleotide species

Intrinsic problem of the technology

Multiple identical nucleotides are incorporated in a single cycle

They generate more light, but discrimination becomes increasingly difficult


This problem can affect the downstream bioinformatic analysis

KNOW YOUR MACHINE!
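The homopolymer problem can be made concrete with a toy model. The sketch below assumes that the light signal is exactly proportional to the number of bases incorporated in a flow, and that nucleotides are flowed in the order T, A, C, G; both are simplifying assumptions made here, and the function names are hypothetical.

```python
def flowgram(read, flow_order="TACG"):
    """Ideal signal per nucleotide flow: how many bases are incorporated each time."""
    if set(read) - set(flow_order):
        raise ValueError("read contains bases that are not in the flow order")
    signals, pos = [], 0
    while pos < len(read):
        for base in flow_order:
            run = 0
            # A whole homopolymer stretch is incorporated in one single flow
            while pos < len(read) and read[pos] == base:
                run += 1
                pos += 1
            signals.append((base, run))
            if pos == len(read):
                break
    return signals


def relative_step(n):
    """Relative increase in signal when a homopolymer grows from n to n + 1 bases."""
    return 1 / n


print(flowgram("TTTAGG"))
for n in (1, 2, 4, 8):
    print(f"{n} -> {n + 1} bases: signal grows by {relative_step(n):.0%}")
```

Going from 1 to 2 bases doubles the signal, while going from 8 to 9 adds only about 12%, which is why long homopolymers are hard to call once noise is present.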

ILLUMINA/SOLEXA

Currently the market leader

Very low cost per base, proven technology

Sequencing by synthesis

1. DNA fragmentation and ligation to 2 types of adapters

2. Templates are bound to the surface of a flow cell

3. "Bridge" amplification using primers complementary to the adapters, which are bound to the substrate at high density → production of clusters of up to 1,000,000 template copies "in situ", enough to generate a detectable signal

4. Addition of fluorescent nucleotides blocked at the 3'-OH

5. Fluorescence detection

6. Removal of the fluorophore and of the 3'-OH block

7. Repeat steps 4-6

• Four different fluorophores → no issues with homopolymers

• Shorter reads: blocking the incorporation of multiple nucleotides is one of the foundations of the Illumina method

In each cycle the blocking is imperfect: a small percentage of the copies in a cluster incorporates two nucleotides, producing noise instead of a clean signal

When the out-of-phase fraction reaches a threshold, the signal is lost

KNOW YOUR MACHINE
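The link between imperfect blocking and read length can be sketched with a simple model. It assumes that in every cycle a fixed fraction of the copies in a cluster falls out of phase and never recovers; the slip rate (0.5% per cycle) and the usable-signal threshold (50% of copies in phase) are hypothetical values chosen for illustration, not instrument specifications.

```python
import math


def in_phase_fraction(cycles, slip_rate):
    """Fraction of cluster copies still in phase after `cycles` sequencing cycles."""
    return (1 - slip_rate) ** cycles


def max_usable_cycles(slip_rate, min_in_phase):
    """Largest number of cycles for which the in-phase fraction stays above the threshold."""
    return math.floor(math.log(min_in_phase) / math.log(1 - slip_rate))


SLIP_RATE = 0.005      # hypothetical: 0.5% of copies slip per cycle
MIN_IN_PHASE = 0.5     # hypothetical: signal is considered lost below 50%

limit = max_usable_cycles(SLIP_RATE, MIN_IN_PHASE)
print(f"usable cycles: {limit}")
print(f"in-phase fraction at that point: {in_phase_fraction(limit, SLIP_RATE):.3f}")
```

In this model the read length is capped by the per-cycle slip rate: halving the slip rate roughly doubles the number of usable cycles.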

• DIFFERENT INSTRUMENTS (benchtop models)

• DIFFERENT INSTRUMENTS (high-yield models)

ION TORRENT

The smallest sequencer, fast and economical

An instrument: $60,000

A run: ~$1,000 (high scalability)

Output: up to 10 Gb, with reads up to 600 bp long

Very quick: a run lasts 3 hours

In many respects similar to 454

DNA is amplified on microbeads and loaded into wells

It is then subjected to cycles of incorporation of a single type of nucleotide

Support for basic analyses without bioinformatic knowledge

The sequencing is performed on a semiconductor chip, which detects the release of protons

Potential for rapid technological development, taking advantage of the electronics industry

It does not detect light, but the H+ ions released during sequencing: like a camera chip that detects protons instead of photons

All nucleotides release H+ when incorporated, so each cycle must supply a single nucleotide type (A, C, G or T)

Same issue as 454: homopolymers

THIRD-GENERATION SEQUENCING TECHNOLOGIES

• Pacific Biosciences

• Oxford Nanopore

REAL-TIME SEQUENCING

The idea is to bypass the amplification step

Advantage: this avoids DNA fragmentation and yields LONGER reads, FASTER

Pacific Biosciences (PACBIO)

Launched in 2009 (third-generation?)

Real-Time sequencing technology

The idea is to directly observe DNA polymerization while it is performed by DNA polymerase

Single Molecule Real Time (SMRT) sequencing

Recently the third machine was released: PACBIO SEQUEL

Cost: around 350,000 dollars

Zero-mode waveguide (ZMW)

Highly sensitive detection system

Nanophotonic structure with wells 50 nm in diameter

Same principle as microwave oven doors

A laser illuminates the well from below, but its wavelength is too large for the light to propagate through the well

The light penetrates 20-30 nm

As a result, only what happens at the bottom of the well is detected, which reduces background noise and gives high sensitivity and temporal resolution

The latest PacBio instrument has around 1,000,000 wells

Polymerase: phi-29 phage polymerase

Highly processive, up to 70,000 nt

High fidelity, up to 100 times higher than Taq polymerase

Modified to be slower

The polymerase is linked to the bottom of the wells

Only 1/3 of the wells get a single polymerase, and thus can perform the sequencing
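A figure close to 1/3 is what one would expect if polymerases land in wells at random. The sketch below assumes that the number of polymerases per well follows a Poisson distribution, which is an assumption made here and not stated in the slides; the mean loading values are illustrative.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events for a Poisson distribution with the given mean."""
    return mean ** k * math.exp(-mean) / math.factorial(k)


# Illustrative average numbers of polymerases per well (assumed)
for mean in (0.5, 1.0, 2.0):
    single = poisson_pmf(1, mean)
    print(f"mean {mean} per well -> {single:.1%} of wells hold exactly one polymerase")
```

Under this assumption the share of singly loaded wells peaks at an average of one polymerase per well, where it is about 37%.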

PacBio sequencing

A single-stranded DNA molecule is bound to the polymerase

Addition of the 4 nucleotide species, tagged with 4 different fluorophores

The nucleotide is incorporated and the fluorophore is cleaved off

The free fluorophore generates a flash of light, which is detected by a fluorescence microscope

Characteristics

Third-generation sequencing → a new revolution, especially for bioinformatics

The sequencing is continuous, washing is not necessary → much faster

PacBio yields reads several thousand nucleotides long (up to 20,000)

PacBio ISSUES

Current issues are:

The cost (10x more expensive than Illumina)

The read quality: single-molecule sequencing means every mistake is recorded and cannot be averaged out by thousands of parallel reactions

However these errors are random and can be overcome
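One way random errors can be overcome is by reading the same position several times and taking a majority vote. The sketch below assumes that errors are independent between reads and, pessimistically, that all wrong reads agree on the same wrong base; the 15% per-read error rate is a hypothetical input, not a figure from the slides.

```python
from math import comb


def majority_error(per_read_error, n_reads):
    """Probability that most of `n_reads` independent reads are wrong at one position."""
    if n_reads % 2 == 0:
        raise ValueError("use an odd number of reads to avoid ties")
    first_wrong_majority = n_reads // 2 + 1
    # Binomial tail: more than half of the reads carry an error
    return sum(
        comb(n_reads, k) * per_read_error ** k * (1 - per_read_error) ** (n_reads - k)
        for k in range(first_wrong_majority, n_reads + 1)
    )


PER_READ_ERROR = 0.15  # hypothetical single-read error rate

for n_reads in (1, 5, 15):
    print(f"{n_reads:>2} reads -> consensus error {majority_error(PER_READ_ERROR, n_reads):.4%}")
```

Because the errors are random rather than systematic, the consensus error drops quickly as more reads cover the same position.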

Oxford Nanopore

10–20 Gb of DNA sequences

$1,000 for the base kit (2 runs, all reagents included)

Library preparation can take <10 minutes

Read lengths of tens of thousands of bases… or even more

Can sequence genomic DNA, cDNA or even RNA directly!

Ions pass through the pore

This generates a measurable current

When a molecule passes through, the current signal is disturbed

Each nucleotide produces a different, specific current perturbation

A smaller protein is added on top of the pore to unzip the DNA, so that a single strand passes through the pore

Nucleic acids pass through the pore as single strands

An adapter molecule slows down the flow in order to allow a clear recognition of each base


Reads up to 1 million bases!

Even smaller devices (in the future)

A revolution from the point of view of downstream analysis

http://flxlexblog.wordpress.com/2013/10/01/developments-in-next-generation-sequencing-october-2013-edition/


NEXT-GEN IS TRENDY

It is the new thing

It is powerful and cheap

It has uses in any biological system (From viruses to human genetics)

It is useful to answer a number of questions (De novo, mapping, transcriptomics)

So everyone wants to use it

You just extract your DNA/RNA and send it to a sequencing company

And then, who will do the analysis?


NEXT-GEN WORKFLOW

1. What is the goal?

2. Choose the right experimental setup

3. Choose the right sequencing technology

4. Data Analysis


What is your goal?

NO WAY BACK!

What exactly is the problem you want to address?

Evaluate approaches used in the past

Consider new approaches

Consider future problems

CHOOSE THE RIGHT TECHNOLOGY

De novo sequencing: 454, PacBio

Draft sequencing: Illumina, Ion Torrent

Microbial communities: 454, Illumina

Transcriptomics: Illumina, Ion Torrent


DATA ANALYSIS

A basic next-gen experiment generates gigabytes of information

This is HIGH-THROUGHPUT!

HIGH-THROUGHPUT TECHNOLOGIES

Technologies that generate too much data to be handled without computer assistance

Modelling

Shotgun proteomics

Network analysis

Structural biology

Machine learning

Next-generation sequencing


BIOINFORMATICS

Bioinformatics is the development and use of computer methods for the analysis of biological data

Bioinformatics becomes absolutely necessary with the increase of data load


Most bioinformatics is run on Linux


SO WHAT IS UNIX?

Unix is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, developed in the 1970s at the Bell Labs research center by Ken Thompson, Dennis Ritchie, and others.

UNIX Advantages

Full multitasking with protected memory

Very efficient virtual memory

Access controls and security

A rich set of small commands that do specific tasks well

Ability to combine commands to accomplish complicated tasks

A powerfully unified file system

Available on a wide variety of machines

Optimized for program development

UNIX Disadvantages

The command line interface is user-hostile

Commands often have cryptic names and give very little response to tell the user what they are doing

To use Unix well, you need to understand some of the main design features

The richness of utilities (over 400 standard ones) often overwhelms novices

Documentation often feels underwhelming and short on examples

Expensive

UNIX → LINUX

Linux is a UNIX-like family of Operating Systems (OSs)

Each "member" of the family has different characteristics and comes with different software and graphic environments

Broadly, each distribution (a.k.a. distro) is "tuned" for a specific task, aimed at a specific kind of user, or designed for a specific kind of device

Most Unix advantages, plus it is FREE and user-friendly

Linux Distros

For beginners:

Mint and Ubuntu, #1 and #2 most popular distributions

For a specific task:

e.g. BioLinux (bioinformatics), Scientific Linux (science in general) and Ubuntu Studio (multimedia)

For a specific platform:

e.g. Mythbuntu (home theater PCs), Yellow Dog Linux (Apple machines), OpenWrt (routers)

LINUX FOR BIOINFORMATICS

It requires more work than other operating systems

Why Linux?

Free and runs on most hardware

Fully customizable

More efficient and stable

Why Linux for bioinformatics?

Supports multiple users in a controlled manner

Optimized for writing and executing scripts/commands

Features for handling massive amounts of files

Adopted by the scientific community

LINUX – OPEN SOURCE

Why Linux? Free and open software

Open-source software (OSS) is computer software with its source code made available with a license in which the copyright holder provides the rights to study, change, and distribute the software to anyone and for any purpose

Open-source software may be developed in a collaborative public manner

LINUX

Why Linux? Fully customizable

From the small details to the core functions

Why Linux? More efficient and stable

Linux servers are widely used, for example by Microsoft and Apple

As a bioinformatician, if you want to interact with your server quickly and well, you may find it easier if you use the same language

Is Linux the only way to do bioinformatics?

ABSOLUTELY NOT

However, its characteristics make it optimal for most bioinformatic tasks

Supports multiple users in a controlled manner

Optimized for writing and executing scripts/commands

Features for handling massive amounts of files

Adopted by the scientific community

Many bioinformaticians use a Mac laptop to interact with a Linux server (Mac OS X is Unix-based)

Many Linux distros are as friendly as Windows

Give Ubuntu a try: https://www.ubuntu.com/download/desktop/try-ubuntu-before-you-install

You get to browse your files visually

Internet browsers

Text processors

Skype

Even videogames

… and many things Windows does not give you


Interacting with computers

Usually we interact with our computer through a Graphical User Interface (GUI)

We have folders, icons, etc...

This is very friendly for us, but very far from the ‘machine language’


Using a language that is closer to ‘machine language’ gives us more power

The closer a programming language is to machine language, the more difficult and the more powerful it is


Command Line Interface (CLI)

Also called the TERMINAL

The CLI is a more powerful way to interact with computers

Faster and automatable

BUT

More complex than GUI

To use the CLI we need to write in a language that is closer to machine language (but still very, very far from it)…

… instead of clicking with the mouse

So to start we need to know some basics

For today, let's just look at the folder structure

Folder structure

The files in our computers are located in folders, with a tree-like structure

To navigate with GUI, we can click on folders

To navigate with CLI, we need to issue a basic command

Change Directory: cd

If I want to go from Documents to Subfolder1B-1:

cd Folder1/Subfolder1B/Subfolder1B-1

From Subfolder1A to Folder1:

cd ..

If I want to go from Documents to Folder1:

cd Folder1
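The same three moves can also be tried from a script. The sketch below repeats them in Python with os.chdir, inside a throwaway copy of the folder tree; the tree is created in a temporary directory purely for illustration, and the helper names are hypothetical.

```python
import os
import tempfile


def build_tree(root):
    """Create Documents/Folder1 with Subfolder1A and Subfolder1B/Subfolder1B-1 inside."""
    os.makedirs(os.path.join(root, "Documents", "Folder1", "Subfolder1A"))
    os.makedirs(os.path.join(root, "Documents", "Folder1", "Subfolder1B", "Subfolder1B-1"))
    return os.path.join(root, "Documents")


def current_folder():
    """Name of the folder we are currently in."""
    return os.path.basename(os.getcwd())


start = os.getcwd()
visited = []
with tempfile.TemporaryDirectory() as root:
    documents = build_tree(root)

    os.chdir(documents)
    os.chdir("Folder1")                                    # cd Folder1
    visited.append(current_folder())

    os.chdir(documents)
    os.chdir(os.path.join("Folder1", "Subfolder1B", "Subfolder1B-1"))
    visited.append(current_folder())                       # cd Folder1/Subfolder1B/Subfolder1B-1

    os.chdir(os.path.join(documents, "Folder1", "Subfolder1A"))
    os.chdir("..")                                         # cd ..
    visited.append(current_folder())

    os.chdir(start)  # leave the temporary tree before it is deleted

print(visited)
```

Relative paths such as Folder1 or .. are always resolved from the current folder, which is why each example first moves to its stated starting point.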