Next Generation Sequencing and its data analysis challenges

Background

Alignment and Assembly

Applications: Genome, Epigenome, Transcriptome

References

Cell 2013, 155:27.
Cell 2013, 155:39.
Annu. Rev. Plant Biol. 2009, 60:305.
Annu. Rev. Genomics Hum. Genet. 2009, 10:135.
Curr. Opin. Biotechnology, 24:22.
Nat. Biotech. 2009, 25:195.
Nat. Methods 2009, 6:S6.
Nat. Rev. Genet. 2009, 10:669.
Nat. Rev. Genet. 2010, 11(1):31-46.
Genomics 2010, 95(6):315-27.

This lecture covers the opportunities and challenges of next generation sequencing, not detailed statistical techniques. The material is drawn from the review articles listed under References.

Background

“Method of the Year” 2007 by Nature Methods.

The name:

“Next generation sequencing”
“Deep sequencing”
“High-throughput sequencing”
“Second-generation sequencing”

The key characteristics:

Massively parallel sequencing: the amount of data from a single run ~ the amount of data from the human genome project

The reads are short: ~ a few hundred bases per read

Background

Potential impact:

The “$1000 genome” will become reality very soon

Genome sequencing will become a regular medical procedure.

Personalized medicine
Predictive medicine
Ethical issues

For statisticians:

Data mining using hundreds of thousands of genomes
Finding rare SNPs/mutations associated with diseases
New methods to analyze epigenomics/transcriptomics data
Finding interventions to improve life quality

Background

The companies use different techniques. We use Illumina’s as an example. (http://seqanswers.com/forums/showthread.php?t=21)


An incomplete list of some common platforms.

BMC Genomics 2012, 13:341

Background

Advantages:

Fast and cost effective.
No need to clone DNA fragments.

Drawbacks:

Short read length (platform dependent).
Some platforms have trouble with identical repeats.
Non-uniform confidence in base calling within reads: data are less reliable near the 3’ end of each read.

Background

What deep sequencing can do:

Nat Methods. 2009 Nov;6(11 Suppl):S2-5.

Sequence the genome of a person? --- Alignment

Can rely on existing human genome as a blue print.

Align the short reads onto the existing human genome.

Need a few fold coverage to cover most regions.

Sequence a whole new genome? --- Assembly

Overlaps between reads are required to construct the genome.
Because the reads are short, ~30-fold coverage is needed.
At 3G bases of data per run, a new genome similar in size to the human genome needs about 30 runs.
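The coverage arithmetic above can be checked with a short sketch. It assumes a genome of about 3G bases and uses the standard Poisson (Lander-Waterman style) approximation, under which a base is missed by every read with probability exp(-coverage). The function names are illustrative and do not come from the lecture.

```python
import math


def runs_needed(genome_size, target_coverage, bases_per_run):
    """Number of sequencing runs needed to reach the target fold coverage."""
    total_bases = genome_size * target_coverage
    # Integer ceiling division, so a partial run counts as a full run.
    return (total_bases + bases_per_run - 1) // bases_per_run


def expected_fraction_covered(coverage):
    """Expected fraction of bases hit by at least one read.

    Assumes reads land uniformly at random (Poisson approximation),
    which real libraries only approximate.
    """
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    human_genome = 3_000_000_000   # assumed genome size, ~3G bases
    bases_per_run = 3_000_000_000  # 3G bases of data per run, as in the text
    print("runs for 30-fold coverage:", runs_needed(human_genome, 30, bases_per_run))
    for fold in (1, 3, 5, 30):
        print(fold, "fold ->", round(expected_fraction_covered(fold), 4))
```

Under this idealized model a few-fold coverage already touches most bases, which is why alignment to a reference can work with much less data than assembly, where reads must also overlap one another.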

Alignment and Assembly

Hash table-based alignment. Similar to BLAST in principle.

(1) Find potential locations.

(2) Local alignment.
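The two steps can be sketched as a toy seed-and-extend aligner. This is a minimal illustration, not the method of any particular aligner: the k-mer index, the non-overlapping seeds, and the `max_mismatches` parameter are assumptions, and step (2) is simplified to an ungapped mismatch count rather than a true local alignment.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Hash table mapping each k-mer to all its start positions in the reference."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def align_read(read, reference, index, k, max_mismatches=2):
    """Return (start, mismatches) for the best ungapped hit, or None."""
    # Step 1: find potential locations by looking up non-overlapping seeds.
    candidates = set()
    for offset in range(0, len(read) - k + 1, k):
        seed = read[offset:offset + k]
        for pos in index.get(seed, []):
            start = pos - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    # Step 2: verify each candidate (ungapped stand-in for local alignment).
    best = None
    for start in sorted(candidates):
        mm = hamming(read, reference[start:start + len(read)])
        if mm <= max_mismatches and (best is None or mm < best[1]):
            best = (start, mm)
    return best


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCTTAACGGATC"
    index = build_kmer_index(reference, k=4)
    # Read taken from position 8 of the reference with one substitution.
    print(align_read("TAGCCTATAGGC", reference, index, k=4))
```

A read with one sequencing error still aligns because at least one of its seeds matches the reference exactly; the mismatch is only found in the verification step.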

Alignment and Assembly

From read to graph:

Alignment and Assembly

de Bruijn graph assembly

Red: read error.
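As a rough illustration of how reads become a graph, and of how a read error shows up in it, the sketch below builds a de Bruijn graph whose nodes are (k-1)-mers and whose edges are k-mers seen in the reads. The toy genome, the read set, and the `min_count` filter for dropping low-frequency k-mers are assumptions made for this example; real assemblers handle errors, repeats, and both strands in far more elaborate ways.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_de_bruijn(reads, k, min_count=1):
    """Map each (k-1)-mer node to the set of nodes it points to.

    K-mers seen fewer than min_count times are treated as likely errors
    and left out of the graph.
    """
    graph = defaultdict(set)
    for kmer, n in count_kmers(reads, k).items():
        if n >= min_count:
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous_path(graph, start):
    """Spell out the sequence from start while each node has one successor."""
    sequence, node, visited = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in visited:  # stop rather than loop around a cycle
            break
        sequence += nxt[-1]
        visited.add(nxt)
        node = nxt
    return sequence


if __name__ == "__main__":
    genome = "ATGGCGTGCAAT"
    # Every 6-base window sequenced twice, plus one read with an error.
    reads = [genome[i:i + 6] for i in range(7)] * 2 + ["GGCGAG"]
    raw = build_de_bruijn(reads, k=4)
    print(sorted(raw["GCG"]))  # the error creates a branch at this node
    cleaned = build_de_bruijn(reads, k=4, min_count=2)
    print(walk_unambiguous_path(cleaned, "ATG"))
```

The erroneous read adds a side branch that is supported by a single k-mer occurrence, so a simple count threshold removes it here; with real data the threshold depends on coverage.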

Whole genome/exome/transcriptome sequencing

Genomics

Whole genome sequencing detects all variants (SNP alleles, rare variants, mutations)

Could be associated with disease:

Rare variants (burden testing by collapsing by gene)

De novo mutations (need family tree)

Rare Mendelian disorders

Structural variants in cancer
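To make "burden testing by collapsing by gene" concrete, here is one simple variant of the idea: each individual is reduced to a carrier or non-carrier of any rare variant in the gene, and carrier counts in cases and controls are compared with a one-sided Fisher exact test. The data layout, the function names, and the choice of test are assumptions for illustration; many other burden statistics exist.

```python
from math import comb


def collapse_by_gene(genotypes, rare_sites):
    """Flag each individual who carries at least one rare variant in the gene.

    genotypes: dict mapping individual id -> set of variant site ids carried.
    rare_sites: set of site ids in the gene that are considered rare.
    """
    return {ind: bool(sites & rare_sites) for ind, sites in genotypes.items()}


def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """P(at least this many case carriers) under the hypergeometric null."""
    total_carriers = case_carriers + control_carriers
    denom = comb(n_cases + n_controls, total_carriers)
    p = 0.0
    for x in range(case_carriers, min(n_cases, total_carriers) + 1):
        p += comb(n_cases, x) * comb(n_controls, total_carriers - x) / denom
    return p


if __name__ == "__main__":
    # Hypothetical toy data: site ids are made up for this example.
    rare_sites = {"s1", "s2", "s3"}
    cases = {"c1": {"s1"}, "c2": {"s2"}, "c3": {"s3", "s9"}, "c4": set()}
    controls = {"k1": {"s9"}, "k2": set(), "k3": set(), "k4": set()}
    case_flags = collapse_by_gene(cases, rare_sites)
    control_flags = collapse_by_gene(controls, rare_sites)
    p = fisher_one_sided(sum(case_flags.values()), len(cases),
                         sum(control_flags.values()), len(controls))
    print(round(p, 4))
```

Collapsing pools variants that are individually too rare to test, at the cost of assuming that most of them shift disease risk in the same direction.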

Medical Genomics

Nature Reviews Genetics 11, 415

Example: Extreme-case sequencing to find rare variants associated with a disease.

Medical Genomics

Example: Cancer genome

Epigenomics

http://www.roadmapepigenomics.org/

ChIP-Seq

Purpose: identify which parts of the DNA sequence bind to a given protein.

Transcription factor (Regulome)

Modified histone (Epigenome)

Overall ChIP-Seq workflow

ChIP-Seq

Before deep sequencing, the same information was obtained using arrays in place of sequencing.

ChIP-Seq

Different kinds of profiles in different applications.

Elongation

Silencing

ChIP-Seq

Example of active gene chromatin pattern found by ChIP-Seq.

Initiation site

Elongation

RNA-Seq

Deep sequencing provides more information about each mRNA

RNA-Seq

Finding novel exons.

Splicing? (Short reads could be an issue.)

RNA-Seq

Gene expression profiling – to replace arrays?
Exon-specific abundance.
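One common way to turn aligned reads into exon-level abundance is to count the reads falling in each exon and normalize by exon length and sequencing depth (reads per kilobase per million mapped reads, RPKM). The sketch below assumes reads are summarized by their start coordinate and exons by half-open intervals; the lecture does not prescribe this or any other normalization.

```python
def count_reads_per_exon(read_starts, exons):
    """Count reads whose start position falls inside each exon.

    exons: dict mapping exon name -> (start, end), half-open interval.
    """
    counts = {name: 0 for name in exons}
    for pos in read_starts:
        for name, (start, end) in exons.items():
            if start <= pos < end:
                counts[name] += 1
    return counts


def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    exons = {"exon1": (100, 300), "exon2": (500, 600)}
    read_starts = [120, 150, 299, 300, 510]
    counts = count_reads_per_exon(read_starts, exons)
    for name, (start, end) in exons.items():
        print(name, counts[name], rpkm(counts[name], end - start, len(read_starts)))
```

Length normalization matters because a long exon collects more reads than a short one at the same expression level, which has no direct analogue in array intensities.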

RNA-Seq

Sequencing small RNA.

RNA-Seq

Quantification of miRNA and de novo detection of miRNAs

MicroRNA: 21-23 nucleotides in length.

Regulate gene expression by complementary binding.

Derived from non-coding RNAs that form stem-loop structure.

RNA-Seq

Directly probe mRNA targets of miRNA.
