Next Generation Sequencing and its data analysis challenges Background Alignment and Assembly Applications Genome Epigenome Transcriptome


Next Generation Sequencing and its data analysis challenges

Background

Alignment and Assembly

Applications: Genome, Epigenome, Transcriptome


References

Cell 2013, 155:27.
Cell 2013, 155:39.
Annu. Rev. Plant Biol. 2009, 60:305.
Annu. Rev. Genomics Hum. Genet. 2009, 10:135.
Curr. Opin. Biotechnology, 24:22.
Nat. Biotech. 2009, 25:195.
Nat. Methods 2009, 6:S6.
Nat. Rev. Genet. 2009, 10:669.
Nat. Rev. Genet. 2010, 11(1):31-46.
Genomics 2010, 95(6):315-27.

This lecture covers the opportunities and challenges of next generation sequencing, not detailed statistical techniques. The material is drawn from the review articles listed above.


Background

“Method of the Year” 2007 by Nature Methods.

The name:

“Next generation sequencing”
“Deep sequencing”
“High-throughput sequencing”
“Second-generation sequencing”

The key characteristics:

Massively parallel sequencing: the amount of data from a single run is comparable to the amount of data from the Human Genome Project.

The reads are short: about a few hundred bases per read.


Potential impact:

The “$1000 genome” will become reality very soon.

Genome sequencing will become a regular medical procedure.

Personalized medicine
Predictive medicine
Ethical issues

For statisticians:

Data mining using hundreds of thousands of genomes
Finding rare SNPs/mutations associated with diseases
New methods to analyze epigenomics/transcriptomics data
Finding interventions to improve quality of life


Different companies use different sequencing techniques. We use Illumina’s as an example (http://seqanswers.com/forums/showthread.php?t=21).


An incomplete list of some common platforms.

BMC Genomics 2012, 13:341


Advantages:

Fast and cost-effective.
No need to clone DNA fragments.

Drawbacks:

Short read length (platform dependent).
Some platforms have trouble with identical repeats.
Non-uniform confidence in base calling within reads: data are less reliable near the 3’ end of each read.


What deep sequencing can do:


Nat Methods. 2009 Nov;6(11 Suppl):S2-5.


Alignment and Assembly

Sequencing the genome of a person: alignment

Can rely on the existing human genome as a blueprint.

Align the short reads onto the existing human genome.

Need a few-fold coverage to cover most regions.

Sequencing a whole new genome: assembly

Overlaps between reads are required to construct the genome.
Because the reads are short, ~30-fold coverage is needed.
At 3 Gb of data per run, about 30 runs are needed for a new genome similar in size to the human genome.
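The coverage arithmetic on this slide can be checked with a short sketch. It assumes that "3G" means 3 gigabases of sequence per run and that a human-sized genome is about 3 gigabases. The function names are hypothetical, and the Poisson formula for the fraction of the genome covered is a standard idealization (reads placed uniformly at random), not something stated in the lecture.

```python
import math

def runs_needed(genome_size_bp, fold_coverage, yield_per_run_bp):
    """Number of sequencing runs needed to reach the requested fold coverage."""
    return math.ceil(genome_size_bp * fold_coverage / yield_per_run_bp)

def expected_fraction_covered(fold_coverage):
    """Fraction of bases hit by at least one read, assuming an idealized
    Poisson model with mean depth equal to the fold coverage (assumption)."""
    return 1.0 - math.exp(-fold_coverage)

HUMAN_SIZED_GENOME = 3_000_000_000   # ~3 Gb (assumed)
YIELD_PER_RUN = 3_000_000_000        # "3G data per run", read as 3 Gb (assumed)

# De novo assembly at ~30-fold coverage
print(runs_needed(HUMAN_SIZED_GENOME, 30, YIELD_PER_RUN))

# Alignment: how much of the genome a few-fold coverage is expected to touch
for c in (1, 3, 5):
    print(c, round(expected_fraction_covered(c), 4))
```

Under these assumptions, 30-fold coverage of a 3 Gb genome at 3 Gb per run works out to 30 runs, and a 3-fold coverage already touches roughly 95% of bases, which is why "a few fold" is enough when a reference is available.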


Hash table-based alignment. Similar to BLAST in principle.

(1) Find potential locations.

(2) Local alignment.
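The two steps can be sketched in a few lines. This is a toy illustration, not the algorithm of any particular aligner: step (1) uses a hash table of reference k-mers to propose candidate positions, and step (2) is simplified here to an ungapped mismatch count standing in for a full local alignment. The reference string, the seed length k, and the mismatch budget are all made up for the example.

```python
from collections import defaultdict

def build_index(reference, k):
    """Hash table mapping every k-mer in the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def align_read(read, reference, index, k, max_mismatches=2):
    """Return sorted (position, mismatches) pairs within the mismatch budget."""
    candidates = set()
    # Step 1: every k-mer (seed) of the read proposes the reference position
    # at which the whole read would have to start.
    for offset in range(len(read) - k + 1):
        for hit in index.get(read[offset:offset + k], ()):
            start = hit - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    # Step 2 (simplified): verify each candidate with an ungapped comparison.
    # A real aligner would run a local alignment that also allows gaps.
    hits = []
    for start in sorted(candidates):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits

reference = "ACGTTGCATGACCGTAGGCTAACGT"   # toy reference
index = build_index(reference, k=4)
print(align_read("CATGACCGTA", reference, index, k=4))   # exact read
print(align_read("CATGTCCGTA", reference, index, k=4))   # read with one error
```

The index is built once and reused for every read, which is what makes this approach practical when there are many millions of short reads.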


From read to graph:


de Bruijn graph assembly

Red: read error.
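As a rough illustration of the idea, the sketch below builds a de Bruijn graph from reads and shows how a single read error appears as an extra low-coverage branch. The toy genome, the reads, k = 4, and the min_count threshold are illustrative assumptions, and dropping k-mers seen only once is just one simple way of handling errors; real assemblers are considerably more elaborate.

```python
from collections import Counter, defaultdict

def kmer_counts(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def de_bruijn(counts, min_count=1):
    """Nodes are (k-1)-mers; each retained k-mer is an edge prefix -> suffix."""
    graph = defaultdict(list)
    for kmer, n in counts.items():
        if n >= min_count:
            graph[kmer[:-1]].append(kmer[1:])
    return graph

def walk(graph, start):
    """Follow edges from start while the path is unambiguous; return the contig."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, [])) == 1:
        node = graph[node][0]
        if node in seen:          # stop on a cycle
            break
        seen.add(node)
        contig += node[-1]
    return contig

# Toy genome ATGGCGTGCA, covered twice by overlapping error-free reads,
# plus one read (TGGCTT) carrying a single-base error.
reads = ["ATGGCG", "TGGCGT", "GGCGTG", "GCGTGC", "CGTGCA"] * 2 + ["TGGCTT"]
counts = kmer_counts(reads, k=4)

print(walk(de_bruijn(counts, min_count=1), "ATG"))  # stops at the error branch
print(walk(de_bruijn(counts, min_count=2), "ATG"))  # error k-mers filtered out
```

With every k-mer kept, the walk halts at the node where the erroneous read creates a second outgoing edge; once k-mers seen only once are discarded, the full toy genome is recovered.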



Whole genome/exome/transcriptome sequencing


Genomics

Whole genome sequencing detects all variants (SNP alleles, rare variants, mutations)

Could be associated with disease:

Rare variants (burden testing by collapsing by gene)

De novo mutations (need family tree)

Rare Mendelian disorders

Structural variants in cancer
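One simple form of burden testing by collapsing can be sketched as follows. Each individual is reduced to a single indicator (carries any rare allele in the gene or not), and carrier counts in cases and controls are compared with a one-sided Fisher's exact test. This is only one of several burden tests and is not necessarily the one the lecture has in mind; the genotype matrices are made-up toy data and the function names are hypothetical.

```python
from math import comb

def collapse(genotypes):
    """Collapse a gene's rare variants: 1 if an individual carries any rare allele."""
    return [int(any(count > 0 for count in row)) for row in genotypes]

def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """P(at least this many carriers among cases) under the hypergeometric null."""
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    upper = min(carriers, n_cases)
    numerator = sum(
        comb(carriers, x) * comb(total - carriers, n_cases - x)
        for x in range(case_carriers, upper + 1)
    )
    return numerator / comb(total, n_cases)

# Rows are individuals, columns are rare variants within one gene (toy data).
cases = [[1, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0], [1, 0, 0]]
controls = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 1, 0]]

case_carriers = sum(collapse(cases))
control_carriers = sum(collapse(controls))
p_value = fisher_one_sided(case_carriers, len(cases), control_carriers, len(controls))
print(case_carriers, control_carriers, p_value)
```

Collapsing trades resolution for power: no single rare variant occurs often enough to test on its own, but the per-gene carrier indicator does.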


Medical Genomics

Nature Reviews Genetics 11, 415

Example: Extreme-case sequencing to find rare variants associated with a disease.


Example: Cancer genome


Epigenomics

http://www.roadmapepigenomics.org/


ChIP-Seq


Purpose: identify which parts of the DNA sequence bind to a certain protein.

Transcription factor (regulome)

Modified histone (epigenome)


Overall ChIP-Seq workflow



Before deep sequencing, the same information was obtained by using microarrays in place of sequencing (ChIP-chip).


Different kinds of profiles arise in different applications.

Elongation

Silencing


Example of active gene chromatin pattern found by ChIP-Seq.

Initiation site

Elongation


RNA-Seq


Deep sequencing provides more information about each mRNA


Finding novel exons.

Splicing? (Short reads could be an issue.)


Gene expression profiling – to replace arrays?

Exon-specific abundance.
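To turn read counts into abundances that are comparable across exons and samples, counts are commonly normalized by feature length and sequencing depth. The sketch below uses RPKM (reads per kilobase per million mapped reads), a widely used measure that the slides do not name, so treat it as an assumption about how abundance might be computed. The exon names, lengths, and counts are invented for the example.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)

# Hypothetical exons: name -> (reads mapped to the exon, exon length in bp)
exons = {
    "exon1": (500, 2000),
    "exon2": (500, 500),
    "exon3": (50, 1000),
}
total_mapped_reads = 10_000_000  # assumed library size

for name, (count, length) in exons.items():
    print(name, rpkm(count, length, total_mapped_reads))
```

In this toy data, exon1 and exon2 have the same raw count, but exon2 is four times shorter and therefore gets a four times higher normalized abundance.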


Sequencing small RNA.


Quantification of miRNA and de novo detection of miRNAs

MicroRNA: 21-23 nucleotides in length.

Regulates gene expression by complementary binding to target mRNAs.

Derived from non-coding RNAs that form stem-loop structures.
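A very rough sketch of the quantification step: keep only reads whose length falls in the 21-23 window given above and tally identical sequences. The reads below are toy data, the function name is hypothetical, and the sketch assumes adapters have already been trimmed. Real pipelines would also map the reads to known miRNAs or to the genome, which is not shown here.

```python
from collections import Counter

def count_mirna_candidates(reads, min_len=21, max_len=23):
    """Tally identical reads whose length falls within the miRNA size window."""
    return Counter(r for r in reads if min_len <= len(r) <= max_len)

# Toy small-RNA reads (assumed already adapter-trimmed)
reads = (
    ["TGAGGTAGTAGGTTGTATAGTT"] * 3     # 22 nt, seen three times
    + ["ACGTACGTACGTACGTACGTAC"]       # 22 nt, seen once
    + ["ACGTACGTACGT"]                 # 12 nt, too short
    + ["A" * 30]                       # 30 nt, too long
)

for sequence, n in count_mirna_candidates(reads).most_common():
    print(sequence, len(sequence), n)
```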


Directly probe mRNA targets of miRNA.
