Next Generation Sequencing and its data analysis challenges
Background
Alignment and Assembly
Applications: Genome, Epigenome, Transcriptome
References
Cell 2013, 155:27.
Cell 2013, 155:39.
Annu. Rev. Plant Biol. 2009, 60:305.
Annu. Rev. Genomics Hum. Genet. 2009, 10:135.
Curr. Opin. Biotechnology, 24:22.
Nat. Biotech. 2009, 25:195.
Nat. Methods 2009, 6:S6.
Nat. Rev. Genet. 2009, 10:669.
Nat. Rev. Genet. 2010, 11(1):31-46.
Genomics 2010, 95(6):315-27.
This lecture is about the opportunities and challenges, not detailed statistical techniques. The materials are taken from some review articles.
Background
“Method of the Year” 2007 by Nature Methods.
The name:
“Next generation sequencing”
“Deep sequencing”
“High-throughput sequencing”
“Second-generation sequencing”
The key characteristics:
Massively parallel sequencing: the amount of data from a single run ~ the amount of data from the Human Genome Project.
The reads are short: ~ a few hundred bases per read.
Background
Potential impact:
The “$1000 genome” will become reality very soon
Genome sequencing will become a regular medical procedure.
Personalized medicine
Predictive medicine
Ethical issues
For statisticians:
Data mining using hundreds of thousands of genomes
Finding rare SNPs/mutations associated with diseases
New methods to analyze epigenomics/transcriptomics data
Finding interventions to improve life quality
Background
The companies use different techniques. We use Illumina’s as an example. (http://seqanswers.com/forums/showthread.php?t=21)
Background
An incomplete list of some common platforms.
BMC Genomics 2012, 13:341
Background
Advantages:
Fast and cost effective.
No need to clone DNA fragments.
Drawbacks:
Short read length (platform dependent)
Some platforms have trouble with identical repeats
Non-uniform confidence in base calling within reads. Data are less reliable near the 3’ end of each read.
Background
What deep sequencing can do:
Background
Nat Methods. 2009 Nov;6(11 Suppl):S2-5.
Sequence the genome of a person? --- Alignment
Can rely on the existing human genome as a blueprint.
Align the short reads onto the existing human genome.
Need a few fold coverage to cover most regions.
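Why a few-fold coverage is enough for "most regions" can be illustrated with a simple Poisson model of read placement (an assumption added here, not something stated on the slides): if reads land uniformly at random, the expected fraction of bases hit by at least one read at mean coverage c is 1 - exp(-c). The function names below are hypothetical.

```python
import math

def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Mean coverage c = (number of reads * read length) / genome size."""
    return num_reads * read_length / genome_size

def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of bases covered by >= 1 read, assuming reads
    are placed uniformly at random (Poisson approximation)."""
    return 1.0 - math.exp(-coverage)

# Coverage fraction grows quickly over the first few folds.
for c in (1, 2, 5, 10):
    print(f"{c:>2}x coverage -> {expected_fraction_covered(c):.4f} of bases covered")
```

Real data deviate from this idealized model (repeats, GC bias, mapping failures), so the numbers are best read as an upper-level intuition rather than a guarantee.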
Sequence a whole new genome? --- Assembly
Overlaps between reads are required to construct the genome.
Because the reads are short, ~30-fold coverage is needed.
At 3 G bases of data per run, about 30 runs are needed for a new genome similar in size to the human genome.
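The run count above is simple arithmetic, sketched here for convenience. The sketch assumes a genome of roughly 3 G bases (about the size of the human genome) and takes "3G data per run" to mean 3 G bases; `runs_needed` is a hypothetical helper name.

```python
import math

def runs_needed(genome_size_bp: int, target_coverage: float, bases_per_run: int) -> int:
    """Number of sequencing runs required to reach the target mean coverage."""
    total_bases_required = genome_size_bp * target_coverage
    return math.ceil(total_bases_required / bases_per_run)

GENOME_SIZE = 3_000_000_000    # assumed: ~3 G bases, similar to human
BASES_PER_RUN = 3_000_000_000  # assumed reading of "3G data per run"

print(runs_needed(GENOME_SIZE, 30, BASES_PER_RUN))  # de novo assembly at ~30x
print(runs_needed(GENOME_SIZE, 3, BASES_PER_RUN))   # a few-fold coverage for alignment
```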
Alignment and Assembly
Hash table-based alignment. Similar to BLAST in principle.
(1) Find potential locations:
(2) Local alignment.
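The two steps can be sketched as a toy seed-and-extend aligner. This is a minimal illustration under stated assumptions, not the algorithm of any particular tool: the reference is indexed by k-mers in a hash table, each seed in the read votes for a candidate start position, and step (2) is simplified to an ungapped mismatch count instead of a full local alignment. All function names and the example sequences are hypothetical.

```python
from collections import defaultdict

def build_index(reference: str, k: int) -> dict:
    """Hash table mapping every k-mer of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def candidate_positions(read: str, index: dict, k: int) -> dict:
    """Step 1: each k-mer seed of the read votes for an implied read start."""
    votes = defaultdict(int)
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset
            if start >= 0:
                votes[start] += 1
    return votes

def mismatches(read: str, reference: str, start: int):
    """Step 2 (simplified): ungapped comparison at a candidate position."""
    window = reference[start:start + len(read)]
    if len(window) < len(read):
        return None  # read would run off the end of the reference
    return sum(a != b for a, b in zip(read, window))

def align(read: str, reference: str, index: dict, k: int, max_mismatches: int = 2):
    """Return (start, mismatches) of the best candidate, or None if none passes."""
    best = None
    for start in candidate_positions(read, index, k):
        mm = mismatches(read, reference, start)
        if mm is not None and mm <= max_mismatches and (best is None or mm < best[1]):
            best = (start, mm)
    return best

reference = "ACGTTGCATGCCGATAGCTAGGCTTAACG"
index = build_index(reference, k=4)
read = "CCGATAGCAAGG"  # copy of reference[10:22] with one substitution
print(align(read, reference, index, k=4))
```

A production aligner would replace the mismatch count with a gapped local alignment and use base qualities; the hash lookup in step 1 is what keeps the search from scanning the whole genome for every read.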
Alignment and Assembly
From read to graph:
Alignment and Assembly
de Bruijn graph assembly
Red: read error.
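As a rough illustration of how reads become a de Bruijn graph and how a read error shows up in it, the sketch below treats each k-mer as an edge from its (k-1)-base prefix to its (k-1)-base suffix. The reads, the value of k, and the count threshold are made up for the example; filtering out low-count k-mers is one common way to suppress errors, assumed here rather than taken from the slides.

```python
from collections import Counter, defaultdict

def count_kmers(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def de_bruijn_graph(kmers):
    """Each k-mer is an edge: (k-1)-prefix -> (k-1)-suffix."""
    graph = defaultdict(set)
    for kmer in kmers:
        graph[kmer[:-1]].add(kmer[1:])
    return graph

# Error-free reads tile the toy genome ATGGCGTGCA twice; the last read
# carries a single substitution (C -> A).
true_reads = ["ATGGCG", "TGGCGT", "GGCGTG", "GCGTGC", "CGTGCA"]
reads = true_reads * 2 + ["TGGAGT"]

counts = count_kmers(reads, k=4)
raw_graph = de_bruijn_graph(counts)

# k-mers created by the error are seen only once, so a count threshold drops them.
solid_kmers = {kmer for kmer, n in counts.items() if n >= 2}
clean_graph = de_bruijn_graph(solid_kmers)

print(sorted(raw_graph["TGG"]))    # the error adds a spurious branch at node TGG
print(sorted(clean_graph["TGG"]))  # the branch disappears after filtering
```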
Alignment and Assembly
de Bruijn graph assembly
Alignment and Assembly
de Bruijn graph assembly
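Once the graph is built, contigs are typically read off by walking paths that have no branching. The following is a minimal sketch of that idea, assuming k-mers are edges between (k-1)-mers; it ignores isolated cycles and reverse complements, and the function names are hypothetical.

```python
from collections import defaultdict

def build_graph(kmers):
    """Successor and predecessor maps over (k-1)-mer nodes."""
    succ, pred = defaultdict(set), defaultdict(set)
    for kmer in kmers:
        left, right = kmer[:-1], kmer[1:]
        succ[left].add(right)
        pred[right].add(left)
    return succ, pred

def contigs(kmers):
    """Spell out maximal non-branching paths of the de Bruijn graph."""
    succ, pred = build_graph(kmers)
    nodes = set(succ) | set(pred)

    def is_simple(node):
        # exactly one incoming and one outgoing edge
        return len(pred.get(node, ())) == 1 and len(succ.get(node, ())) == 1

    result = []
    for node in sorted(nodes):
        if is_simple(node):
            continue  # paths start only at branching nodes or at ends
        for nxt in sorted(succ.get(node, ())):
            path, cur = node + nxt[-1], nxt
            while is_simple(cur):
                cur = next(iter(succ[cur]))
                path += cur[-1]
            result.append(path)
    return result

genome_kmers = ["ATGG", "TGGC", "GGCG", "GCGT", "CGTG", "GTGC", "TGCA"]
print(contigs(genome_kmers))                             # one contig: the toy genome
print(contigs(genome_kmers + ["TGGA", "GGAG", "GAGT"]))  # an error branch splits it
```

The second call shows why error removal matters: a single erroneous branch breaks one contig into several shorter ones.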
Whole genome/exome/transcriptome sequencing
Genomics
Whole genome sequencing detects all variants (SNP alleles, rare variants, mutations)
Could be associated with disease:
Rare variants (burden testing by collapsing by gene)
De novo mutations (need family tree)
Rare Mendelian disorders
Structural variants in cancer
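The "collapsing by gene" idea behind burden testing can be sketched as follows: each individual is reduced to a carrier/non-carrier indicator for any rare variant in the gene, and carrier frequencies are then compared between cases and controls. The one-sided Fisher's exact test used here is one simple choice made for the illustration; the slides do not specify a test, and the genotype data are invented.

```python
from math import comb

def collapse_by_gene(genotypes):
    """genotypes: one list per individual with rare-allele counts at each
    variant site of the gene. Returns 1 if the individual carries any."""
    return [int(any(g > 0 for g in person)) for person in genotypes]

def fisher_one_sided(a, b, c, d):
    """P(X >= a) under the hypergeometric null for the 2x2 table
    [[a, b], [c, d]] = [[case carriers, case non-carriers],
                        [control carriers, control non-carriers]]."""
    n_cases, n_carriers, n = a + b, a + c, a + b + c + d
    upper = min(n_cases, n_carriers)
    tail = sum(comb(n_cases, x) * comb(n - n_cases, n_carriers - x)
               for x in range(a, upper + 1))
    return tail / comb(n, n_carriers)

# Invented example: 3 rare variant sites in one gene, 10 cases, 10 controls.
cases = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 2, 0]] + [[0, 0, 0]] * 5
controls = [[0, 0, 0]] * 10

case_carriers = sum(collapse_by_gene(cases))
control_carriers = sum(collapse_by_gene(controls))
p_value = fisher_one_sided(case_carriers, len(cases) - case_carriers,
                           control_carriers, len(controls) - control_carriers)
print(case_carriers, control_carriers, round(p_value, 4))
```

No single variant here is seen more than twice, yet the gene-level signal is visible after collapsing, which is the motivation for burden tests.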
Medical Genomics
Nature Reviews Genetics 11, 415
Example: Extreme-case sequencing to find rare variants associated with a disease.
Medical Genomics
Example: Cancer genome
Epigenomics
http://www.roadmapepigenomics.org/
ChIP-Seq
Purpose: identify which parts of the DNA sequence bind to a certain protein.
Transcription factor (Regulome)
Modified histone (Epigenome)
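On the analysis side, finding the bound regions amounts to locating genomic windows where aligned ChIP reads pile up relative to a control sample. The sketch below is a deliberately crude version of that idea (fixed windows, fold enrichment with a pseudocount), with invented read positions and thresholds; it is not the method of any established peak caller.

```python
from collections import Counter

def window_counts(read_starts, window):
    """Count aligned read start positions per fixed-size window."""
    return Counter(pos // window for pos in read_starts)

def enriched_windows(chip_starts, control_starts, window=200,
                     min_fold=4.0, min_reads=5):
    """Return (start, end, chip_reads) for windows enriched over control."""
    chip = window_counts(chip_starts, window)
    ctrl = window_counts(control_starts, window)
    # Scale control to the ChIP library size so the depths are comparable.
    scale = len(chip_starts) / max(len(control_starts), 1)
    peaks = []
    for w, n in sorted(chip.items()):
        expected = (ctrl.get(w, 0) + 1) * scale  # +1 pseudocount avoids division by zero
        if n >= min_reads and n / expected >= min_fold:
            peaks.append((w * window, (w + 1) * window, n))
    return peaks

# Invented data: 20 ChIP reads pile up at 1000-1199, plus scattered background.
chip_starts = list(range(1000, 1200, 10)) + [50, 450, 2300, 3100, 3900]
control_starts = list(range(0, 5000, 200))  # uniform background, 25 reads

print(enriched_windows(chip_starts, control_starts))
```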
Overall ChIP-Seq workflow
ChIP-Seq
Before deep sequencing, the same information was obtained by using microarrays in place of sequencing (ChIP-chip).
ChIP-Seq
Different kinds of profiles in different applications.
Elongation
Silencing
ChIP-Seq
Example of active gene chromatin pattern found by ChIP-Seq.
Initiation site
Elongation
RNA-Seq
Deep sequencing provides more information about each mRNA
RNA-Seq
Finding novel exons.
Splicing? (Short reads could be an issue.)
RNA-Seq
Gene expression profiling – to replace arrays?
Exon-specific abundance.
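Read counts only become comparable abundances after normalizing for feature length and sequencing depth. One widely used normalization is RPKM (reads per kilobase per million mapped reads); it is brought in here as an illustration and is not named on the slides. The exon lengths and counts below are invented.

```python
def rpkm(read_count: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)

TOTAL_MAPPED = 10_000_000  # hypothetical library size

# Exon-specific abundance: (reads mapped to exon, exon length in bp)
exons = {
    "exon1": (500, 2000),
    "exon2": (500, 500),   # same count, shorter exon -> higher abundance
    "exon3": (0, 800),     # a skipped or unexpressed exon
}

for name, (count, length) in exons.items():
    print(name, rpkm(count, length, TOTAL_MAPPED))
```

Applying the same function to whole genes (summed exon counts and lengths) gives the array-like gene expression profile; applying it per exon exposes differences that an array probe set would average away.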
RNA-Seq
Sequencing small RNA.
RNA-Seq
Quantification of miRNA and de novo detection of miRNAs
MicroRNA: 21-23 nt in length.
Regulate gene expression by complementary binding.
Derived from non-coding RNAs that form stem-loop structure.
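Complementary binding can be made concrete with a small search for seed matches. The sketch assumes the common convention that the "seed" is nucleotides 2-8 of the miRNA and that a candidate target site is the reverse complement of that seed in the mRNA; both sequences below are illustrative, and real target prediction uses additional criteria.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (A-U, G-C pairing only)."""
    return rna.translate(COMPLEMENT)[::-1]

def seed_sites(mirna: str, target: str, seed_start: int = 1, seed_end: int = 8):
    """0-based positions in the target that are perfectly complementary
    to miRNA nucleotides 2-8 (the assumed seed region)."""
    seed = mirna[seed_start:seed_end]
    site = reverse_complement(seed)
    return [i for i in range(len(target) - len(site) + 1)
            if target[i:i + len(site)] == site]

mirna = "UGAGGUAGUAGGUUGUAUAGUU"    # illustrative 22-nt miRNA
target = "AAGCUACCUCAUUGGCUACCUCG"  # illustrative mRNA fragment

print(reverse_complement(mirna[1:8]))  # the site sequence searched for
print(seed_sites(mirna, target))
```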
RNA-Seq
Directly probe mRNA targets of miRNA.