S–MART: What can I do with all this RNA-Seq data?
Matthias Zytnicki
URGI, INRA
Introduction
from The Economist, Aug. 2010
S–MART 10/26/10 Matthias Zytnicki 2 / 18
Sequencers around the world
http://pathogenomics.bham.ac.uk/hts/
Applications
• pharmacology: correlating drug response with genome variation
• metagenomics: tag sequencing
• genetics: GWAS of families of four
• epigenetics: replication timing and histone modification
• genomics: gene fusions, variant detection with exon capture
Current problem
Differences between big and small labs
big labs may have
• their own sequencers
• their own clusters
• their own storage solutions
• their own bioinformaticians
small labs may have
• nothing
[Figure axes: number of labs, number of sequencers]
Comparison with tiling arrays
from Wang et al., 2009
Differential expression
1 dot: 1 gene, placed by the number of reads overlapping it in each condition.
Count normalization problem
• each sample: 1 million reads
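• 1 gene: 0 reads in one sample, 500,000 reads in the other
• other genes: expression unchanged
⇒ a gene with 2000 / 1000 reads is not differentially expressed: in the second sample, half of the reads are taken by the first gene.
Possible fixes:
• use housekeeping genes as reference
• use genes with average expression as reference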
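The arithmetic behind this slide can be sketched in a few lines of Python. Only the read counts (1 million reads per sample, 0 / 500,000 and 2000 / 1000) come from the slide; the variable names are made up, and using "all reads outside the dominant gene" as the reference set is an assumption standing in for housekeeping genes.

```python
# Toy numbers from the slide: two samples of 1 million reads each.
TOTAL_A = 1_000_000
TOTAL_B = 1_000_000

# One gene goes from 0 reads (sample A) to 500,000 reads (sample B).
dominant_a, dominant_b = 0, 500_000

# Another gene, whose real expression is unchanged, has 2000 / 1000 reads.
gene_a, gene_b = 2000, 1000

# Naive normalization: divide by the total number of reads in the sample.
naive_ratio = (gene_a / TOTAL_A) / (gene_b / TOTAL_B)

# Reference-based normalization (assumption: the reference set is every
# read outside the dominant gene, i.e. genes with unchanged expression).
reference_a = TOTAL_A - dominant_a
reference_b = TOTAL_B - dominant_b
reference_ratio = (gene_a / reference_a) / (gene_b / reference_b)

print(f"ratio after total-count normalization: {naive_ratio:.2f}")
print(f"ratio after reference normalization:   {reference_ratio:.2f}")
```

With total-count normalization the gene looks two-fold changed; with the reference set the ratio comes back to 1.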
Size-dependent normalization problem
[Figure: genes, sample 1, sample 2]
Introduction
Data:
• 1 RNA–Seq sample of wild type
• 1 RNA–Seq sample of mutant
Typical requests:
• Get me all the genes which show differential expression.
• Use sliding windows instead.
• Actually, I want to have a bird’s eye view on the transcription throughout the genome.
• Hey! I saw some paper which shows some nice dot plots for differential expression!
⇒ Better if the biologist can perform the work himself/herself.
⇒ There is no pre-defined pipe–line.
⇒ A few lanes are usually sufficient.
⇒ Use S–MART!
Usual pipe–line
• First step: align on a reference genome (with any tool)
• Result: genomic coordinates.
• S–MART:
  • is a set of independent tools
  • works on a standard PC
  • can be installed and used easily
  • uses nested bins and SQL indices
[Figure: sequencer, sequences, alignment, genomic coordinates]
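To illustrate what "nested bins and SQL indices" can look like, here is a minimal sketch using Python's sqlite3 module. It is not S–MART's actual schema: the bin sizes, the table and column names, and the 0-based half-open coordinates are all assumptions. Each read is stored in the smallest bin that fully contains it, and a query only visits the bins that overlap the requested range.

```python
import sqlite3

# Assumed bin sizes, from 1 kb up to 1 Gb (each level nests in the next).
BIN_SIZES = [10 ** k for k in range(3, 10)]


def smallest_bin(start, end):
    """Return (level, index) of the smallest bin containing [start, end)."""
    for level, size in enumerate(BIN_SIZES):
        if start // size == (end - 1) // size:
            return level, start // size
    raise ValueError("interval does not fit in the largest bin")


def overlapping_bins(start, end):
    """Yield every (level, index) bin that overlaps [start, end)."""
    for level, size in enumerate(BIN_SIZES):
        for index in range(start // size, (end - 1) // size + 1):
            yield level, index


def build_db(reads):
    """Store (chrom, start, end) reads in an in-memory, indexed table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reads (chrom TEXT, start_pos INTEGER, "
        "end_pos INTEGER, bin_level INTEGER, bin_index INTEGER)"
    )
    conn.execute(
        "CREATE INDEX reads_bin ON reads (chrom, bin_level, bin_index)"
    )
    for chrom, start, end in reads:
        level, index = smallest_bin(start, end)
        conn.execute(
            "INSERT INTO reads VALUES (?, ?, ?, ?, ?)",
            (chrom, start, end, level, index),
        )
    return conn


def query(conn, chrom, start, end):
    """Return the sorted (start, end) reads overlapping [start, end)."""
    hits = []
    for level, index in overlapping_bins(start, end):
        rows = conn.execute(
            "SELECT start_pos, end_pos FROM reads "
            "WHERE chrom = ? AND bin_level = ? AND bin_index = ? "
            "AND start_pos < ? AND end_pos > ?",
            (chrom, level, index, end, start),
        )
        hits.extend(rows.fetchall())
    return sorted(hits)


reads = [("chr1", 100, 136), ("chr1", 990, 1026),
         ("chr1", 5000, 5036), ("chr2", 100, 136)]
conn = build_db(reads)
print(query(conn, "chr1", 950, 1000))
```

Because each read lives in exactly one bin, a query never returns the same read twice, and the index keeps each lookup cheap even on a standard PC.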
Data manipulation
• use several mapping tools
  [Figure: Fasta, Mosaik, Blat, coordinates]
  Supported tools: Blast, Blat, BWA, Exonerate, MAQ, Mosaik, Nucmer, RMap, SeqMap, Shrimp, SOAP, ...
• remove reads w.r.t. a reference set
  [Figure: reads, tRNA, result]
• find the coverage w.r.t. a reference set
  [Figure: reads, refSeq, result]
• cluster by sliding windows
  [Figure: reads, number (values shown: 40, 4)]
• find transcription on both strands
  [Figure: reads, result]
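Two of the operations listed above, removing reads with respect to a reference set and clustering by sliding windows, can be sketched as follows. This is a naive illustration, not S–MART's code: the function names, the (chrom, start, end) tuples with 0-based half-open coordinates, and the toy tRNA annotation are all assumptions.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def remove_reads(reads, reference):
    """Keep only the reads that overlap no interval of the reference set."""
    return [r for r in reads
            if not any(overlaps(r, ref) for ref in reference)]


def window_counts(reads, chrom, chrom_length, size, step):
    """Count the reads overlapping each sliding window of a chromosome."""
    counts = []
    for start in range(0, chrom_length - size + 1, step):
        window = (chrom, start, start + size)
        n = sum(overlaps(r, window) for r in reads)
        counts.append((start, start + size, n))
    return counts


# Hypothetical toy data: four reads and one tRNA locus.
reads = [("chr1", 10, 46), ("chr1", 120, 156),
         ("chr1", 130, 166), ("chr1", 300, 336)]
trna = [("chr1", 100, 180)]

kept = remove_reads(reads, trna)
windows = window_counts(kept, "chr1", 400, size=200, step=100)
print(kept)
print(windows)
```

The pairwise comparison is quadratic, which is fine for a toy example; on real data an index on the coordinates is what keeps these operations fast.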
Data visualization
• read size distribution
• nucleotide distribution
• density on the chromosomes
• distance with respect to genes
Differential expression
S–MART can find differentially expressed regions, which can be genes, TEs, miRNAs, sliding windows, etc.
Uses Fisher’s exact test for each region.
[Figure: sample 1, sample 2, result (marks shown: **, –)]
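As an illustration of a per-region Fisher's exact test, here is a sketch written with the Python standard library only. The slide does not say how the 2x2 table is built; the layout used here (reads inside the region versus reads elsewhere, for each sample) is an assumption, as are the function names and the toy counts.

```python
from math import exp, lgamma


def log_choose(n, k):
    """Natural logarithm of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def log_prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return (log_choose(row1, x) + log_choose(row2, col1 - x)
                - log_choose(total, col1))

    observed = log_prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at most as likely as the observed one.
    p_value = sum(exp(log_prob(x)) for x in range(low, high + 1)
                  if log_prob(x) <= observed + 1e-7)
    return min(1.0, p_value)


def region_p_value(in_region_1, total_1, in_region_2, total_2):
    """Assumed table: reads in the region vs. elsewhere, for each sample."""
    return fisher_exact_two_sided(in_region_1, total_1 - in_region_1,
                                  in_region_2, total_2 - in_region_2)


# Hypothetical counts: a region with 5 vs. 50 reads out of 1000 per sample,
# and a region with 10 vs. 10 reads out of 100 per sample.
p_changed = region_p_value(5, 1000, 50, 1000)
p_same = region_p_value(10, 100, 10, 100)
print(p_changed, p_same)
```

Since one test is run per region, the resulting p-values still need a multiple-testing correction before a threshold is applied.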
Normalizations
Use a dot plot of the number of reads per gene. Spearman rho shown on the slide at each step:
• no normalization: Spearman rho 0.558867
• normalization w.r.t. the number of reads: Spearman rho 0.697121
• normalization w.r.t. the interquartile: Spearman rho 0.697153
• number of reads per kb: Spearman rho 0.752267
• FDR of 5%: Spearman rho 0.752267
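The "reads per kb" step and the Spearman rho reported above can be sketched as follows. The slide does not state which two series are correlated; this sketch assumes rho is computed between the per-gene values of the two samples. The function names and the toy counts are hypothetical.

```python
def reads_per_kb(count, gene_length):
    """Number of reads per kilobase of gene."""
    return count * 1000 / gene_length


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-gene data: lengths in bp and read counts in two samples.
lengths = [500, 2000, 1000, 4000]
sample_1 = [50, 400, 90, 300]
sample_2 = [60, 350, 120, 500]

density_1 = [reads_per_kb(c, l) for c, l in zip(sample_1, lengths)]
density_2 = [reads_per_kb(c, l) for c, l in zip(sample_2, lengths)]
print(spearman_rho(sample_1, sample_2))
print(spearman_rho(density_1, density_2))
```

Dividing by gene length changes the ranking of genes within each sample, which is why rho can differ before and after this step.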
Pipe-lines
The user can chain all the S–MART tools as he/she likes
[Figure: pipeline diagram with boxes: sample 1, sample 2, clusterize by sliding windows (once per sample), merge, keep loci with p-value 10^-4, output]
Differential expression with sliding windows, plotted.
Conclusions
• S–MART: a tool for RNA-Seq high-throughput sequencing data manipulation and visualization.
• Especially useful for detecting differential expression.
• For labs with few bioinformaticians.
• Download it at http://urgi.versailles.inra.fr/index.php/urgi/Tools/S-MART
Acknowledgements
Computer science
Hadi Quesneville, URGI
URGI lab
Fly
Dominique Anxolabehere, IJM
Danielle Nouaud, IJM
Chantale Vaury, GReD
Sophie Desset, GReD
Silke Jensen, GReD
Arabidopsis
Herve Vaucheret, IJPB
Valerie Gaudin, IJPB
Sponsors