Comparing and interconverting expression profiles from Microarrays to RNASeq
Yaoyu E. Wang, Ph.D.
Center for Cancer Computational Biology, DFCI
SPECSII webinar, June 05, 2013
Transcriptome Profiling by High Throughput Technologies
- Transcriptome profiling captures a static snapshot of gene expression across the genome for a biological sample
- Allows direct genome-wide comparison of multiple samples to determine which genes are differentially expressed between states (e.g., normal vs. tumor)
- Allows hypothesis generation on molecular abnormalities and mechanisms that may contribute to the tumor phenotype
- Provides information on molecular subtypes and supports the development of prognostic and predictive molecular signatures
- Two main technologies:
  a. Microarray
  b. RNA-Sequencing (RNASeq) using next generation sequencing
Microarray and RNASeq Experimental Flows
Affymetrix GeneChip
scanner
Microarray and RNASeq Experimental Flows
Blencowe B J et al. Genes Dev. 2009;23:1379-1386
Illumina HiSeq
The run is finished. How are sequence files created?
.bcl files: raw files containing base calls and quality scores
CASAVA processing
- Demultiplexing
- Fastq file generation
- Sequencing filtering (Illumina-defined quality filters)
Split into Project and Sample folders
Jones_Lab
ChIP_A ChIP-B
Marcus_Lab
RNA-SeqA RNA-SeqB RNA-SeqC
Williams_Lab
Exome1 Exome2
Fastq Files Fastq Files Fastq Files
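To make the content of a Fastq file concrete, here is a minimal sketch that reads the four-line records and recovers the per-base quality scores. It assumes Phred+33 quality encoding, and the mean-quality threshold of 20 is only an illustrative stand-in, not the Illumina-defined filter applied by CASAVA. The function names and the two-read example are hypothetical.

```python
from io import StringIO
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    qualities: List[int]  # Phred scores, one per base


def parse_fastq(handle: TextIO, phred_offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly four lines per read."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        # Assumption: Phred+33 encoding (ASCII code minus 33 gives the score).
        scores = [ord(ch) - phred_offset for ch in quality]
        yield FastqRecord(header[1:], sequence, scores)


def mean_quality(record: FastqRecord) -> float:
    """Average Phred score of a read; 0.0 for an empty read."""
    if not record.qualities:
        return 0.0
    return sum(record.qualities) / len(record.qualities)


# Synthetic two-read example (not real sequencing data).
EXAMPLE_FASTQ = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!!!\n"
records = list(parse_fastq(StringIO(EXAMPLE_FASTQ)))

# Illustrative filter only: keep reads whose mean Phred score is at least 20.
passing = [r.read_id for r in records if mean_quality(r) >= 20]
print(passing)
```

In practice a maintained parser would be preferable; the point here is only that each read carries both base calls and quality scores.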
Possibilities for RNASeq Analysis
Haas & Zody. Nature Biotechnology 28, 421–423 (2010)
Using known annotations
And compare to known annotations
- Differential Expression
- Differential Isoform Abundance
- RNA editing
- SNP, indel detection
Microarray and RNASeq Comparison
Technology | RNASeq | Microarray
High run-to-run reproducibility | Yes | Yes
Dynamic range comparable to actual transcript abundance | >8000-fold | Hundred-fold
Able to detect alternative splice sites and novel isoforms | Yes | No
De novo analysis of samples without reference genome | Yes | No
Multiplexing samples in one run | Yes | No
Required amount of total RNA | >100 ng | ~1 ug
Re-analyzable data | Yes | No
Microarray and RNASeq Comparison
Technology | RNASeq | Microarray
Heterogeneity of read coverage across an expressed region | Yes | No
Well understood sources of experimental bias | No | Yes
Data portable on a flash drive (~4G) | No | Yes
Data is analyzable by any PC | No | Yes
Cheaper cost per sample | No(?) | Yes(?)
Why should microarray data not be ignored?
RNA-Seq Experiment
GEO Database
Comparing Expression Profiles from Microarrays to RNASeq
White paper, Illumina
Comparing Expression Profiles from Microarrays to RNASeq
n=7532 n=4537
Mooney M, PLoS ONE (2013)
RNA-Seq vs. Microarray in Lymphoma of Canis familiaris
10 Lymphoma (3 T-cell, 7 B-cell)
4 Normal lymph node
Total RNA
PE100 run
50-100 million mapped reads
Compare 15,092 annotated genes on chip
RNA-Seq vs. Microarray in B-Cell Lymphomas of Canis familiaris
Mooney M, PLoS ONE (2013)
Figure labels: T, N, B
r = 0.6; p < 10^-15
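A correlation of this kind can be sketched as a Pearson correlation over log-transformed expression values for the genes measured on both platforms. The snippet below is an illustration only: the gene names and values are synthetic, the log2(x + 1) transform is an assumption, and it is not the procedure used in the cited study.

```python
import math
from typing import Dict, List, Tuple


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def matched_log2(
    rnaseq: Dict[str, float],
    microarray: Dict[str, float],
    pseudocount: float = 1.0,
) -> Tuple[List[str], List[float], List[float]]:
    """Restrict to genes present on both platforms and log2-transform."""
    genes = sorted(set(rnaseq) & set(microarray))
    x = [math.log2(rnaseq[g] + pseudocount) for g in genes]
    y = [math.log2(microarray[g] + pseudocount) for g in genes]
    return genes, x, y


# Synthetic values for illustration (not taken from any dataset).
rnaseq_counts = {"geneA": 0.0, "geneB": 3.0, "geneC": 15.0, "geneD": 63.0}
array_signal = {"geneB": 7.0, "geneC": 31.0, "geneD": 63.0, "geneE": 5.0}

genes, x, y = matched_log2(rnaseq_counts, array_signal)
r = pearson_r(x, y)
print(genes, round(r, 3))
```

Only genes annotated on the chip can enter such a comparison, which is why the analysis above is restricted to the 15,092 annotated genes.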
Comparison of transcriptome profile from RNA-Seq and Tiling arrays
C. elegans
Biological replicates for L2 and YA stages
Affy Tiling Arrays* Illumina RNASeq
Agarwal, BMC Genomics (2010)
* Covers the whole C. elegans genome
Differentially expressed genes between the L2 and YA stages
Agarwal, BMC Genomics (2010)
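One simple way to compare differential expression calls from two platforms is to look at how much the two gene lists overlap. The following is a minimal sketch with hypothetical gene identifiers; it says nothing about the actual overlap reported in the cited study.

```python
from typing import Dict, Set


def overlap_summary(calls_a: Set[str], calls_b: Set[str]) -> Dict[str, float]:
    """Summarize agreement between two sets of differentially expressed genes."""
    shared = calls_a & calls_b
    union = calls_a | calls_b
    return {
        "shared": len(shared),
        "only_a": len(calls_a - calls_b),
        "only_b": len(calls_b - calls_a),
        # Jaccard index: shared calls divided by all calls made by either platform.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Hypothetical gene identifiers, for illustration only.
tiling_array_calls = {"gene1", "gene2", "gene3", "gene4"}
rnaseq_calls = {"gene3", "gene4", "gene5", "gene6", "gene7"}

summary = overlap_summary(tiling_array_calls, rnaseq_calls)
print(summary)
```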
Comparison of transcriptome profile from RNA-Seq and tiling arrays
Tiling Array
Microarray Maximum Sensitivity
RNASeq 11-plex
RNASeq 6-plex
Agarwal, BMC Genomics (2010)
Comparison of transcriptome profile from RNA-Seq and tiling arrays using qPCR as gold standard
Microarray and RNASeq Comparison
Microarray is cheaper?
RNASeq vs. Microarray Cost Analysis
Per Sample | Microarray | Illumina HiSeq
1 per Chip/Lane | $670 | $4,010.00
2-plex | NA | $2,097.50
4-plex | NA | $1,141.25
6-plex | NA | $822.50
8-plex | NA | $663.13
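The HiSeq figures in this table are consistent with a simple model: a fixed cost of $185 per sample plus a $3,825 lane cost shared across the multiplexed samples. These two constants are inferred from the listed prices and are not stated on the slide. Under that assumption, the sketch below reproduces every RNASeq entry to within a cent and finds the plex level at which RNASeq matches the $670 microarray price.

```python
import math

MICROARRAY_COST = 670.00   # per sample, from the table
LANE_COST = 3825.00        # assumption: inferred from the table, shared per lane
PER_SAMPLE_COST = 185.00   # assumption: inferred from the table, paid per sample


def rnaseq_cost_per_sample(plex: int) -> float:
    """Per-sample RNASeq cost when `plex` samples share one lane."""
    if plex < 1:
        raise ValueError("plex must be at least 1")
    return PER_SAMPLE_COST + LANE_COST / plex


def break_even_plex(target_cost: float) -> int:
    """Smallest plex level whose per-sample cost is at or below target_cost."""
    if target_cost <= PER_SAMPLE_COST:
        raise ValueError("target is below the fixed per-sample cost")
    return math.ceil(LANE_COST / (target_cost - PER_SAMPLE_COST))


for plex in (1, 2, 4, 6, 8, 11):
    print(f"{plex}-plex: ${rnaseq_cost_per_sample(plex):,.2f}")

print("break-even plex:", break_even_plex(MICROARRAY_COST))
```

Under this model the per-sample price can never fall below the fixed $185, however many samples share a lane.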
6-plex
11-plex
Microarray and RNASeq Comparison
RNASeq is cheaper?
Microarray vs. RNASeq: Big Data Challenge
Platform | Data per sample | Time to download 1 sample | Time to download 100 samples | Cost to store on the cloud per month
RNASeq | 30-65GB | 1 Hr | 6 days | $270
Microarray | 30MB | 5 seconds | 8 minutes | $0.30
http://www.ncbi.nlm.nih.gov/genbank/statistics
Microarray vs. RNASeq: Bioinformatics Challenge
- Applications with a user interface for RNA-Seq analysis (e.g., Galaxy) can handle only a few samples
- Knowledge of Linux servers, scripting languages, and programming languages is absolutely REQUIRED
- Lack of detailed understanding of NGS technology and data leads to diverse bioinformatics tools with different characteristics
Law WC, Voom!, Bioconductor (2013)
Bioinformatics will become the single largest cost in sequencing
Conclusions
The answer is Yes
- Transcriptome profiles generated by microarray and RNASeq are in strong concordance
- Microarray data generated in the last decades is durable
- RNASeq offers a lot more biological information than microarray, and its data is re-analyzable
- NGS is getting cheaper
However, the devil is in the data
- NGS data is a lot more expensive to store and analyze
- Specialized computing infrastructure and personnel are required to take advantage of the information from NGS data