RNA-seq Gene Expression Analysis
Tzu L. Phang, Ph.D.; Robert Stearman, Ph.D.; Michael Edwards, Ph.D.
University of Colorado Denver
2014 AACR Workshop, Snowmass, CO
What is gene expression?
“Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product”
The Central Dogma
Figure labels: Transcriptome, Proteome
Agenda
• Import BAM file
• What is Next Generation Sequencing (NGS)?
• NGS Usages: RNA-seq and ChIP-seq
• NGS File Format and Size
• Mapping and Phred Quality Score
• Demo: Using SeqMonk for RNA-seq and ChIP-seq Analysis
Installing SeqMonk
http://www.bioinformatics.babraham.ac.uk/projects/seqmonk/
SeqMonk: First Look
List Panel
Chromosome Panel
Track Panel
Quick Assess Panel
Sample BAM Files
• ABC_DHL2.bam
• ABC_Ly10.bam
• ABC_Ly3.bam
• ABC_U2932.bam
• GCB_DHL10.bam
• GCB_DHL4.bam
• GCB_DHL6.bam
• GCB_Ly7.bam
• STAT3 ChIPSeq Genes.txt
Import BAM files
Next Generation Sequencing
http://www.youtube.com/watch?v=77r5p8IBwJk
How RNA-seq works
Figure from Wang et al., “RNA-Seq: a revolutionary tool for transcriptomics,” Nat. Rev. Genetics 10, 57-63 (2009).
Next generation sequencing (NGS)
Sample preparation
How ChIP-seq works
File System
unknown:5:1:2:836#0/1:CATACAAGTTGTTTGTACTATAGNTGTTTTTGAATT:aabaaaa^abaaba^_]_aaaXPD\^_aaa`Y]_aa
unknown:5:1:2:717#0/1:TCTGTTCCAGATTCTAAGGGCATNGTCTTTTTGAAT:aa^]]`\_^[Y_`^aZP^VZV[SDLZ^aa__^^\Ya
unknown:5:1:2:188#0/1:TAAGAAGAAAGATGCATAGGTACNATATTTTTGAAT:a``Z[^Y^`\\\^[\^][WNTWNDS_[^_^^[OWY_
unknown:5:1:2:1262#0/1:CACTTACAAACAAGGAATGTTGGNCGGTTTTTGAAT:a`ababaabaaaa_``aa``_ULDXZ_^aaa`O_aa
unknown:5:1:2:1046#0/1:CTAAGATGGCCTAAGAGTAGACTNACTTTTTTGAAT:abb`Xa`Z_aabaaa`]__Z^`\D\`aaaaaa^aab
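The reads above appear to use Illumina's SCARF layout: a colon-delimited read name, then the sequence and its quality string, all on one line. A minimal Python sketch for splitting such a line, assuming the last two colon-separated fields are always the sequence and the quality string (true for these examples, since neither contains a colon). `ScarfRead` and `parse_scarf_line` are hypothetical helpers, not part of SeqMonk or any tool used in this workshop.

```python
from typing import NamedTuple


class ScarfRead(NamedTuple):
    name: str
    sequence: str
    quality: str


def parse_scarf_line(line: str) -> ScarfRead:
    """Split one SCARF-style line into name, sequence and quality.

    Assumption: the read name may contain colons, but the sequence and
    quality fields do not, so only the last two colons are split on.
    """
    name, sequence, quality = line.rstrip("\n").rsplit(":", 2)
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    return ScarfRead(name, sequence, quality)


# First read from the example above (the backslash is escaped for Python).
example = ("unknown:5:1:2:836#0/1:CATACAAGTTGTTTGTACTATAGNTGTTTTTGAATT:"
           "aabaaaa^abaaba^_]_aaaXPD\\^_aaa`Y]_aa")
read = parse_scarf_line(example)
print(read.name)           # unknown:5:1:2:836#0/1
print(len(read.sequence))  # 36
```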
@ILLUMINA-545855_0001:4:100:743:1210#0
TAACATGTGTCATATGTCCCAGGATGTC
+ILLUMINA-545855_0001:4:100:743:1210#0
ab^aaaa_a_aaaa`a^abaaaa``a_a
Data Structure
FASTQ format
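A FASTQ record spans four lines: an `@` header with the read name, the sequence, a `+` separator line (optionally repeating the name), and a quality string with one character per base. A minimal Python sketch of a reader for that layout, assuming unwrapped records (each sequence and quality on a single line); `FastqRecord` and `read_fastq` are hypothetical names chosen for illustration.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:            # end of file
            return
        if not header.strip():    # tolerate blank lines between records
            continue
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:].rstrip("\n"), sequence, quality)


# The FASTQ record shown above, wrapped in an in-memory file.
example = io.StringIO(
    "@ILLUMINA-545855_0001:4:100:743:1210#0\n"
    "TAACATGTGTCATATGTCCCAGGATGTC\n"
    "+ILLUMINA-545855_0001:4:100:743:1210#0\n"
    "ab^aaaa_a_aaaa`a^abaaaa``a_a\n"
)
records = list(read_fastq(example))
print(records[0].name)  # ILLUMINA-545855_0001:4:100:743:1210#0
```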
Quality Control of sequences
• Quality scores are the only per-base measure of confidence in the sequencer's base calls
• Qualities usually fall along the length of the read, so trimming is needed to remove low-quality bases
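One simple way to trim is to drop bases from the 3' end of the read while their quality stays below a cutoff. A minimal Python sketch of that rule, assuming qualities are already decoded to integer Phred scores; `trim_3prime` is a hypothetical helper and the cutoff of 20 is an illustrative value, not a recommendation from this workshop. Dedicated trimming tools use more elaborate rules.

```python
from typing import List, Tuple


def trim_3prime(sequence: str, qualities: List[int],
                threshold: int = 20) -> Tuple[str, List[int]]:
    """Remove trailing bases whose Phred quality is below `threshold`."""
    end = len(qualities)
    # Walk back from the 3' end until a base meets the cutoff.
    while end > 0 and qualities[end - 1] < threshold:
        end -= 1
    return sequence[:end], qualities[:end]


seq, quals = trim_3prime("ACGTACGT", [38, 37, 35, 30, 22, 18, 12, 5])
print(seq, quals)  # ACGTA [38, 37, 35, 30, 22]
```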
Phred Qualities
• Developed by Phil Green’s group at the University of Washington in the 1990s
• Automatically processes sequence chromatogram files
  – Reports sequence and associated qualities
  – Introduced the concept of phred quality values
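A phred quality value Q relates to the estimated probability P that a base call is wrong by Q = -10 log10(P), so Q = 20 means a 1 in 100 chance of error and Q = 30 means 1 in 1,000. In FASTQ-style files each score is stored as one ASCII character plus an offset. The example quality strings earlier in this deck (mostly lowercase letters such as `a`) fit the older Illumina 1.3+ encoding with offset 64, which this sketch assumes by default; Sanger and Illumina 1.8+ files use offset 33. The function names are hypothetical.

```python
import math
from typing import List


def decode_qualities(quality: str, offset: int = 64) -> List[int]:
    """Convert an ASCII quality string into integer Phred scores.

    offset=64 is assumed for the Illumina 1.3+ style examples in this deck;
    use offset=33 for Sanger / Illumina 1.8+ FASTQ.
    """
    return [ord(ch) - offset for ch in quality]


def error_probability(q: float) -> float:
    """Invert Q = -10 * log10(P) to get the error probability P."""
    return 10 ** (-q / 10)


def phred_from_probability(p: float) -> float:
    """Q = -10 * log10(P)."""
    return -10 * math.log10(p)


scores = decode_qualities("ab^aaaa_")
print(scores)  # [33, 34, 30, 33, 33, 33, 33, 31]
```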
James H. Thomas, University of Washington
FASTQ QC Visualization: Per base sequence quality
http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
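The per base sequence quality plot summarizes, for each position in the read, the distribution of quality scores across all reads. A minimal Python sketch of the underlying calculation, computing only the mean and median per position (FastQC's plot shows more of the distribution than this); reads are assumed to be decoded to integer Phred scores already, and `per_base_quality` is a hypothetical name.

```python
from statistics import mean, median
from typing import Dict, List


def per_base_quality(reads: List[List[int]]) -> List[Dict[str, float]]:
    """Summarize Phred scores position by position across reads.

    Shorter reads simply contribute to fewer positions.
    """
    max_len = max((len(r) for r in reads), default=0)
    summary = []
    for pos in range(max_len):
        column = [r[pos] for r in reads if len(r) > pos]
        summary.append({"position": pos + 1,
                        "mean": mean(column),
                        "median": median(column)})
    return summary


reads = [[34, 33, 30, 12], [34, 32, 28, 20], [33, 33, 25]]
for row in per_base_quality(reads):
    print(row)
```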
Our Dataset
RNA-seq Analysis Workflow
From FASTQ to SAM/BAM
galaxyproject.org
https://usegalaxy.org
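Mapping turns FASTQ reads into SAM alignments (BAM is the compressed binary form of SAM). Each SAM alignment line has 11 mandatory tab-separated fields: QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ and QUAL, where FLAG bits encode properties such as 0x4 (read unmapped) and 0x10 (read on the reverse strand). A minimal Python sketch that pulls out the most commonly used fields; the class and function names are hypothetical, and the example line is made up for illustration rather than taken from the workshop data.

```python
from typing import NamedTuple


class SamAlignment(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int   # 1-based leftmost mapping position
    mapq: int
    cigar: str

    @property
    def is_unmapped(self) -> bool:
        return bool(self.flag & 0x4)

    @property
    def is_reverse(self) -> bool:
        return bool(self.flag & 0x10)


def parse_sam_line(line: str) -> SamAlignment:
    """Parse one SAM alignment line (header lines start with '@')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("SAM alignment lines need 11 mandatory fields")
    return SamAlignment(fields[0], int(fields[1]), fields[2],
                        int(fields[3]), int(fields[4]), fields[5])


# Hypothetical alignment: FLAG 16 marks a reverse-strand read.
line = "read1\t16\tchr1\t10468\t60\t28M\t*\t0\t0\tTAACATGTGTCATATGTCCCAGGATGTC\t*"
aln = parse_sam_line(line)
print(aln.rname, aln.pos, aln.is_reverse)  # chr1 10468 True
```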
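File Format
Figure: NGS file formats (SCARF, FASTQ, SAM, BAM, PILEUP, VCF, GTF, BED, WIG) for single-end and paired-end data, labeled with the workflow steps Mapping, QC Visualization, Tools and Input for Visualization, and example file sizes of 5.2 GB, 3.3 GB, 4.5 GB and 696 MB.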
NOW A DEMONSTRATION
After BAM Import
RNA-seq Analysis Workflow
Define Probes & Quantitation
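In SeqMonk, probes are genomic regions (for example, genes) and quantitation assigns each probe a value derived from the reads that fall in it. A minimal Python sketch of one simple scheme: count the reads overlapping each probe, then convert to log2 reads per million. The interval representation, the pseudocount of 1 and the function names are assumptions for illustration; this is not SeqMonk's implementation.

```python
import math
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive, one chromosome


def count_overlaps(probes: Dict[str, Interval],
                   reads: List[Interval]) -> Dict[str, int]:
    """Count reads overlapping each probe by at least one base."""
    counts = {}
    for name, (p_start, p_end) in probes.items():
        counts[name] = sum(1 for r_start, r_end in reads
                           if r_start <= p_end and r_end >= p_start)
    return counts


def log2_rpm(counts: Dict[str, int], total_reads: int,
             pseudocount: float = 1.0) -> Dict[str, float]:
    """Scale counts to reads per million, then log2 transform."""
    return {name: math.log2(c * 1e6 / total_reads + pseudocount)
            for name, c in counts.items()}


probes = {"geneA": (100, 200), "geneB": (500, 900)}
reads = [(90, 125), (150, 185), (190, 225), (600, 635), (1000, 1035)]
counts = count_overlaps(probes, reads)
print(counts)  # {'geneA': 3, 'geneB': 1}
values = log2_rpm(counts, total_reads=len(reads))
```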
RNA-seq Analysis Workflow
Why normalization?
• Remove systematic errors introduced in labeling, hybridization and scanning procedures
• Correct these errors while preserving biological variability and information
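The slides do not name a specific method here. One widely used technique is quantile normalization, which gives every sample the same distribution of values, removing sample-wide technical shifts while keeping each gene's rank within its sample. A minimal Python sketch, assuming a small in-memory table with one list of values per sample and ignoring ties (tied values receive consecutive ranks in input order); `quantile_normalize` is a hypothetical name.

```python
from typing import List


def quantile_normalize(columns: List[List[float]]) -> List[List[float]]:
    """Quantile-normalize samples given as equal-length value lists."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the k-th smallest value across all samples.
    rank_means = [sum(col[k] for col in sorted_cols) / len(columns)
                  for k in range(n)]
    result = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = rank_means[rank]
        result.append(normalized)
    return result


a, b = quantile_normalize([[5, 2, 3], [4, 1, 6]])
print(a)  # [5.5, 1.5, 3.5]
print(b)  # [3.5, 1.5, 5.5]
```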
A different look …
Figure: technical replicate difference plotted against average intensity values.
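Plotting the difference between two replicates against their average intensity is commonly called an MA plot: for paired values x and y, M = log2(x) - log2(y) and A = (log2(x) + log2(y)) / 2. For technical replicates without systematic bias, the points should scatter around M = 0 at every A. A minimal Python sketch of the calculation; the pseudocount that keeps zero counts defined is an added assumption for count data, and `ma_values` is a hypothetical name.

```python
import math
from typing import List, Tuple


def ma_values(x: List[float], y: List[float],
              pseudocount: float = 1.0) -> Tuple[List[float], List[float]]:
    """Return (M, A) lists for paired measurements from two replicates."""
    m_vals, a_vals = [], []
    for xi, yi in zip(x, y):
        lx = math.log2(xi + pseudocount)
        ly = math.log2(yi + pseudocount)
        m_vals.append(lx - ly)         # log ratio: replicate difference
        a_vals.append((lx + ly) / 2)   # mean log intensity
    return m_vals, a_vals


m, a = ma_values([3, 7, 15], [1, 7, 31])
print(m)  # [1.0, 0.0, -1.0]
print(a)  # [1.5, 3.0, 4.5]
```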
To normalize or not to …
ChIP-seq Analysis Strategy
ChIP-seq Analysis Caveats
ChIP-seq Analysis Workflow