61
Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center [email protected] @JasonWilliamsNY DNALC Live RNA-Seq with DNA Subway Part III

RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Jason WilliamsCold Spring Harbor Laboratory, DNA Learning Center

[email protected]@JasonWilliamsNY

DNALC Live

RNA-Seq with DNA SubwayPart III

Page 2: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

DNALC LiveThis is an experiment, give us feedback

on what you would like to see!

Page 3: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

dnalc.cshl.edu

DNALC Website and Social Media

dnalc.cshl.edu/dnalc-live

Page 4: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

youtube.com/DNALearningCenter

DNALC Website and Social Media

facebook.com/cshldnalc

@dnalc

@dna_learning_center

Page 5: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Who is this course for?

• Audience(s): • Undergraduate biology 200 level and up • (advanced AP Bio/graduate)

• Format: 3 sessions (1 per week); ~ 45 minutes each

• Exercises: Follow along with our online bioinformatics tool DNA Subway

• Learning resources: Slides and resource sheets available

Page 6: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Course Learning Goals

• Understand the rationale of an RNA-Seq experiment and its design

• Understand how we obtain DNA sequence and access its quality

• Use DNA Subway (FastQC/FastX) to QC sequence data

• Use DNA Subway (Kallisto) to (pseudo)align reads

• Use DNA Subway (Sleuth) to explore RNA-Seq results

Page 7: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Lab Setup• We will be using DNA Subway – You can get a free account at cyverse.org

(required)

Page 8: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

RNA-Seq with DNA SubwayPart III

(differential abundance/expression)

Page 9: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Steps for today’s session

• Review our progress so far

• Learn about differential abundance

• Visualize and explore our results

Page 10: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Review of RNA-Seq

Page 11: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

What is RNA-Seq? - measuring the transcriptome

• RNA-Seq allows us to measure the transcriptome – take an account of all transcription occurring in a cell/tissue

• We use the abundance of an RNA transcript as a proxy for the activity of some cellular process (e.g. protein synthesis, regulatory activity)

• We analyze these data to compare samples (e.g. cancerous vs. non-cancerous)

Page 12: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Key Concept: Variation vs. Difference

Page 13: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Spot the difference – biological variation

Photo Credit:https://www.quora.com/What-are-Overlapping-Bell-Curves-and-how-do-they-affect-Quora-questions-and-answers

Page 14: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Introduction to our data set

Page 15: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

RNA-Seq of hNPC – Zika Virus

Photo credit:https://www.sigmaaldrich.com/life-science/stem-cell-biology/neural-stem-cell-biology.htmlhttps://en.wikipedia.org/wiki/Zika_virus#/media/File:Zika-chain-colored.png

Zika Virus

Page 16: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Working on DNA Subway Green Line

Page 17: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Working on DNA Subway Green Line

Page 18: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Key Concept: Sequence Quality

Page 19: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Phred scores…

Phred Score Error (bases miscalled) Accuracy10 1 in 10 90%20 1 in 100 99%30 1 in 1,000 99.9%40 1 in 10,000 99.99%50 1 in 100,000 99.999%

Page 20: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Key Concept: Read Alignment

Page 21: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Intuition: The more reads we observe from a given “gene” the more “active” that gene is

Page 22: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Counting reads

Page 23: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Counting reads

Page 24: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

RNA-Seq with Kallisto

Page 25: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Problem: A transcriptome (like a genome) contains thousands of transcripts. How will we

match sequence reads with transcripts?

Page 26: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Kallisto – Pseudoalignment

Photo credit:https://www.nature.com/articles/nbt.3519

Page 27: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Kallisto – Pseudoalignment

Photo credit:http://mcb112.org/w02/w02-lecture.html

Page 28: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Lab: Pseudoalignment with Kallisto

Page 29: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Lab: Kallisto

Page 30: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Lab: Kallisto

See DNA Subway Guide (Green Line) on learning.cyverse.org

Page 31: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Kallisto results

Page 32: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Kallisto results

• target_id: Identifier for the transcript (from Ensembl)

Page 33: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Kallisto results

Page 34: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Kallisto results

• target_id: Identifier for the transcript (from Ensembl)

• length: length (nucleotides) of transcript exons

Page 35: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Kallisto results

• target_id: Identifier for the transcript (from Ensembl)

• length: length (nucleotides) of transcript exons

• eff_length: length of transcript that was sampled*

*In the original sequencing library, we rarely sample whole entire transcripts, this number accounts for the fragment length of the library

Page 36: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Kallisto results

• target_id: Identifier for the transcript (from Ensembl)

• length: length (nucleotides) of transcript exons

• eff_length: length of transcript that was sampled*

• est_counts: The estimated number of reads that have mapped to the transcript

*In the original sequencing library, we rarely sample whole entire transcripts, this number accounts for the fragment length of the library

Page 37: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Key Concept: Normalization(Warning – illustrative “toy” models ahead)

Page 38: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Transcripts per million

• tpm (transcripts per million) normalized counts based on the length of the transcript and total number of sequence reads

Page 39: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Transcripts per million

• tpm (transcripts per million) normalized counts based on the length of the transcript and total number of sequence reads

Transcripts per million ≡ 𝐴 # !∑($)

# 106

Page 40: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Transcripts per million

• tpm (transcripts per million) normalized counts based on the length of the transcript and total number of sequence reads

Transcripts per million ≡ 𝐴 # !∑($)

# 106

𝐴 =𝑡𝑜𝑡𝑎𝑙 𝑟𝑒𝑎𝑑𝑠 𝑚𝑎𝑝𝑝𝑒𝑑 𝑡𝑜 𝑔𝑒𝑛𝑒 / 103

𝑔𝑒𝑛𝑒 𝑙𝑒𝑛𝑔𝑡ℎ (𝑏𝑝)

Page 41: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Normalization – gene length

Gene A Gene B

Which is longer (bp)?

Page 42: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Normalization – gene length

Gene A(300bp)

Gene B (100bp)

Which has more reads?

Page 43: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Normalization – gene lengthWhich has more reads?

Gene A(300bp)

Gene B (100bp)

Page 44: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Normalization – gene length

1627Gene A(300bp)

Gene B (100bp)

Page 45: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Normalization – gene length

Gene A(300bp)

27/300 = 0.09

Gene B (100bp)

16/100 = 0.16

1627

Page 46: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Normalization – read depth

Flow cell 1 Flow cell 2Photo credithttps://www.illumina.com/company/news-center/multimedia-images.html

A

B

C

D

Which gene is most highly expressed?

Page 47: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Normalization – read depth

Photo credithttps://www.illumina.com/company/news-center/multimedia-images.html

Flow cell 1 (10M reads) Flow cell 2 (20M reads)

A

B

C

D

Which gene is most highly expressed?

Page 48: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

RNA-Seq with Sleuth

Page 49: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

In an RNA-Seq experiment we need to find “true” differences between samples (while subtracting trivial differences)

Page 50: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Spot the difference

Photo Credit:https://www.rd.com/culture/spot-10-differences-picture/

Page 51: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Spot the difference

Mock – 1, Replicate 1 Mock – 1, Replicate 2 Zika – 1, Replicate 1 Zika – 1, Replicate 2

Page 52: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Sleuth linear modeling

Photo credithttps://www.nature.com/articles/nmeth.4324/figures/1

Page 53: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Sleuth linear modeling

Photo credithttps://twitter.com/phylogeo

Page 54: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Lab: Sleuth results

Page 55: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Comparing to the results of the L. Yi paper

Page 56: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Ontology enrichment: ShinyGO

http://bioinformatics.sdstate.edu/go/

Page 57: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Gene ontology

Photo credit:https://thepeakperformancecenter.com/educational-learning/learning/memory/stages-of-memory/organization-long-term-memory/

Page 58: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Gene ontology

Photo credit:http://geneontology.org/docs/ontology-documentation/

Page 59: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Tang paper ontologies

Photo credit:https://www.cell.com/cell-stem-cell/fulltext/S1934-5909(16)00106-5

Page 60: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

Goal recap

• Understand the rationale of an RNA-Seq experiment and its design

• Understand how we obtain DNA sequence and access its quality

• Use DNA Subway (FastQC/FastX) to QC sequence data

• Use DNA Subway (Kallisto) to (pseudo)align reads

• Use DNA Subway (Sleuth) to explore RNA-Seq results

Page 61: RNA-Seq with DNA Subway...Course Learning Goals • Understand the rationale of an RNA-Seq experiment and its design • Understand how we obtain DNA sequence and access its quality

dnalc.cshl.edu

DNALC Website and Social Media

dnalc.cshl.edu/dnalc-live