25
Lectures 13: High throughput sequencing: Beyond the genome Spring 2017 March 28, 2017

Lectures(13:(High(throughput sequencing:(Beyond(the(genome(cs425/spring17/slides/Lecture_13... · 2017. 3. 30. · Lectures(13:(High(throughput sequencing:(Beyond(the(genome(Spring2017(March(28,(2017

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Lectures(13:(High(throughput sequencing:(Beyond(the(genome(cs425/spring17/slides/Lecture_13... · 2017. 3. 30. · Lectures(13:(High(throughput sequencing:(Beyond(the(genome(Spring2017(March(28,(2017

Lectures  13:  High  throughput  sequencing:  Beyond  the  genome  

Spring  2017  March  28,  2017  

Page 2: Lectures(13:(High(throughput sequencing:(Beyond(the(genome(cs425/spring17/slides/Lecture_13... · 2017. 3. 30. · Lectures(13:(High(throughput sequencing:(Beyond(the(genome(Spring2017(March(28,(2017

h@p://www.fejes.ca/2009/06/science-­‐cartoons-­‐5-­‐rna-­‐seq.html  

Page 3: Lectures(13:(High(throughput sequencing:(Beyond(the(genome(cs425/spring17/slides/Lecture_13... · 2017. 3. 30. · Lectures(13:(High(throughput sequencing:(Beyond(the(genome(Spring2017(March(28,(2017

Omics  •  Transcriptome  -­‐  the  set  of  all  mRNAs  present  in  a  cell  •  Proteome  –  proteins  •  Metabolome/physiome  -­‐  metabolites  •  Microbiome  –  the  collecSon  of  microbes  present  in  an  

organism  or  other  locaSon  •  Interactome    “In  physics…  the  -­‐on  suffix  has  tended  to  signify  an  elementary  parScle:  the  photon,  electron,  proton,  meson,  etc.,  whereas  -­‐ome  in  biology  has  the  opposite  intellectual  funcSon,  of  direcSng  a@enSon  to  a  holisSc  abstracSon,  an  eventual  goal…”    From:  ‘Ome  Sweet  ‘Omics.  The  ScienSst  15(7),  2001  

Page 4: Lectures(13:(High(throughput sequencing:(Beyond(the(genome(cs425/spring17/slides/Lecture_13... · 2017. 3. 30. · Lectures(13:(High(throughput sequencing:(Beyond(the(genome(Spring2017(March(28,(2017

Omics  •  Biologists  have  high-­‐throughput  methods  for  probing  

each  -­‐ome:  •  Transcriptome  –  RNA-­‐Seq  •  Proteome  –  mass  spectrometry,  protein  arrays  •  Microbiome  –  next  generaSon  sequencing  •  Interactome  –  yeast-­‐two-­‐hybrid  •  Regulome  –  ChIP-­‐Seq    Lots  of  data  for  bioinformaScs  people  to  analyze!  

Page 5: Lectures(13:(High(throughput sequencing:(Beyond(the(genome(cs425/spring17/slides/Lecture_13... · 2017. 3. 30. · Lectures(13:(High(throughput sequencing:(Beyond(the(genome(Spring2017(March(28,(2017

RNA-­‐seq:    profiling  the  transcriptome  

•  Technique:    sequence  the  total  RNA  produced  by  the  cell  

Page 6: Lectures(13:(High(throughput sequencing:(Beyond(the(genome(cs425/spring17/slides/Lecture_13... · 2017. 3. 30. · Lectures(13:(High(throughput sequencing:(Beyond(the(genome(Spring2017(March(28,(2017

Read  mapping  

Page 7: Lectures(13:(High(throughput sequencing:(Beyond(the(genome(cs425/spring17/slides/Lecture_13... · 2017. 3. 30. · Lectures(13:(High(throughput sequencing:(Beyond(the(genome(Spring2017(March(28,(2017

Pile-­‐ups  

From:  The  ENCODE  Project  ConsorSum  (2011)  A  User's  Guide  to  the  Encyclopedia  of  DNA  Elements  (ENCODE).  PLoS  Biol  9(4):  e1001046.    

Page 8: Lectures(13:(High(throughput sequencing:(Beyond(the(genome(cs425/spring17/slides/Lecture_13... · 2017. 3. 30. · Lectures(13:(High(throughput sequencing:(Beyond(the(genome(Spring2017(March(28,(2017

Pile-­‐ups  

gene  model  

read  depth    

Most  reads  fall  into  coding  exons  or  UTRs  

Page 9: Lectures(13:(High(throughput sequencing:(Beyond(the(genome(cs425/spring17/slides/Lecture_13... · 2017. 3. 30. · Lectures(13:(High(throughput sequencing:(Beyond(the(genome(Spring2017(March(28,(2017

RNA-­‐seq:    profiling  the  transcriptome  

•  Technique:    sequence  the  total  RNA  produced  by  the  cell  

•  What  is  this  good  for?  

Page 10: Lectures(13:(High(throughput sequencing:(Beyond(the(genome(cs425/spring17/slides/Lecture_13... · 2017. 3. 30. · Lectures(13:(High(throughput sequencing:(Beyond(the(genome(Spring2017(March(28,(2017

RNA-­‐seq:    profiling  the  transcriptome  

•  Genome  annotaSon  (transcript  assembly)  •  Detect  alternaSve  splicing  •  Obtain  gene/transcript  expression  levels  and  detecSon  of  differenSal  expression  

•  Allele-­‐specific  expression  •  Small-­‐RNA  transcriptome  (different  protocol  than  regular  RNA-­‐seq)    

Page 11: Lectures(13:(High(throughput sequencing:(Beyond(the(genome(cs425/spring17/slides/Lecture_13... · 2017. 3. 30. · Lectures(13:(High(throughput sequencing:(Beyond(the(genome(Spring2017(March(28,(2017

All  the  uses  of  RNA-­‐seq  

h@p://www.rna-­‐seqblog.com/news/informaSon/rna-­‐seq-­‐blog-­‐poll-­‐results/  

Page 12: Lectures(13:(High(throughput sequencing:(Beyond(the(genome(cs425/spring17/slides/Lecture_13... · 2017. 3. 30. · Lectures(13:(High(throughput sequencing:(Beyond(the(genome(Spring2017(March(28,(2017

DifferenSal  expression  

h@p://www.fejes.ca/labels/figures.html  

Page 13: Lectures(13:(High(throughput sequencing:(Beyond(the(genome(cs425/spring17/slides/Lecture_13... · 2017. 3. 30. · Lectures(13:(High(throughput sequencing:(Beyond(the(genome(Spring2017(March(28,(2017

RNA-­‐seq  protocol  

Page 14: Lectures(13:(High(throughput sequencing:(Beyond(the(genome(cs425/spring17/slides/Lecture_13... · 2017. 3. 30. · Lectures(13:(High(throughput sequencing:(Beyond(the(genome(Spring2017(March(28,(2017

Raw  and  Aligned  Reads  •  Raw  data  is  a  (large)  set  of  sequences  •  Typical  file  format  is  FASTQ  

@HWI-EAS255_4_FC2010Y_1_43_110_790

TTAATCTACAGAATAGATAGCTAGCATATATTT

+

hhhhhhhhhhhhhhhdhhhhhhhhhhhdRehdh

•  Alignment  to  genome  is  done  by  efficient  indexing  •  Aligned  reads  in  SAM  format    

@HWI-… 163 chr19 9900 10000 16M2I25M  

Base  quality  codes  

Read  idenSfier  Bases  called  

Start  and  end    posiSons  

Codes  for  match:  16  matches,  2  extra,…  

Where  this    read  matched  

Read    idenSfier  

Page 15: Lectures(13:(High(throughput sequencing:(Beyond(the(genome(cs425/spring17/slides/Lecture_13... · 2017. 3. 30. · Lectures(13:(High(throughput sequencing:(Beyond(the(genome(Spring2017(March(28,(2017

Cataloging  the  transcriptome  

•  Transcriptomics  involves  studying  expression  at  – SpaSal  resoluSon:  Sssues,  individuals,  locaSon  – Temporal  resoluSon:  circadian,  seasonal,  lifeSme  

Page 16: Lectures(13:(High(throughput sequencing:(Beyond(the(genome(cs425/spring17/slides/Lecture_13... · 2017. 3. 30. · Lectures(13:(High(throughput sequencing:(Beyond(the(genome(Spring2017(March(28,(2017

Inter-­‐Genic  Reads  

•  Many  reads  reflect  unannotated  genes:    opportunity  to  discover  new  genes  

Page 17: Lectures(13:(High(throughput sequencing:(Beyond(the(genome(cs425/spring17/slides/Lecture_13... · 2017. 3. 30. · Lectures(13:(High(throughput sequencing:(Beyond(the(genome(Spring2017(March(28,(2017

RPKM  –  A  Simple  NormalizaSon  

•  Different  numbers  of  counts  per  sample  (sequencing  depth)  

•  Divide  counts  in  a  region  of  interest  (a  genomic  region  or  a  gene  or  an  exon)  by  all  counts  (reads  per  million  reads  -­‐RPM)  

•  Genes  have  different  lengths:  divide  also  by  length  of  gene    

•  Obtain  RPKM  (reads  per  kilobase  of  exon  per  million  reads)  – Some  use  FPKM  (fragments/kb/Mr)  

Page 18: Lectures(13:(High(throughput sequencing:(Beyond(the(genome(cs425/spring17/slides/Lecture_13... · 2017. 3. 30. · Lectures(13:(High(throughput sequencing:(Beyond(the(genome(Spring2017(March(28,(2017

ChIP-­‐seq  

h@p://www.fejes.ca/labels/Chip-­‐Seq.html  

Page 19: Lectures(13:(High(throughput sequencing:(Beyond(the(genome(cs425/spring17/slides/Lecture_13... · 2017. 3. 30. · Lectures(13:(High(throughput sequencing:(Beyond(the(genome(Spring2017(March(28,(2017
Page 20: Lectures(13:(High(throughput sequencing:(Beyond(the(genome(cs425/spring17/slides/Lecture_13... · 2017. 3. 30. · Lectures(13:(High(throughput sequencing:(Beyond(the(genome(Spring2017(March(28,(2017

Comments  on  ChIP-­‐seq  

•  Genome-­‐wide  mapping  of  transcripSon  factor  binding  sites  

•  ComputaSonal  problems:  – Peak  calling  – SSll  need  moSf  finders,  but  makes  the  problem  easier  

Page 21: Lectures(13:(High(throughput sequencing:(Beyond(the(genome(cs425/spring17/slides/Lecture_13... · 2017. 3. 30. · Lectures(13:(High(throughput sequencing:(Beyond(the(genome(Spring2017(March(28,(2017

Variants  

•  Apply  the  methodology  to  RNA:    map  RNA-­‐binding  sites  in  mRNA  that  interact  with  specific  RNA-­‐binding  proteins  

•  CLIP-­‐Seq  (cross-­‐linking  immunoprecipitaSon  sequencing)    •  RIP-­‐Seq  (RNA  immunoprecipitaSon  sequencing)  

Page 22: Lectures(13:(High(throughput sequencing:(Beyond(the(genome(cs425/spring17/slides/Lecture_13... · 2017. 3. 30. · Lectures(13:(High(throughput sequencing:(Beyond(the(genome(Spring2017(March(28,(2017

Other  sequencing-­‐based  techniques  

•  Methyl-­‐seq,  BS-­‐seq:  methylaSon  •  Chromosome  conformaSon  capture  (3C-­‐4C-­‐5C-­‐HiC):  spaSal  organizaSon  of  chromosomes  

h@p://en.wikipedia.org/wiki/Chromosome_conformaSon_capture  

Page 23: Lectures(13:(High(throughput sequencing:(Beyond(the(genome(cs425/spring17/slides/Lecture_13... · 2017. 3. 30. · Lectures(13:(High(throughput sequencing:(Beyond(the(genome(Spring2017(March(28,(2017

Other  sequencing-­‐based  techniques  

•  Methyl-­‐seq,  BS-­‐seq:  methylaSon  •  Chromosome  conformaSon  capture  (3C-­‐4C-­‐5C):  spaSal  organizaSon  of  chromosomes  

•  seqFold:  RNA  secondary  structure  •  DNAase-­‐seq  •  And  many  more!  

h@p://en.wikipedia.org/wiki/Chromosome_conformaSon_capture  

Page 24: Lectures(13:(High(throughput sequencing:(Beyond(the(genome(cs425/spring17/slides/Lecture_13... · 2017. 3. 30. · Lectures(13:(High(throughput sequencing:(Beyond(the(genome(Spring2017(March(28,(2017

Read  mapping  

All  the  sequencing-­‐based  techniques  require  read  mapping  as  a  first  step.    ExisSng  alignment  tools  are  not  fast  enough  à  need  new  algorithms!  

Page 25: Lectures(13:(High(throughput sequencing:(Beyond(the(genome(cs425/spring17/slides/Lecture_13... · 2017. 3. 30. · Lectures(13:(High(throughput sequencing:(Beyond(the(genome(Spring2017(March(28,(2017

Read  mapping  

•  How  is  the  problem  of  read  mapping  different  than  sequence  alignment  as  we  have  considered  it  unSl  now?