+ miRNA Discovery and Prediction Algorithms George Michopoulos

  • Slide 1
  • + miRNA Discovery and Prediction Algorithms George Michopoulos
  • Slide 2
  • + microRNAs: What are they? Why do we care about them? How do we discover them (biological methods, computational methods)? What limitations do these methods have?
  • Slide 3
  • + What is microRNA?
  • Slide 4
  • + miRNA structure: Small non-coding RNAs, ~22-25 bases long. Characterized by their hairpin precursors, composed of the mature miRNA, the loop, and the star miRNA.
  • Slide 5
  • + miRNA biogenesis: Transcribed in the nucleus. The pri-miRNA hairpin gets cut by the Drosha enzyme. The pre-miRNA then either degrades into miRNA naturally or gets cleaved by the Dicer enzyme. The miRNA is then bound by an Argonaute protein into an RNA-induced silencing complex. The complex then binds target mRNA and cleaves it.
  • Slide 6
  • + Why do we care? miRNAs regulate the expression of proteins, including those involved in: cancer (miRNAs inhibit proteins responsible for controlling proliferation); neural development (linked to schizophrenia); cardiac development (linked to cardiomyopathies); DNA methylation and histone modification (which can alter the expression of target genes).
  • Slide 7
  • + Why do we care? Antagomirs, chemically engineered oligonucleotides that silence endogenous microRNAs, could be used as a therapy for such diseases. Non-coding RNAs account for a significant portion of the genome, so their homology can be used as a tool to assess phylogeny.
  • Slide 8
  • + Detection and Discovery. Biological methods: RT-PCR and qPCR can be used for individual miRNAs; microarrays can be used to detect multiple miRNAs. Computational methods: mining deep-sequencing data and using predictive algorithms to detect miRNA characteristics and compare potential sequences to homologs. Examples: Bentwich et al. (2005); miRAlign: Wang et al. (2005); miRDeep: Friedländer et al. (2008); miRDeep2: Friedländer et al. (2011).
  • Slide 9
  • + RT-PCR: Stands for reverse transcription polymerase chain reaction, not real-time PCR (qPCR). The desired RNA is reverse transcribed and the resulting cDNA is amplified using qPCR. Useful for detecting very low copy numbers of RNA molecules; the oldest method, and not specific to miRNA.
  • Slide 10
  • + Northern Blotting: Measures levels of RNA expression using probes with partial homology. This picture shows a northern blot that has detected 4/5 of the shown microRNAs. Lower sensitivity but higher specificity than RT-PCR, so fewer false positives.
  • Slide 11
  • + Microarray Detection: Microarrays were first used to detect miRNAs in 2004 by different groups. Probes can be developed and the chip then ordered through companies (Barad et al.), or everything can be developed and put together in-house using amine-binding slides and an array printer (Miska et al.). Far more efficient for large-scale discovery, but limited by the need for prior sequence data for probe development.
  • Slide 12
  • + Barad et al. (2004): Took known miRNA sequences. Created DNA chips with probes complementary to those sequences. Hybridized miRNA samples onto the chips. Performed clustering analysis. Used mirMASA to confirm findings. Found that the microarray method has a higher sensitivity and specificity than previous miRNA identification methods.
  • Slide 13
  • + Useful Programs: RNAfold. RNAfold is a program in the ViennaRNA Package. It takes in RNA sequences and calculates their minimum free energy structure, outputting the predicted structure in dot-bracket notation along with its free energy in kcal/mol.
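
To make the idea of structure prediction concrete, here is a minimal sketch of a much simpler relative of what RNAfold does: Nussinov-style dynamic programming that maximizes the number of base pairs instead of minimizing free energy. It is not RNAfold's algorithm or energy model; the function name `fold_max_pairs`, the pairing rules, and the `min_loop` parameter are assumptions made for illustration.

```python
# Simplified stand-in for structure prediction: maximize base pairs
# (Nussinov-style), NOT the thermodynamic model used by RNAfold.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def fold_max_pairs(seq, min_loop=3):
    """Return (dot-bracket structure, number of pairs) for an RNA sequence.

    min_loop is the minimum number of unpaired bases enclosed by a pair
    (an assumed value; hairpin loops shorter than 3 are sterically unlikely).
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    # dp[i][j] = maximum number of pairs in the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # leave i or j unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)  # pair i with j
            for k in range(i + 1, j):  # split into two independent parts
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best

    # Trace back through the table to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and dp[i][j] == dp[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return "".join(structure), (dp[0][n - 1] if n else 0)
```

Calling `fold_max_pairs("GGGGAAAACCCC")` yields a single stem closed by a four-base loop, the same hairpin shape that characterizes miRNA precursors. A real pipeline would call RNAfold itself, since pair counting ignores stacking energies and loop penalties.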
  • Slide 14
  • + Useful Programs: ClustalW. ClustalW is a multiple sequence alignment tool that is frequently used to compare homologous sequences across species, or to compare families of genes. It takes in a set of sequences, aligns every pair of them, builds a guide tree from the pairwise results, and then uses that tree to align the sequences progressively into a multiple alignment.
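
The pairwise step that ClustalW starts from can be sketched with textbook Needleman-Wunsch global alignment. This is an illustration under simple assumptions, not ClustalW's implementation: the function name `needleman_wunsch` and the match, mismatch, and linear gap scores are chosen for the example, whereas ClustalW uses its own scoring and gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences; return (score, aligned_a, aligned_b).

    The scoring values are illustrative assumptions, not ClustalW defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to build one optimal alignment.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("ACGU", "AGU")` aligns the shorter sequence with a single gap opposite the `C`. In a progressive aligner, scores like this one for every pair of sequences feed the guide tree that determines the order in which sequences are merged.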
  • Slide 15
  • + Bentwich et al. (2005)
  • Slide 16
  • + Scanning the entire human genome identified 11 million hairpins, including 86% of known microRNA precursors. After microarray sampling, the 359 expressed microRNAs were subjected to confirmation by sequencing. Successfully cloned and sequenced 89 human microRNA genes that do not appear in the microRNA registry. Using UCSC BlastZ alignment and ClustalW, found that 53 of these are located in two large non-conserved clusters, including one on chromosome 19 that is only expressed in the placenta and was the largest microRNA cluster ever reported. This cluster comprises 43 new predicted microRNAs, which all show similarity to a neighboring miRNA family specifically expressed in human embryonic stem cells. The other cluster is on the X chromosome, and its miRNAs are only expressed in the testis. Homology analysis showed that both clusters are conserved only in chimpanzees and possibly rhesus monkeys.
  • Slide 17
  • + miRAlign: Wang et al. (2005). A novel genome-wide computational approach to detect miRNAs in animals based on both sequence and structure alignment. Uses RNAfold to test secondary structures, then CLUSTAL to perform pairwise alignment, its own algorithms to confirm the miRNA's position on the stem-loop, and finally RNAforester to conduct pairwise structure alignment.
  • Slide 18
  • + miRAlign: Wang et al. (2005). miRAlign outperforms BLAST search in both sensitivity and selectivity; furthermore, nearly all the known miRNAs found by BLAST can also be detected by miRAlign. The average number of false positives is 7.1 for BLAST and 0.9 for miRAlign. The algorithm depends on pre-existing data to search against, so it is only useful for finding miRNAs that are closely related to previously annotated ones.
  • Slide 19
  • + miRDeep: Friedländer et al. (2008). Suite of Perl scripts. Uses a probabilistic model of miRNA biogenesis to score the compatibility of the position and frequency of sequenced RNA with the secondary structure of the miRNA precursor.
  • Slide 20
  • + Algorithm for P(sequence is a precursor):
      score = log( P(pre | data) / P(bgr | data) )
    The probability of the sequence being a precursor is given by Bayes' theorem:
      P(pre | data) = P(data | pre) P(pre) / P(data)
    Treating the individual features of the data as independent, P(data | pre) factors into one term per feature:
      P(pre | data) = P(abs | pre) P(rel | pre) P(sig | pre) P(star | pre) P(nuc | pre) P(pre) / P(data)
    The same holds for the probability of the sequence being a background hairpin:
      P(bgr | data) = P(data | bgr) P(bgr) / P(data)
      P(bgr | data) = P(abs | bgr) P(rel | bgr) P(sig | bgr) P(star | bgr) P(nuc | bgr) P(bgr) / P(data)
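
Because P(data) appears in both posteriors, it cancels in the ratio, and the score reduces to a sum of per-feature log-likelihood ratios plus the log prior odds. The sketch below shows that arithmetic only. The function name `log_odds_score`, the dictionary layout, the use of the natural logarithm, and the assumption that P(bgr) = 1 - P(pre) are illustrative choices, not miRDeep's actual code or parameters.

```python
import math

# Feature labels taken from the slide: abs, rel, sig, star, nuc.
FEATURES = ("abs", "rel", "sig", "star", "nuc")


def log_odds_score(lik_pre, lik_bgr, prior_pre):
    """Compute log( P(pre | data) / P(bgr | data) ).

    lik_pre[f] = P(f | pre) and lik_bgr[f] = P(f | bgr) for each feature f.
    Assumes every hairpin is either a precursor or background, so
    P(bgr) = 1 - P(pre). P(data) cancels and never needs to be computed.
    """
    score = math.log(prior_pre / (1.0 - prior_pre))  # log prior odds
    for feature in FEATURES:
        score += math.log(lik_pre[feature] / lik_bgr[feature])
    return score
```

A positive score means the data are better explained by the precursor model than by the background model. With equal likelihoods and a prior of 0.5, the score is zero, and each feature that favors the precursor model shifts it upward by its own log ratio.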
  • Slide 21
  • + miRDeep: Friedländer et al. (2008). Of the 555 known human mature miRNA sequences, 213 were present in the data set. Of these, 154 (72%) were successfully recovered by miRDeep. The total estimated number of false positives was 6 ± 2. This pipeline is much more efficient at detecting microRNA expression from deep-sequencing data than previous methods.
  • Slide 22
  • + miRDeep2: Friedländer et al. (2011). Analyzing data from seven animal species representing the major animal clades, miRDeep2 identified miRNAs with an accuracy of 98.6-99.9% and reported hundreds of novel miRNAs. The new package includes many more options and graphical outputs that make the software more accessible.
  • Slide 23
  • + miRDeep2: Friedländer et al. (2011)
  • Slide 24
  • +
  • Slide 25
  • +
  • Slide 26
  • + Relative to miRDeep1: Performs excision by scanning the genome for stacks of reads, where a stack is one or more reads that map to the exact same 5′ and 3′ positions in the genome. When identifying miRNAs in data from sea squirts, known to harbor large numbers of non-canonical miRNAs, the first version of miRDeep only reports 46 known and 31 novel miRNAs; in contrast, miRDeep2 reports 313 known and 127 novel ones. Can detect anti-sense miRNAs (+/-). Supports single or multiple mismatches. Performs substantially better on the human data, reporting 186 known and 36 novel miRNAs (compared to 154 known and 10 novel in the initial publication). More accurate detection of low-abundance miRNAs. Faster: analyzed 30 million RNAs in less than 5 h and with 3 GB of memory. More intuitive interface for biologists.
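
The notion of a read stack can be illustrated in a few lines: reads that map to exactly the same 5′ and 3′ genomic positions are counted together. This is a minimal sketch, not miRDeep2's code. The tuple layout `(chromosome, strand, start, end)` and the function names `build_stacks` and `tallest_stacks` are hypothetical, and the actual excision of candidate precursors around each stack is not shown.

```python
from collections import Counter


def build_stacks(mapped_reads):
    """Count reads that share the exact same 5' and 3' mapping positions.

    mapped_reads is an iterable of (chromosome, strand, start, end) tuples,
    a hypothetical layout chosen for this example.
    """
    return Counter(mapped_reads)


def tallest_stacks(stacks, min_height=1):
    """Return (position, height) pairs, tallest first, ties broken by position."""
    kept = [(pos, height) for pos, height in stacks.items() if height >= min_height]
    return sorted(kept, key=lambda item: (-item[1], item[0]))
```

Usage: given reads such as `("chr1", "+", 100, 121)` appearing three times and `("chr1", "+", 100, 122)` once, `build_stacks` reports two separate stacks of heights 3 and 1, because a one-base difference at the 3′ end puts a read in a different stack. Keeping the strand in the key is what lets anti-sense candidates at the same coordinates be tracked separately.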
  • Slide 27
  • + Beyond miRDeep2. Remaining challenges in identifying miRNAs and detecting their expression levels: miRBase, the primary database used today as a source of miRNA annotations, is far from pristine. It is hard to tell whether detected novel miRNAs actually have a biological function, and it will take a lot of biological experimentation to find out. Algorithms still have room for improvement in terms of accessibility and efficiency.
  • Slide 28
  • + Questions?
  • Slide 29
  • + References
    Barad, O., Meiri, E., Avniel, A., Aharonov, R., Barzilai, A., Bentwich, I., Einav, U., et al. (2004). MicroRNA expression detected by oligonucleotide microarrays: System establishment and expression profiling in human tissues. Genome Research, 14(12), 2486-2494. doi:10.1101/gr.2845604
    Bentwich, I., Avniel, A., Karov, Y., Aharonov, R., Gilad, S., Barad, O., Barzilai, A., et al. (2005). Identification of hundreds of conserved and nonconserved human microRNAs. Nature Genetics, 37(7), 766-770. doi:10.1038/ng1590
    Friedländer, M. R., Chen, W., Adamidi, C., Maaskola, J., Einspanier, R., Knespel, S., & Rajewsky, N. (2008). Discovering microRNAs from deep sequencing data using miRDeep. Nature Biotechnology, 26(4), 407-415. doi:10.1038/nbt1394
    Friedländer, M. R., Mackowiak, S. D., Li, N., Chen, W., & Rajewsky, N. (2011). miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades. Nucleic Acids Research, 1-16. doi:10.1093/nar/gkr688
    Krüger, J., & Rehmsmeier, M. (2006). RNAhybrid: microRNA target prediction easy, fast and flexible. Nucleic Acids Research, 34(Web Server issue), W451-W454. doi:10.1093/nar/gkl243
    Miska, E. A., Alvarez-Saavedra, E., Townsend, M., Yoshii, A., Sestan, N., Rakic, P., Constantine-Paton, M., et al. (2004). Microarray analysis of microRNA expression in the developing mammalian brain. Genome Biology, 5(9), R68. doi:10.1186/gb-2004-5-9-r68
    Wang, X., Zhang, J., Li, F., Gu, J., He, T., Zhang, X., & Li, Y. (2005). MicroRNA identification based on sequence and structure alignment. Bioinformatics, 21(18), 3610-3614. doi:10.1093/bioinformatics/bti562