+ miRNA Discovery and Prediction Algorithms
George Michopoulos
Slide 2
+ microRNAs
What are they?
Why do we care about them?
How do we discover them?
  Biological methods
  Computational methods
What limitations do these methods have?
Slide 3
+ What is microRNA?
Slide 4
+ miRNA structure
Small non-coding RNAs, ~22-25 bases long
Characterized by their hairpin precursors, which comprise the mature miRNA, the loop, and the star miRNA
Slide 5
+ miRNA biogenesis
Transcribed in the nucleus
The pri-miRNA hairpin is cut by the Drosha enzyme
The pre-miRNA then either degrades into miRNA naturally or is cleaved by the Dicer enzyme
The miRNA is then bound by an Argonaute protein into an RNA-induced silencing complex
The complex then binds the target mRNA and cleaves it
Slide 6
+ Why do we care?
miRNAs regulate protein expression, including that of proteins involved in:
  Cancer: miRNAs inhibit proteins responsible for controlling proliferation
  Neural development: linked to schizophrenia
  Cardiac development: linked to cardiomyopathies
DNA methylation and histone modification can alter the expression of target genes
Slide 7
+ Why do we care?
Antagomirs, chemically engineered oligonucleotides that silence endogenous microRNA, could be used as a therapy for such diseases
Non-coding RNAs account for a significant portion of the genome, so their homology can be used as a tool to assess phylogeny
Slide 8
+ Detection and Discovery
Biological methods:
  RT-PCR and qPCR can be used for individual miRNAs
  Microarrays can be used to detect multiple miRNAs
Computational methods:
  Mining deep-sequencing data and using predictive algorithms to detect miRNA characteristics and compare potential sequences to homologs
  Bentwich et al. (2005)
  miRAlign: Wang et al. (2005)
  miRDeep: Friedländer et al. (2008)
  miRDeep2: Friedländer et al. (2011)
Slide 9
+ RT-PCR
Reverse transcription polymerase chain reaction, not real-time PCR (qPCR)
The desired RNA is reverse transcribed and the resulting cDNA is amplified using qPCR
Useful for detecting very low copy numbers of RNA molecules
The oldest method, and not specific to miRNA
Slide 10
+ Northern Blotting
Measures levels of RNA expression using probes with partial homology
This picture shows a northern blot that has detected 4 of the 5 microRNAs shown
Lower sensitivity, but higher specificity, than RT-PCR
Fewer false positives
Slide 11
+ Microarray Detection
Microarrays were first used to detect miRNAs in 2004, by several different groups
Probes can be developed and the chip then ordered through companies (Barad et al.)
Everything can also be developed and put together using amine-binding slides and an array printer (Miska et al.)
Far more efficient for large-scale discovery, but limited by the need for prior sequence data for probe development
Slide 12
+ Took known miRNA sequences
Created DNA chips with probes complementary to those sequences
Hybridized miRNA samples onto the chips
Performed clustering analysis
Used mirMASA to confirm findings
Found that the microarray method has a higher sensitivity and specificity than previous miRNA identification methods
Barad et al. (2004)
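A minimal sketch of the probe-design step, assuming a probe is simply the DNA reverse complement of the mature miRNA. The function name design_probe and the toy sequence are illustrative and are not taken from Barad et al.; real array probes may involve further design choices.

```python
# Complement of each RNA base, written as DNA (assumes upper- or lower-case A/C/G/U input).
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}

def design_probe(mirna: str) -> str:
    """Return the DNA reverse complement of an RNA sequence, written 5'->3'.

    Hypothetical helper for illustration only.
    """
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mirna.upper()))

# Toy 8-base sequence, not a real miRNA.
print(design_probe("UGAGGUAG"))  # prints CTACCTCA
```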
Slide 13
+ Useful Programs: RNAfold
RNAfold is a program that is part of the ViennaRNA Package
Takes in RNA sequences and calculates their minimum free energy structure, outputting the predicted structure and its free energy
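To give a feel for what secondary structure prediction involves, here is a simplified sketch. It is not RNAfold's algorithm: RNAfold minimizes free energy with a thermodynamic model, whereas this stand-in (the classic Nussinov recurrence) only maximizes the number of base pairs. The function name, the min_loop parameter, and the toy sequence are assumptions made for illustration.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (input is assumed to be upper-case RNA).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq, min_loop=3):
    """Return (number of base pairs, dot-bracket string) maximizing the pair count."""
    n = len(seq)
    if n == 0:
        return 0, ""
    dp = [[0] * n for _ in range(n)]  # dp[i][j] = max pairs within seq[i..j]

    def split_score(i, k, j):
        # Score when base i pairs with base k inside the interval [i, j].
        right = dp[k + 1][j] if k + 1 <= j else 0
        return 1 + dp[i + 1][k - 1] + right

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # case: base i stays unpaired
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in PAIRS:
                    best = max(best, split_score(i, k, j))
            dp[i][j] = best

    # Trace back one optimal structure (ties are broken arbitrarily).
    structure = ["."] * n
    todo = [(0, n - 1)]
    while todo:
        i, j = todo.pop()
        if i >= j:
            continue
        if dp[i][j] == dp[i + 1][j]:
            todo.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS and dp[i][j] == split_score(i, k, j):
                structure[i], structure[k] = "(", ")"
                todo.append((i + 1, k - 1))
                todo.append((k + 1, j))
                break
    return dp[0][n - 1], "".join(structure)

pairs, structure = nussinov("GGGAAAUCCC")  # toy hairpin-like sequence
print(pairs, structure)
```

The dot-bracket string uses matching parentheses for paired bases and dots for unpaired ones, the same notation RNAfold reports.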
Slide 14
+ Useful Programs: ClustalW
ClustalW is a multiple sequence alignment tool that is frequently used to compare homologous sequences across species, or to compare families of genes
Aligns the input sequences pairwise, builds a guide tree from the pairwise results, and then uses that tree to build up the multiple alignment progressively
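The pairwise stage can be sketched as follows, under simplifying assumptions: a Needleman-Wunsch global alignment score with flat match, mismatch, and gap values, used to pick the most similar pair of sequences (the pair a guide tree would join first). ClustalW's actual scoring and tree building are more elaborate; the function names, scoring values, and toy sequences here are illustrative only.

```python
from itertools import combinations

def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of a and b (Needleman-Wunsch, linear gap penalty)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def closest_pair(seqs):
    """Return the names of the two sequences with the highest pairwise score."""
    return max(
        combinations(sorted(seqs), 2),
        key=lambda pair: nw_score(seqs[pair[0]], seqs[pair[1]]),
    )

# Toy sequences with placeholder names, not real homologs.
toy = {
    "species_a": "UGAGGUAGUAGG",
    "species_b": "UGAGGUAGUAGG",
    "species_c": "UGAGGUAAUAGC",
}
print(closest_pair(toy))
```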
Slide 15
+ Bentwich et al. (2005)
Slide 16
+ Scanning the entire human genome identified 11 million hairpins, including 86% of known microRNA precursors
After microarray sampling, the 359 expressed microRNAs were subjected to confirmation by sequencing
Successfully cloned and sequenced 89 human microRNA genes that do not appear in the microRNA registry
Using UCSC BlastZ alignment and ClustalW, found that 53 of these are located in two large non-conserved clusters
  One cluster, on chromosome 19, is expressed only in the placenta and was the largest microRNA cluster ever reported. It comprises 43 new predicted microRNAs, all of which show similarity to a neighboring miRNA family specifically expressed in human embryonic stem cells
  The other cluster is on the X chromosome, and its miRNAs are expressed only in the testis
Homology analysis showed that both clusters are conserved only in chimpanzees and possibly rhesus monkeys
Slide 17
+ miRAlign: Wang et al. (2005)
A novel genome-wide computational approach to detect miRNAs in animals, based on both sequence and structure alignment
Uses RNAfold to test secondary structures, then CLUSTAL to perform pairwise alignment, its own algorithms to confirm the miRNA's position on the stem-loop, and finally RNAforester to conduct pairwise structure alignment
Slide 18
+ miRAlign: Wang et al. (2005)
miRAlign outperforms BLAST search in both sensitivity and selectivity; furthermore, nearly all the known miRNAs found by BLAST can also be detected by miRAlign
The average number of false positives is 7.1 for BLAST and 0.9 for miRAlign
The algorithm depends on pre-existing data to search against, so it is only useful for finding miRNAs that are closely related to previously annotated ones
Slide 19
+ miRDeep: Friedländer et al. (2008)
Suite of Perl scripts
Uses a probabilistic model of miRNA biogenesis to score the compatibility of the position and frequency of sequenced RNA with the secondary structure of the miRNA precursor
Slide 20
+ Algorithm for P(sequence is a precursor)
score = log( P(pre | data) / P(bgr | data) )
The probability of the sequence being a precursor is given by Bayes' theorem:
  P(pre | data) = P(data | pre) P(pre) / P(data)
  P(pre | data) = P(abs | pre) P(rel | pre) P(sig | pre) P(star | pre) P(nuc | pre) P(pre) / P(data)
The same holds for the probability of the sequence being a background hairpin:
  P(bgr | data) = P(data | bgr) P(bgr) / P(data)
  P(bgr | data) = P(abs | bgr) P(rel | bgr) P(sig | bgr) P(star | bgr) P(nuc | bgr) P(bgr) / P(data)
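Since P(data) appears in both posteriors, it cancels in the ratio, and the score becomes a sum of per-feature log-likelihood ratios plus a log prior ratio. A minimal sketch of that arithmetic follows, assuming P(bgr) = 1 - P(pre) and a natural logarithm; the function name and all numbers are made up for illustration and are not values from miRDeep.

```python
import math

# Feature labels as they appear in the formula above.
FEATURES = ("abs", "rel", "sig", "star", "nuc")

def log_odds_score(lik_pre, lik_bgr, prior_pre):
    """Return log(P(pre | data) / P(bgr | data)).

    lik_pre[f] and lik_bgr[f] hold P(f | pre) and P(f | bgr) for each feature f.
    Assumption: P(bgr) = 1 - P(pre), i.e. every hairpin is one or the other.
    """
    score = math.log(prior_pre) - math.log(1.0 - prior_pre)
    for f in FEATURES:
        score += math.log(lik_pre[f]) - math.log(lik_bgr[f])
    return score

# Made-up likelihoods and prior, for illustration only.
example_pre = {"abs": 0.6, "rel": 0.5, "sig": 0.7, "star": 0.4, "nuc": 0.5}
example_bgr = {"abs": 0.3, "rel": 0.5, "sig": 0.2, "star": 0.1, "nuc": 0.5}
print(log_odds_score(example_pre, example_bgr, prior_pre=0.1))
```

A positive score means the precursor hypothesis is favored over the background hypothesis; a negative score means the opposite.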
Slide 21
+ miRDeep: Friedländer et al. (2008)
Of the 555 known human mature miRNA sequences, 213 were present in the data set
Of these, 154 (72%) were successfully recovered by miRDeep
The total estimated number of false positives was 6 ± 2
This pipeline is much more efficient at finding microRNA expression in deep-sequencing data than the previous methods
Slide 22
+ miRDeep2: Friedländer et al. (2011)
Analyzing data from seven animal species representing the major animal clades, miRDeep2 identified miRNAs with an accuracy of 98.6-99.9% and reported hundreds of novel miRNAs
The new package includes many more options and graphical outputs that make the software more accessible
Slide 23
+ miRDeep2: Friedländer et al. (2011)
Slide 26
+ Relative to miRDeep1:
Performs excision by scanning the genome for stacks of reads, where a stack is one or more reads that map to the exact same 5′ and 3′ positions in the genome
When identifying miRNAs in data from sea squirts, which are known to harbor large numbers of non-canonical miRNAs, the first version of miRDeep reports only 46 known and 31 novel miRNAs. In contrast, miRDeep2 reports 313 known and 127 novel ones
Can detect anti-sense miRNAs (+/-)
Supports single or multiple mismatches
Performs substantially better on the human data, reporting 186 known and 36 novel miRNAs (compared to 154 known and 10 novel in the initial publication)
More accurate detection of lowly abundant miRNAs
Faster: analyzed 30 million RNAs in less than 5 h and with 3 GB of memory
More intuitive interface for biologists
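The notion of a read stack can be sketched as a simple grouping: mappings that share the same ends fall into the same stack, and the stack height is the number of such reads. This is an illustration only, not miRDeep2's code; the Mapping record, the inclusion of chromosome and strand in the key, and the toy coordinates are all assumptions.

```python
from collections import Counter
from typing import NamedTuple

class Mapping(NamedTuple):
    """One mapped read (hypothetical record layout)."""
    chrom: str
    strand: str       # "+" or "-"
    five_prime: int   # genomic position of the read's 5' end
    three_prime: int  # genomic position of the read's 3' end

def build_stacks(mappings):
    """Group mappings that share chromosome, strand, and both end positions.

    Returns a Counter mapping each distinct position to its stack height.
    """
    return Counter(mappings)

# Toy mappings, not real data.
reads = [
    Mapping("chr1", "+", 100, 121),
    Mapping("chr1", "+", 100, 121),
    Mapping("chr1", "+", 100, 122),
    Mapping("chr1", "-", 100, 121),
]
stacks = build_stacks(reads)
tallest, height = stacks.most_common(1)[0]
print(tallest, height)
```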
Slide 27
+ Beyond miRDeep2
Remaining challenges in identifying miRNAs and detecting their expression levels:
miRBase, the primary source of miRNA annotations in use today, is far from pristine
It is hard to tell whether detected novel miRNAs actually have a biological function; establishing that will take a lot of biological experimentation
Algorithms still have room for improvement in terms of accessibility and efficiency
Slide 28
+ Questions?
Slide 29
+ References
Barad, O., Meiri, E., Avniel, A., Aharonov, R., Barzilai, A., Bentwich, I., Einav, U., et al. (2004). MicroRNA expression detected by oligonucleotide microarrays: System establishment and expression profiling in human tissues. Genome Research, 2486-2494. doi:10.1101/gr.2845604.4
Bentwich, I., Avniel, A., Karov, Y., Aharonov, R., Gilad, S., Barad, O., Barzilai, A., et al. (2005). Identification of hundreds of conserved and nonconserved human microRNAs. Nature Genetics, 37(7), 766-770. doi:10.1038/ng1590
Friedländer, M. R., Chen, W., Adamidi, C., Maaskola, J., Einspanier, R., Knespel, S., & Rajewsky, N. (2008). Discovering microRNAs from deep sequencing data using miRDeep. Nature Biotechnology, 26(4), 407-15. doi:10.1038/nbt1394
Friedländer, M. R., Mackowiak, S. D., Li, N., Chen, W., & Rajewsky, N. (2011). miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades. Nucleic Acids Research, 1-16. doi:10.1093/nar/gkr688
Krüger, J., & Rehmsmeier, M. (2006). RNAhybrid: microRNA target prediction easy, fast and flexible. Nucleic Acids Research, 34(Web Server issue), W451-4. doi:10.1093/nar/gkl243
Miska, E. A., Alvarez-Saavedra, E., Townsend, M., Yoshii, A., Sestan, N., Rakic, P., Constantine-Paton, M., et al. (2004). Microarray analysis of microRNA expression in the developing mammalian brain. Genome Biology, 5(9), R68. doi:10.1186/gb-2004-5-9-r68
Wang, X., Zhang, J., Li, F., Gu, J., He, T., Zhang, X., & Li, Y. (2005). MicroRNA identification based on sequence and structure alignment. Bioinformatics (Oxford, England), 21(18), 3610-4. doi:10.1093/bioinformatics/bti562