Where Will They Strike Next? microRNA targeting tactics
in the war on gene expression
Jeff Reid
Miller “Lab”
Baylor College of Medicine
Outline
• Introduction to miRNAs
• The “ask Bartel” model for targeting
• Our proposed model
• Discuss predictions made by our model
  – Not all positions on the miRNA are equal
  – A given miRNA’s targets share function
• Have a quantitative model that does not suffer from the arbitrariness of ask Bartel
Plant microRNAs
• This talk is about plant miRNAs
  – Animal miRNAs are different and more complicated
  – If you want to know more about them, ask Tuan Tran!
• What is a microRNA (miRNA)?
  – ~21 nt single-stranded non-coding RNAs
  – Processed from stem/loop precursors
  – Bind to mRNA in the cytoplasm
  – Regulate genes
    – Often relevant to development
microRNA biogenesis (conventional wisdom)
1. The miRNA gene is transcribed, producing a primary transcript (pri-miRNA)
2. The pri-miRNA is processed by Dicer…
3. …producing a miRNA duplex
4. The duplex moves out of the nucleus
5. Helicase activity unzips the duplex
6. The mature miRNA joins the RNA-induced silencing complex (RISC)
7. RISC recognizes a target site
8. The targeted mRNA is regulated (mRNA cleavage or translational repression)
Figure from Bartel, D.P. (2004). Cell 116, 281-297.
Target “Acquisition”
• How does the RISC identify target sites?
• Based solely on the mature miRNA sequence
  – Consistent with all known examples
  – “Just” string manipulation
  – With that in mind, consider a simple model…
• Targets have a small “mismatch score” M
  – Count non-Watson-Crick (non-WC) pairs in the miRNA/target duplex
  – The score is independent of position
[Figure: a RISC-bound miRNA paired with a target site in the mRNA (both strands labeled 5’ and 3’); the duplex contains two non-WC pairs, so M = 2]
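The mismatch score M described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code, and it rests on assumptions the slide does not state: both strands are written 5’→3’, the duplex has no bulges, and a G:U wobble counts as a mismatch because the slide only says "non-WC pairs".

```python
# Minimal sketch of the position-independent mismatch score M.
# Assumptions (not from the slides): both strands are given 5'->3',
# the duplex has no bulges, and G:U wobble pairs count as mismatches.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def mismatch_score(mirna: str, target_site: str) -> int:
    """Return M, the number of non-Watson-Crick pairs in the duplex."""
    if len(mirna) != len(target_site):
        raise ValueError("strands must have equal length (no bulges)")
    # The strands are antiparallel: the miRNA's 5' end pairs with the
    # target site's 3' end, so walk the target site in reverse.
    pairs = zip(mirna, reversed(target_site))
    return sum((m, t) not in WATSON_CRICK for m, t in pairs)


# Toy 5-nt sequences (real miRNAs are ~21 nt).
print(mismatch_score("UAGCA", "UGCUA"))  # → 0 (perfect complement)
print(mismatch_score("UAGCA", "UGCUC"))  # → 1
```

Every position contributes the same penalty, which is exactly the position independence the slide points out.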
Complementarity Model*
• Look for 21-mers (mRNA sequence) with M < 4
  – Find targets…
  – mir172a1 [AP2, APETALA2 transcription factor]: At5g60120 (1), At4g36920 (2), At2g28550 (3), At5g67180 (3), At5g12900 (3)
  – …it turns out most targets of a given miRNA are in genes that share a common function
• There are some ask Bartel elements to the model
  – M = 4 targets sharing function are included case by case
  – Single bulges are sometimes allowed (mir162, mir163)
  – Model specificity is problematic…
*Rhoades, M.W., et al. (2002). Cell 110, 513-520.
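The scan itself (slide a 21-nt window along each mRNA and keep windows with M < 4) can be sketched as below. This is a hedged illustration under the same assumptions as a plain mismatch count (antiparallel strands written 5’→3’, G:U counted as a mismatch); the case-by-case M = 4 targets and the single-bulge exceptions are deliberately not modeled, and `find_target_sites` is a hypothetical helper name.

```python
# Sketch of the complementarity-model scan. Assumptions: strands are
# 5'->3' and antiparallel, G:U counts as a mismatch, no bulges, and the
# case-by-case "ask Bartel" exceptions are not modeled.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def mismatch_score(mirna: str, site: str) -> int:
    """Count non-Watson-Crick pairs between the miRNA and a site."""
    return sum((m, t) not in WATSON_CRICK for m, t in zip(mirna, reversed(site)))


def find_target_sites(mirna: str, mrna: str, cutoff: int = 4) -> list[tuple[int, int]]:
    """Return (start index, M) for every window with M < cutoff."""
    k = len(mirna)
    hits = []
    for start in range(len(mrna) - k + 1):
        m = mismatch_score(mirna, mrna[start:start + k])
        if m < cutoff:
            hits.append((start, m))
    return hits


# Toy example with a 5-nt "miRNA"; the real scan uses 21-mers.
print(find_target_sites("UAGCA", "GGUGCUAGG"))  # → [(2, 0)]
```

The hard cutoff is what makes the model's specificity depend on judgment calls: moving `cutoff` from 4 to 5 admits every M = 4 window, not just the ones whose genes share function.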
Selectivity and Specificity
• Selectivity (false negatives)
  – Bartel’s model finds “everything” for M < 5
• Putative targets from this model (most confirmed by experiment) define the target population
• Specificity (false positives)
  – Bartel’s model is problematic
  – M < 5 includes many false positives
  – M < 4 and qualitative ask Bartel elements are necessary for model specificity
• Our goal is to develop a quantitative model
Position Dependent Model
• Ask Bartel has been spectacularly successful
• Build on the existing model & make it quantitative
• No a priori justification of position-independence
  – assumed by the ask Bartel model
• Extend to a position-dependent mismatch model
  – Assign a mismatch at position i the weight ε_i
  – For the ask Bartel model, ε_i = 1 at every position
• Quantify target “strength” with a binding probability
  – p_t is the probability of finding the miRNA bound to target site t in the mRNA population
• Now the “mismatch score” is position-dependent
• A Boltzmann factor gives the binding probability
• Quantitative model built, but how do we find the ε_i?
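The position-dependent score replaces "count the mismatches" with "sum the weights of the mismatched positions". A minimal Python sketch, assuming positions are indexed from the miRNA's 5’ end starting at 0 (the slides number from 1) and the same pairing conventions as a plain mismatch count; `mismatch_energy` is a hypothetical name.

```python
# Sketch of the position-dependent mismatch score: a mismatch at position i
# costs eps[i] instead of 1. Assumption: positions are counted from the
# miRNA's 5' end starting at 0; strands are 5'->3' and antiparallel.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def mismatch_energy(mirna: str, site: str, eps: list[float]) -> float:
    """Sum eps[i] over positions i where the duplex has a non-WC pair."""
    if not (len(mirna) == len(site) == len(eps)):
        raise ValueError("miRNA, site and eps must have the same length")
    pairs = zip(mirna, reversed(site))  # antiparallel pairing
    return sum(w for w, (m, t) in zip(eps, pairs) if (m, t) not in WATSON_CRICK)


# With every weight equal to 1 the score is just the ask Bartel count M.
print(mismatch_energy("UAGCA", "UGCUC", [1.0] * 5))                  # → 1.0
# A zero weight makes a mismatch at that position cost nothing.
print(mismatch_energy("UAGCA", "UGCUC", [0.0, 1.0, 1.0, 1.0, 1.0]))  # → 0.0
```

Setting every weight to 1 recovers the ask Bartel model as a special case, which is the sense in which the extension builds on it.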
Boltzmann factors
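$$E(m, t, \varepsilon) = \sum_{i=1}^{L} \varepsilon_i \left(1 - \delta_{m_i, t_i}\right)$$
$$Z(m, \beta) = \sum_{g} e^{-\beta E(m, g, \varepsilon)}$$
$$p_t(m, \varepsilon, \beta) = \frac{e^{-\beta E(m, t, \varepsilon)}}{Z(m, \beta)}$$
m = miRNA* sequence; t = target site sequence; ε = mismatch parameters; L = site length; δ is the Kronecker delta (1 if m_i = t_i, else 0); the sum over g runs over all candidate sites
[Figure: the miRNA/target duplex (RISC-bound miRNA paired with the mRNA, 5’ and 3’ ends labeled) with positions 1, 2, 3, 4, 5, … numbered along the miRNA]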
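Given the energies E(m, g, ε) of all candidate sites, the Boltzmann formulas above turn into a normalized probability vector. A minimal sketch (the function name is hypothetical); subtracting the smallest energy before exponentiating is a standard numerical trick that cancels between the numerator and Z.

```python
import math


def binding_probabilities(energies: list[float], beta: float = 1.0) -> list[float]:
    """p_t = exp(-beta * E_t) / Z with Z = sum_g exp(-beta * E_g).

    energies[g] is E(m, g, eps) for every candidate site g of one miRNA.
    The minimum energy is subtracted first to avoid overflow; the shift
    cancels between the numerator and Z.
    """
    e_min = min(energies)
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


# A site one "ln 2" of energy higher is half as likely to be bound.
print([round(x, 3) for x in binding_probabilities([0.0, math.log(2.0)])])  # → [0.667, 0.333]
```

Because p_t depends on ε only through βE, only the products βε_i can be determined from data; β can be fixed (e.g. to 1) without loss of generality.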
Model Comparison
• Follow the DNA binding protein example*
  – Consider a thought experiment…
• Mix many copies of the genome with N copies of the protein, and count the number n_t of proteins bound to site t
  – f_t = n_t / N
• If the model works, f_t and p_t must agree!
• Determine the ε_i by looking for this agreement
  – Maximize the probability that the data (f_t) could have come from the model (p_t)…
*Brown, C.T., and Callan, C.G. (2004). Proc. Natl. Acad. Sci. 101, 2404.
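The thought experiment can be simulated directly: draw which site each copy binds according to p_t and tally f_t = n_t / N. This sketch makes a simplifying assumption that is not on the slide: every copy is bound to exactly one site, so there is no unbound pool.

```python
import random
from collections import Counter


def simulate_fraction_bound(p: list[float], n_copies: int, seed: int = 0) -> list[float]:
    """Thought experiment: n_copies molecules each bind one site.

    Simplifying assumption (not from the slides): every copy is bound to
    exactly one site t, drawn with probability p[t]. Returns f_t = n_t / N.
    """
    rng = random.Random(seed)
    bound_sites = rng.choices(range(len(p)), weights=p, k=n_copies)
    counts = Counter(bound_sites)
    return [counts[t] / n_copies for t in range(len(p))]


f = simulate_fraction_bound([0.7, 0.2, 0.1], n_copies=100_000)
print([round(x, 2) for x in f])  # close to [0.7, 0.2, 0.1] for large N
```

When the data really do come from the model, f_t converges to p_t as N grows; fitting runs the logic in reverse, choosing ε so that p_t matches the observed f_t.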
Model Testing
• Probability of the data arising from our position-dependent mismatch model
• Obtain the best match of model to data by maximizing the log probability
• Yields the set of parameters ε_i that maximizes the probability of getting the data from our model
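$$P(f \mid m, \varepsilon, \beta) = \prod_{g} p_g(m, \varepsilon, \beta)^{f_g}$$
$$\mathcal{L}(f, m, \varepsilon, \beta) = \ln P = \sum_{g} f_g \left[-\beta E(m, g, \varepsilon) - \ln Z(m, \beta)\right]$$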
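The log probability above can be evaluated directly from the data f_g and the site energies. A minimal sketch with a hypothetical function name; ln Z is computed with the usual minimum-energy shift so large energies do not underflow.

```python
import math


def log_likelihood(f: list[float], energies: list[float], beta: float = 1.0) -> float:
    """L = sum_g f_g * [-beta * E_g - ln Z], i.e. sum_g f_g * ln p_g.

    f[g] is the data (fraction bound at site g) and energies[g] is
    E(m, g, eps) under the current mismatch parameters.
    """
    e_min = min(energies)
    # ln Z = ln sum_g exp(-beta * E_g), computed with a stabilizing shift.
    log_z = -beta * e_min + math.log(sum(math.exp(-beta * (e - e_min)) for e in energies))
    return sum(fg * (-beta * e - log_z) for fg, e in zip(f, energies))


print(log_likelihood([0.5, 0.5, 0.0], [0.0, 1.0, 3.0]))
```

Sites with f_g = 0 contribute nothing directly, but they still enter through Z, so lowering their energy lowers L.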
Optimization Cartoon
[Figure: inputs (the miRNA sequence, e.g. UAGCA, and the data, the measured fraction bound f1, f2, …, f24) feed the model; parameter controls ε1 … ε5 set the binding probabilities p1 … p24, which are compared with f1 … f24]
• Maximize L to get the ε_i
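One way to "turn the knobs" in the cartoon is plain gradient ascent on L. For the energy model above, ∂L/∂ε_i = β(⟨x_i⟩_p − ⟨x_i⟩_f), where x_{g,i} = 1 if site g has a mismatch at position i. The slides do not say which optimizer was used, so treat this as a hypothetical stand-in; β is held at 1 because only βε_i enters p.

```python
import math


def fit_mismatch_weights(x, f, beta=1.0, lr=0.5, steps=2000):
    """Maximize L(eps) = sum_g f_g ln p_g(eps) by plain gradient ascent.

    x[g][i] is 1 if candidate site g has a mismatch at position i, else 0,
    so E_g = sum_i eps[i] * x[g][i]. f[g] is the target fraction bound.
    Uses dL/deps_i = beta * (<x_i>_p - <x_i>_f). Gradient ascent is an
    illustrative choice; the slides do not name an optimizer.
    """
    n_pos = len(x[0])
    eps = [1.0] * n_pos  # start from the ask Bartel model (all weights 1)
    for _ in range(steps):
        energies = [sum(e * xi for e, xi in zip(eps, xg)) for xg in x]
        e_min = min(energies)
        w = [math.exp(-beta * (e - e_min)) for e in energies]
        z = sum(w)
        p = [wg / z for wg in w]
        grad = [
            beta * sum((pg - fg) * xg[i] for pg, fg, xg in zip(p, f, x))
            for i in range(n_pos)
        ]
        eps = [e + lr * gi for e, gi in zip(eps, grad)]
    return eps


# Synthetic check: data generated with weights [2.0, 0.0] are recovered.
x = [[0, 0], [1, 0], [0, 1], [1, 1]]
w = [math.exp(-(2.0 * xg[0] + 0.0 * xg[1])) for xg in x]
f = [wg / sum(w) for wg in w]
print([round(e, 3) for e in fit_mismatch_weights(x, f)])  # approximately [2.0, 0.0]
```

L is concave in ε for this log-linear model, so gradient ascent cannot get stuck in a spurious local maximum, though a weight can drift without bound if the data never place mass on sites mismatched at that position.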
Review
• Application of this procedure to miRNAs
• Optimize to get the best agreement between
  – position-dependent mismatch model: p_g
  – ask Bartel complementarity model: f_g
    – Equal binding probability for each training target
    – Minimal binding to everything else (background)
      – A contribution we made to the method
      – Necessary to avoid overfitting
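The training data f_g described above (equal weight on every training target, minimal weight on the background) could be built as follows. This is a hypothetical construction: the talk does not say how small "minimal" is, so `background_mass` is an illustrative knob, spread uniformly over all non-training sites.

```python
def training_distribution(n_sites, training_sites, background_mass=0.01):
    """Hypothetical construction of the target distribution f_g.

    Each training target gets an equal share of (1 - background_mass);
    every other candidate site gets an equal share of background_mass.
    background_mass is an illustrative value, not one from the talk.
    """
    training = set(training_sites)
    n_train = len(training)
    n_background = n_sites - n_train
    if n_train == 0 or n_background == 0:
        raise ValueError("need at least one training and one background site")
    return [
        (1.0 - background_mass) / n_train if g in training
        else background_mass / n_background
        for g in range(n_sites)
    ]


print(training_distribution(5, [0, 2], background_mass=0.1))
```

In a log-linear fit like the gradient-ascent sketch, a strictly positive background mass matters: if f_g were zero at every site mismatched at some position, the gradient for that weight would never change sign and the weight would grow without bound.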
Multi-miRNA Optimization
• Given the amount of data we have, this method would fail on DNA binding proteins
• All miRNAs share the same machinery for target recognition (all form the RISC)
  – DNA binding protein recognition depends on each specific protein
• Solution to our problem
  – Simultaneously optimize for several miRNAs
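Because all miRNAs share the RISC, one set of weights ε can be fitted to several miRNAs at once by summing their log-likelihoods and gradients. A minimal sketch, assuming each miRNA supplies its own candidate sites as mismatch-indicator rows x and its own data f; the function name and data layout are hypothetical.

```python
import math


def joint_log_likelihood_and_grad(eps, datasets, beta=1.0):
    """Sum L and dL/deps over miRNAs that share one set of weights eps.

    datasets is a list of (x, f) pairs, one per miRNA: x[g][i] = 1 if that
    miRNA's candidate site g has a mismatch at position i, and f[g] is the
    site's target fraction bound.
    """
    total = 0.0
    grad = [0.0] * len(eps)
    for x, f in datasets:
        energies = [sum(e * xi for e, xi in zip(eps, xg)) for xg in x]
        e_min = min(energies)
        w = [math.exp(-beta * (e - e_min)) for e in energies]
        z = sum(w)
        log_z = -beta * e_min + math.log(z)
        p = [wg / z for wg in w]
        total += sum(fg * (-beta * e - log_z) for fg, e in zip(f, energies))
        for i in range(len(eps)):
            # dL/deps_i = beta * (<x_i>_p - <x_i>_f), summed over miRNAs
            grad[i] += beta * sum((pg - fg) * xg[i] for pg, fg, xg in zip(p, f, x))
    return total, grad
```

Any gradient-based optimizer can then maximize the total; each miRNA keeps its own Z, but all of them constrain the same ε_i, which is what makes the small per-miRNA data sets usable.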
Results - Parameters
• Multi-miRNA optimization of nine Arabidopsis miRNAs
  – 157b, 159b, 160b, 164a, 165b, 167b, 168a, 171, 172a1
  – A set of functionally diverse (21-mer) miRNAs
[Figure: fitted mismatch parameters ε_i versus position i along the miRNA (3’ and 5’ ends marked)]
Position 14
• A mismatch at position 14
  – Has no effect on a target’s binding probability!
• Surprising and exciting because…
• …this position is known to be special
  – mir162a target
    – 1g01040 DEAD/DEAH box helicase
  – Has a bulge at position 14
• This analysis did not include mir162a!
• A provocative result…
[Figure: mir162a paired with its target (5’ and 3’ ends labeled), with the bulge at position 14 marked]
Results - Targets
• Training targets should have low energy
  – Found by the ask Bartel model
  – Reside in genes that share the majority function
• Targets in the background have high energy
  – Background targets with low energy are interesting
• We are particularly interested in all the majority-function targets for a given miRNA
  – Especially those that are not training targets
• Look at distributions of target energies
  – For each value of M
[Figure: target energy distributions N(E) for mir165b (HD-Zip) and mir159b (MYB); training targets and majority-function targets are marked, including majority-function targets that are not training targets!]
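The "distribution of target energies for each value of M" amounts to grouping candidate sites by their ask Bartel mismatch count and histogramming the fitted energies within each group. A minimal sketch with a hypothetical input format of (M, E) pairs; it does not reproduce the mir165b or mir159b plots.

```python
import math
from collections import defaultdict


def energy_histograms(sites, bin_width=0.5):
    """Histogram fitted energies separately for each mismatch count M.

    sites is an iterable of (M, E) pairs, where M is a candidate site's
    ask Bartel mismatch count and E its energy under the fitted weights.
    Returns {M: {bin_start: count}}; bins are [k*w, (k+1)*w).
    """
    hist = defaultdict(lambda: defaultdict(int))
    for m, e in sites:
        bin_start = math.floor(e / bin_width) * bin_width
        hist[m][bin_start] += 1
    return {m: dict(bins) for m, bins in sorted(hist.items())}


print(energy_histograms([(2, 0.2), (2, 0.4), (2, 1.1), (3, 2.0)]))
# → {2: {0.0: 2, 1.0: 1}, 3: {2.0: 1}}
```

Splitting by M is what exposes sites that the fixed M cutoff treats identically but the position-dependent model separates, such as low-energy background sites.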
Conclusions
• Refined the qualitative complementarity model
  – A quantitative model that is much less arbitrary
    – Whatever we get, we get: not “ask Miller”
  – Majority-function targets group together at low energy
  – Bartel finds most targets, our model finds all targets
• Appropriate experiments could falsify our model
  – How important is position 14?
  – Look at some specific ask Bartel targets
• Advanced the technology of optimization
  – Resolution of the overfitting problem
  – Simultaneous optimization
Encoding of Networks
• Networks
  – miRNA families
• A single target mRNA can be regulated by different miRNAs
• And a single miRNA can regulate many different mRNAs
  – Apparently an overlapping and probably redundant regulatory network
• Encoding
  – All this regulation is encoded in mere text!
  – How is this encoded in the sequence?
  – Why is it encoded in this way?
Acknowledgements
• Miller Lab Posse
  – Jon Miller
  – Tuan Tran
  – Will Salerno
  – Gerald Lim
• Curtis Callan (Princeton)
• Keck Center for Computational and Structural Biology
• BCM Biochemistry Department