From Structure to Function

Given a protein structure, can we predict the function of a protein when we do not have a known homolog in the database?



A different approach for predicting function from structure that does not rely on homology:

• Characterize the known protein structures belonging to a specific family.

• Find general structural features that are unique to the family.

• Use these features to predict new members of the family.


EXAMPLE: Predicting new DNA-binding proteins

Many DNA-binding proteins, such as p53, are involved in cancer.


[Figure: DNA-binding folds: leucine zipper, β-ribbon, helix-turn-helix, zinc finger]

Many different folds, but all can bind DNA.


While DNA-binding proteins have diverse folds, they all share a common property: positively charged surfaces that complement the negative charge of the DNA.

[Figure: electrostatic surfaces; positive potential in blue, negative in red]


DNA-binding proteins are characterized by positively charged surfaces.

But so are some proteins that do not bind nucleic acids.

[Figure: electrostatic surfaces; positive potential in blue, negative in red]


Strategy for predicting new DNA-binding proteins

1. Build a database of DNA-binding and non-DNA-binding proteins.

2. Extract the positive electrostatic patch of every protein in the data set.

3. Find features that can discriminate the DNA-binding proteins from the other proteins.

4. Use the features as a vector to train a machine-learning algorithm that identifies novel DNA-binding proteins.
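Step 2 can be sketched as a graph search. This is a minimal illustration, not the method used in the slides: it assumes each surface residue already has an electrostatic potential value (for example from a Poisson-Boltzmann calculation) and a list of neighboring surface residues. The input format, the zero threshold and the two output features (patch size, mean potential) are hypothetical choices. A patch is taken to be a connected group of residues whose potential exceeds the threshold.

```python
from collections import deque


def positive_patches(potential, neighbors, threshold=0.0):
    """Group surface residues with potential > threshold into connected patches.

    potential: dict residue_id -> electrostatic potential (hypothetical units)
    neighbors: dict residue_id -> list of adjacent surface residue ids
    """
    positive = {r for r, phi in potential.items() if phi > threshold}
    seen = set()
    patches = []
    for start in sorted(positive):
        if start in seen:
            continue
        # Breadth-first search that only walks through positive residues
        patch, queue = [], deque([start])
        seen.add(start)
        while queue:
            residue = queue.popleft()
            patch.append(residue)
            for nb in neighbors.get(residue, []):
                if nb in positive and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        patches.append(patch)
    return patches


def patch_features(potential, neighbors, threshold=0.0):
    """Hypothetical feature vector: (size, mean potential) of the largest patch."""
    patches = positive_patches(potential, neighbors, threshold)
    if not patches:
        return (0, 0.0)
    largest = max(patches, key=len)
    mean_phi = sum(potential[r] for r in largest) / len(largest)
    return (len(largest), mean_phi)


# Toy surface: K1-R2-K3 form one positive patch; R5 is cut off by D4
potential = {"K1": 2.0, "R2": 1.5, "K3": 1.0, "D4": -2.0, "R5": 0.5}
neighbors = {"K1": ["R2", "D4"], "R2": ["K1", "K3"], "K3": ["R2"],
             "D4": ["K1", "R5"], "R5": ["D4"]}
print(patch_features(potential, neighbors))  # (3, 1.5)
```

Features computed this way for every protein in the data set would form the vectors used in step 4.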


Machine learning algorithm for predicting protein function from structural features

• An SVM (support vector machine) is trained on a set of known proteins that share a common function, such as DNA binding (red dots), and on a separate set of proteins known not to bind DNA (blue dots).


• Using this training set of DNA-binding and non-DNA-binding proteins, the SVM learns to differentiate between members and non-members of the family.

• Having learned the features of the class (DNA-binding proteins), the SVM can classify a new protein as a member or a non-member of the class based on the combination of its structural features.
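A linear SVM can be sketched in a few lines of plain Python. This is a minimal from-scratch illustration, not the classifier used in the slides: it minimizes the L2-regularized hinge loss by sub-gradient descent with a fixed step size, and the two standardized features (patch size, mean patch potential), the step size `eta` and the regularization weight `lam` are all hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=100):
    """Linear SVM trained by sub-gradient descent on the L2-regularized hinge loss.

    X: list of feature vectors; y: labels +1 (DNA-binding) or -1 (non-binding).
    Returns weights w and bias b of the decision function w.x + b.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Hinge loss active: shrink w and move it towards yi * xi
                w = [wj - eta * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += eta * yi
            else:
                # Only the regularization term contributes
                w = [wj - eta * lam * wj for wj in w]
    return w, b


def predict(w, b, x):
    """+1 = predicted DNA-binding, -1 = predicted non-binding."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical standardized features: (patch size, mean patch potential)
X = [[2.0, 2.0], [3.0, 2.0], [2.0, 3.0],        # DNA-binding
     [-2.0, -2.0], [-3.0, -2.0], [-2.0, -3.0]]  # non-binding
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(predict(w, b, [2.5, 2.5]))  # 1
```

In practice a library implementation such as scikit-learn's `sklearn.svm.SVC` would normally be used; it also supports non-linear kernels.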


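Testing the algorithm for predicting DNA-binding proteins

[Figure: bar chart of the percentage (0-100) of correct and incorrect predictions for DNA-binding and non-DNA-binding proteins]

Performance is measured with true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN), summarized as sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP).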
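The counts behind such a chart translate directly into sensitivity and specificity. The sketch assumes true labels and predictions are given as booleans (True = DNA-binding); the ten-protein test set is invented for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for boolean labels, True = DNA-binding."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """Fraction of true DNA-binding proteins that are predicted as such."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-DNA-binding proteins that are predicted as such."""
    return tn / (tn + fp)


# Hypothetical test set: 4 DNA-binding and 6 non-binding proteins
y_true = [True] * 4 + [False] * 6
y_pred = [True, True, True, False, False, False, False, False, True, False]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)                 # 3 5 1 1
print(sensitivity(tp, fn))            # 0.75
print(round(specificity(tn, fp), 3))  # 0.833
```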


Predicting RNA Structure


[Figure: DNA → RNA → protein]

According to the central dogma of molecular biology, the main role of RNA is to transfer genetic information from DNA to protein.


RNA has many other biological functions

• Protein synthesis (ribosome)

• Control of mRNA stability (UTR)

• Control of splicing (snRNP)

• Control of translation (microRNA)

The function of the RNA molecule depends on its folded structure


Nobel Prize in Chemistry 2009: structure and function of the ribosome


Protein structures: ~90,000 in total
RNA structures: ~900 in total


RNA Structural Levels

[Figure: tRNA shown as a secondary structure and as a tertiary structure]


RNA Secondary Structure

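[Figure: the sequence 5'-GAUCUUGAUC-3' folding back on itself into a stem of four base pairs (G-C, A-U, U-A, C-G) closed by a UU loop; the stem and the loop are labeled]

• RNA bases are G, C, A and U.
• The RNA molecule folds back on itself.
• Base pairing: G-C and A-U, plus the G-U wobble pair, held together by hydrogen bonds.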
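A folded structure is often written in dot-bracket notation, where each "(" pairs with its matching ")" and "." marks an unpaired base. The notation and the function names below are assumptions for illustration, not something the slides define. The sketch parses a dot-bracket string and checks that every pair obeys the pairing rules listed above; it does not enforce a minimum loop length.

```python
# Allowed pairs: Watson-Crick G-C and A-U plus the G-U wobble pair
CANONICAL_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"),
                   ("G", "U"), ("U", "G")}


def parse_dot_bracket(structure):
    """Return the base pairs (i, j), 0-based, encoded by a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unmatched '('")
    return sorted(pairs)


def is_valid_fold(seq, structure):
    """True if every pair in the structure is G-C, A-U or G-U."""
    if len(seq) != len(structure):
        return False
    return all((seq[i], seq[j]) in CANONICAL_PAIRS
               for i, j in parse_dot_bracket(structure))


# The hairpin from the slide: stem G-C, A-U, U-A, C-G closed by a UU loop
print(parse_dot_bracket("((((..))))"))            # [(0, 9), (1, 8), (2, 7), (3, 6)]
print(is_valid_fold("GAUCUUGAUC", "((((..))))"))  # True
```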


Predicting RNA Secondary Structure

Most common approach: search for the RNA structure with the minimal free energy (MFE).

[Figure: alternative structures of the sequence GAUCUUGAUC, ranked from low energy to high energy]


Free energy model

The free energy of a structure is the sum of all its interaction energies.

Each interaction energy can be determined thermodynamically.

Free energy: E = E(CG) + E(CG) + ...

The aim: find the structure with the minimal free energy (MFE).


Why is MFE secondary structure prediction hard?

• The MFE structure could be found by calculating the free energy of every possible structure.

• BUT the number of possible structures grows exponentially with the number of bases.

Solution: dynamic programming (Zuker and Stiegler)
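The Zuker-Stiegler algorithm minimizes free energy using a detailed loop-based energy model. As a simpler stand-in that shows the same dynamic-programming idea, the sketch below implements the Nussinov algorithm, which maximizes the number of nested base pairs instead of minimizing free energy. It is not the Zuker-Stiegler algorithm. The minimum hairpin loop of 3 nt and the allowed pairs (G-C, A-U, G-U) are assumptions. Instead of enumerating exponentially many structures, it fills an n × n table in O(n^3) time and then traces back one optimal structure.

```python
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def nussinov_fold(seq, min_loop=3):
    """Maximize the number of nested base pairs by dynamic programming.

    dp[i][j] = max number of pairs in seq[i..j]; a hairpin loop must contain
    at least min_loop unpaired bases. Returns (pair count, dot-bracket string).
    """
    n = len(seq)
    if n == 0:
        return 0, ""
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]                       # case 1: i is unpaired
            for k in range(i + min_loop + 1, j + 1):  # case 2: i pairs with k
                if (seq[i], seq[k]) in PAIRS:
                    right = dp[k + 1][j] if k < j else 0
                    best = max(best, dp[i + 1][k - 1] + 1 + right)
            dp[i][j] = best

    # Traceback: recover one optimal structure from the filled table
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                right = dp[k + 1][j] if k < j else 0
                if dp[i][j] == dp[i + 1][k - 1] + 1 + right:
                    structure[i], structure[k] = "(", ")"
                    stack.append((i + 1, k - 1))
                    stack.append((k + 1, j))
                    break
    return dp[0][n - 1], "".join(structure)


print(nussinov_fold("GGGAAACCC"))  # (3, '(((...)))')
```

Replacing "+1 per pair" with loop and stacking energies, and maximization with minimization, turns this skeleton into an energy-based MFE algorithm.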


Simplifying Assumptions for RNA Structure Prediction

• RNA folds into a single minimum free-energy structure.

• The energy of a particular base pair can be calculated independently: neighbors do not influence the energy.


Sequence-Dependent Free Energy: Nearest-Neighbor Model

[Figure: two hairpins whose stems differ in a single base pair (G-C versus U-A)]

The free energy of a base pair is influenced by the previous (adjacent) base pair, but not by base pairs further along the stem.


Sequence-dependent free-energy values of base-pair stacks (nearest-neighbor model)

[Figure: the same two hairpins, with an energy assigned to each stack of adjacent base pairs]

Example values for a G-C pair followed by the indicated pair:

Stack      ΔG (kcal/mol)
GC / AU    -2.3
GC / GC    -2.9
GC / CG    -3.4
GC / UA    -2.1

These energies are estimated experimentally from small synthetic RNAs.
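In the nearest-neighbor model the energy of a helix is a sum over stacks of adjacent pairs, so each pair's contribution depends on its neighbor. The sketch below uses only the four example values (kcal/mol, read as "a G-C pair followed by the indicated pair"); the notation "GC" = G on the 5' strand paired with C on the 3' strand is an assumption. Full parameter sets, such as the Turner nearest-neighbor parameters, tabulate all stacks; this toy table raises an error for anything else.

```python
# Stacking free energies (kcal/mol): a G-C pair followed by the indicated pair.
# "GC" = G on the 5' strand paired with C on the 3' strand (assumed notation).
STACK_ENERGY = {
    ("GC", "AU"): -2.3,
    ("GC", "GC"): -2.9,
    ("GC", "CG"): -3.4,
    ("GC", "UA"): -2.1,
}


def helix_stacking_energy(pairs):
    """Sum nearest-neighbor stacking terms along a helix.

    pairs: base pairs in order along the helix, e.g. ["GC", "GC", "CG"].
    Each adjacent (pair, next pair) contributes one term, so the energy of a
    pair depends on its neighbor but not on pairs further along the stem.
    """
    total = 0.0
    for first, second in zip(pairs, pairs[1:]):
        if (first, second) not in STACK_ENERGY:
            raise KeyError(f"no value for stack {first}/{second} in this toy table")
        total += STACK_ENERGY[(first, second)]
    return total


print(round(helix_stacking_energy(["GC", "GC", "CG"]), 1))  # -6.3
print(helix_stacking_energy(["GC", "AU"]), helix_stacking_energy(["GC", "UA"]))  # -2.3 -2.1
```

The last line shows the point of the model: A-U and U-A contain the same bases, yet they contribute different energies after a G-C pair.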


Improvements to the MFE approach

• Positive energy is added for destabilizing regions such as bulges and loops.

• More than one structure can be predicted.


Free energy computation

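[Figure: a hairpin with a 5' dangling end, stacked base pairs, a 1-nt bulge and a 4-nt hairpin loop, each annotated with its free-energy contribution]

Contributions (kcal/mol):
• 4-nt hairpin loop: +5.9
• terminal mismatch of the hairpin: -1.1
• 1-nt bulge: +3.3
• stacking: -2.9, -2.9, -1.8, -0.9, -1.8, -2.1
• 5' dangling end: -0.3

ΔG = -4.6 kcal/mol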
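Because the model is additive, the total is simply the sum of the labeled terms. Taking the 5' dangling-end term (-0.3) once, the contributions labeled on the slide add up to the stated total; representing them as a list of (label, value) tuples is only an illustration.

```python
# Energy contributions (kcal/mol) labeled on the example hairpin
contributions = [
    ("4-nt hairpin loop", +5.9),
    ("terminal mismatch of hairpin", -1.1),
    ("stacking", -2.9),
    ("1-nt bulge", +3.3),
    ("stacking", -2.9),
    ("stacking", -1.8),
    ("stacking", -0.9),
    ("stacking", -1.8),
    ("stacking", -2.1),
    ("5' dangling end", -0.3),
]


def total_free_energy(terms):
    """Free energy of a structure = sum of its loop, stacking and end terms."""
    return sum(value for _, value in terms)


delta_g = round(total_free_energy(contributions), 1)
print(f"dG = {delta_g} kcal/mol")  # dG = -4.6 kcal/mol
```

The destabilizing loop and bulge terms (+9.2 in total) are outweighed by the stacking terms, leaving a net negative (favorable) free energy.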


Improvements to the MFE approach

• Positive energy is added for destabilizing regions such as bulges and loops.

• Looking for an ensemble of low-energy structures and generating a consensus structure

Why? RNA is dynamic and does not always fold into the lowest-energy structure.
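One common way to formalize the ensemble view is the Boltzmann distribution: a structure with free energy E has probability exp(-E/RT)/Z at temperature T. The sketch below is a simplification under stated assumptions: it uses three hand-picked structures with hypothetical energies at 37 °C, whereas real tools sum over all structures with McCaskill's partition-function algorithm. The consensus keeps every base pair whose summed probability exceeds 0.5 (an assumed threshold); such pairs can never conflict, because each base has at most one partner above 0.5.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def boltzmann_probabilities(energies, temperature=310.15):
    """p_i = exp(-E_i / RT) / Z for each structure energy E_i (kcal/mol)."""
    rt = R_KCAL * temperature
    e_min = min(energies)
    # Shift by the minimum energy so the exponentials cannot overflow
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


def base_pairs(structure):
    """0-based base pairs (i, j) of a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def consensus_structure(structures, probs, threshold=0.5):
    """Keep every base pair whose summed probability exceeds the threshold."""
    pair_prob = {}
    for s, p in zip(structures, probs):
        for pair in base_pairs(s):
            pair_prob[pair] = pair_prob.get(pair, 0.0) + p
    out = ["."] * len(structures[0])
    for (i, j), p in pair_prob.items():
        if p > threshold:
            out[i], out[j] = "(", ")"
    return "".join(out), pair_prob


structures = ["((((....))))", "(((......)))", ".((......))."]
energies = [-4.6, -4.2, -3.0]  # hypothetical free energies (kcal/mol)
probs = boltzmann_probabilities(energies)
consensus, pair_prob = consensus_structure(structures, probs)
print(consensus)  # ((((....))))
```

With these numbers the MFE structure accounts for only about 60% of the ensemble, which is why single-structure predictions can be misleading.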


RNA fold prediction based on Multiple Alignment

Information from a multiple sequence alignment (MSA) can help predict the probability that positions i and j are base-paired.

GCCUUCGGGC
GACUUCGGUC
GGCUUCGGCC


Compensatory Substitutions

[Figure: a hairpin stem in which one base pair is substituted by a G-C pair, preserving the stem]

Mutations that maintain the secondary structure can help predict the fold.


RNA secondary structure can be revealed by identifying compensatory mutations.

GCCUUCGGGC
GACUUCGGUC
GGCUUCGGCC

[Figure: consensus hairpin of the aligned sequences, with the covarying pair marked N-N']


Insight from Multiple Alignment

Information from a multiple sequence alignment (MSA) can help predict the probability that positions i and j are base-paired.

• Conservation: no additional information
• Consistent mutations (e.g. GC → GU): support the stem
• Inconsistent mutations: do not support the stem
• Compensatory mutations: support the stem
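The four cases can be turned into a simple rule for a pair of alignment columns. This is a minimal sketch, assuming an ungapped alignment and 0-based column indices; the function name and labels are illustrative, and real covariation tools usually quantify the same signal with statistics such as mutual information. The alignment is the one from the slides, where columns 2 and 9 (1-based) carry C-G, A-U and G-C.

```python
from itertools import combinations

CANONICAL = {"GC", "CG", "AU", "UA", "GU", "UG"}


def classify_column_pair(alignment, i, j):
    """Classify the evidence that alignment columns i and j (0-based) pair."""
    pairs = [seq[i] + seq[j] for seq in alignment]
    if any(p not in CANONICAL for p in pairs):
        return "inconsistent"  # some sequence cannot pair: does not support a stem
    if len(set(pairs)) == 1:
        return "conserved"     # identical everywhere: no additional information
    if any(a[0] != b[0] and a[1] != b[1] for a, b in combinations(pairs, 2)):
        return "compensatory"  # both bases changed, pairing kept: supports a stem
    return "consistent"        # one base changed, pairing kept (e.g. GC -> GU)


alignment = ["GCCUUCGGGC",
             "GACUUCGGUC",
             "GGCUUCGGCC"]
print(classify_column_pair(alignment, 1, 8))  # compensatory (C-G, A-U, G-C)
print(classify_column_pair(alignment, 0, 9))  # conserved
print(classify_column_pair(alignment, 0, 1))  # inconsistent
```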


From RNA structure to Function

Rfam RNA family database: http://www.sanger.ac.uk/Software/Rfam/

Many families of non-coding RNAs with unique functions are characterized by a combination of conserved sequence and structure.


MicroRNAs: an example of an RNA family

[Figure: a miRNA gene, the mature miRNA and its target gene]


MicroRNA in Cancer


The challenge for bioinformatics:

- Identifying new microRNA genes
- Identifying the targets of specific microRNAs


How to find microRNA genes?

• Search for sequences that fold into a hairpin of ~70 nt
  - RNAfold
  - other efficient algorithms for identifying stem-loops

• Concentrate on intergenic regions and introns
  - filter out coding regions

• Filter out non-conserved candidates
  - mature and pre-miRNA sequences are usually evolutionarily conserved
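The three filters can be chained as a simple pipeline. In this sketch the hairpin prediction (for example with RNAfold), the genome annotation and the conservation check are assumed to have been run already, so each candidate arrives with precomputed flags; the `Candidate` fields and the 70 ± 20 nt length window are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    hairpin_length: int     # length of the predicted stem-loop (nt)
    is_hairpin: bool        # folds into a single stem-loop (e.g. by RNAfold)
    in_coding_region: bool  # overlaps an annotated coding sequence
    conserved: bool         # hairpin found in related species


def filter_candidates(candidates, target_length=70, tolerance=20):
    """Apply the three filters in order: shape/size, location, conservation."""
    kept = []
    for c in candidates:
        if not c.is_hairpin or abs(c.hairpin_length - target_length) > tolerance:
            continue  # not a ~70 nt hairpin
        if c.in_coding_region:
            continue  # coding regions are filtered out
        if not c.conserved:
            continue  # non-conserved candidates are filtered out
        kept.append(c)
    return kept


candidates = [
    Candidate("cand1", 72, True, False, True),
    Candidate("cand2", 68, True, True, True),    # inside a coding region
    Candidate("cand3", 150, True, False, True),  # hairpin far too long
    Candidate("cand4", 65, True, False, False),  # not conserved
    Candidate("cand5", 70, False, False, True),  # does not form a hairpin
]
print([c.name for c in filter_candidates(candidates)])  # ['cand1']
```

Ordering the filters from cheapest to most expensive check is a practical choice when scanning whole genomes.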


How to find microRNA genes?

A. Structure prediction

B. Evolutionary Conservation


Predicting microRNA targets

MicroRNA target sites are located in 3' UTRs and are complementary to mature microRNAs.

• Why is it hard to find them?
  - Base pairing is required only in the seed sequence (7-8 nt).
  - Many known miRNAs have similar seed sequences.
  - As a result, the probability of finding a site by chance is very high.

[Figure: a mature miRNA paired with the 3' UTR of a target gene]
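A seed-match search and a back-of-the-envelope estimate of chance matches can be sketched as follows. The seed is taken as miRNA nucleotides 2-8, a commonly used convention that the slides do not spell out. The miRNA and UTR sequences are invented, and the chance estimate assumes a random sequence with equal base frequencies. Under that assumption a given 7-mer appears about once every 4^7 = 16,384 nt, so summed over many miRNAs and thousands of 3' UTRs, chance hits are abundant.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA sequence (5'->3' in, 5'->3' out)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna, utr, seed_start=1, seed_end=8):
    """0-based positions in the 3' UTR perfectly complementary to the seed.

    By default the seed is miRNA nucleotides 2-8, i.e. mirna[1:8].
    """
    site = reverse_complement(mirna[seed_start:seed_end])
    positions = []
    pos = utr.find(site)
    while pos != -1:
        positions.append(pos)
        pos = utr.find(site, pos + 1)
    return positions


def expected_chance_sites(utr_length, k=7):
    """Expected exact matches to one k-mer in a random sequence with equal
    base frequencies: (L - k + 1) / 4**k."""
    return max(utr_length - k + 1, 0) / 4 ** k


mirna = "UCCGAUAGGCAUUCAGCAUGCA"  # hypothetical mature miRNA
utr = "AAAACUAUCGGUUUUCUAUCGGA"   # hypothetical 3' UTR fragment
print(seed_sites(mirna, utr))                 # [4, 15]
print(round(expected_chance_sites(1000), 3))  # 0.061
```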


Predicting microRNA target genes

• General methods
  - Find motifs that complement the seed sequence (allowing mismatches)
  - Look for conserved target sites
  - Consider the MFE of the RNA-RNA pairing, ∆G(miRNA+target)
  - Consider the difference between the MFE of the RNA-RNA pairing and the folding of the target:

∆G(miRNA+target) - ∆G(target)
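These criteria can be combined into a simple filter-and-rank step. The sketch assumes both free energies have already been computed for each candidate site by an RNA folding program (not shown), along with seed-match and conservation flags. The dictionary layout and all numbers are hypothetical. Following the formula ∆G(miRNA+target) - ∆G(target), a more negative score is read as a more favorable site.

```python
def delta_mfe(dg_duplex, dg_target):
    """dG(miRNA+target) - dG(target), both in kcal/mol."""
    return dg_duplex - dg_target


def rank_sites(sites, require_conserved=True):
    """Keep seed-matched (optionally conserved) sites, most negative score first."""
    kept = [s for s in sites
            if s["seed_match"] and (s["conserved"] or not require_conserved)]
    return sorted(kept, key=lambda s: delta_mfe(s["dg_duplex"], s["dg_target"]))


sites = [  # hypothetical candidate sites with precomputed free energies
    {"name": "site1", "seed_match": True, "conserved": True,
     "dg_duplex": -25.0, "dg_target": -8.0},
    {"name": "site2", "seed_match": True, "conserved": True,
     "dg_duplex": -28.0, "dg_target": -20.0},
    {"name": "site3", "seed_match": True, "conserved": True,
     "dg_duplex": -18.0, "dg_target": -2.0},
    {"name": "site4", "seed_match": False, "conserved": True,
     "dg_duplex": -30.0, "dg_target": -5.0},
]
print([s["name"] for s in rank_sites(sites)])  # ['site1', 'site3', 'site2']
```

Site2 forms the most stable duplex of the seed-matched sites but ranks last, because its target region is itself stably folded and would have to be unfolded before the miRNA can bind.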