Wagner chapter 4


Book club on "Origins of Evolutionary Innovations" by A. Wagner http://bioinfoblog.it/


Book club

Andreas Wagner, The Origins of Evolutionary Innovations

Chapter 4

Book club presented by G. M. Dall'Olio, Pompeu Fabra, IBE-CEXS

Reminder: Genotype network

A genotype network is a set of genotypes that have the same phenotype and are connected by single differences: each genotype in the network differs from at least one other member at exactly one position

AAAAA AAAAC AAAAG AAAAT AAATT

AAACA AAACC AAACG AAACT AAATC

AACCA AACCC AACCG AACCT …..

ACCCA ACCCC ACCCG ACCCT …..

CCCCA CCCCC CCCCG CCCCT …..

….. ….. ….. ….. …..

Yellow = same phenotype = a genotype network

Note: genotype network == neutral network
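The definition above can be sketched in a few lines of Python. This is only an illustration: `toy_phenotype` is a made-up stand-in for a real genotype-to-phenotype map (it is not the phenotype used in the slide), and the function names are hypothetical.

```python
from collections import deque

ALPHABET = "ACGT"

def neighbors(genotype):
    """Yield every sequence that differs from `genotype` at exactly one position."""
    for i, current in enumerate(genotype):
        for letter in ALPHABET:
            if letter != current:
                yield genotype[:i] + letter + genotype[i + 1:]

def toy_phenotype(genotype):
    # Hypothetical phenotype, chosen only for illustration:
    # True when the sequence contains at least three 'A's.
    return genotype.count("A") >= 3

def genotype_network(start, phenotype=toy_phenotype):
    """Collect all genotypes reachable from `start` through single
    changes that never alter the phenotype (breadth-first search)."""
    target = phenotype(start)
    seen = {start}
    queue = deque([start])
    while queue:
        genotype = queue.popleft()
        for candidate in neighbors(genotype):
            if candidate not in seen and phenotype(candidate) == target:
                seen.add(candidate)
                queue.append(candidate)
    return seen

network = genotype_network("AAAAA")
print(len(network))  # 106 sequences of length 5 with at least three 'A's
```

With a real phenotype (a predicted fold, for example) the same search applies; only the `phenotype` argument changes.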

Genotype Networks: a better representation!

The Genotype Space can be represented as a Hamming Graph

https://bitbucket.org/dalloliogm/genotype_space
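As a rough sketch of what "Hamming graph" means here (this is not the code from the repository linked above, and the function names are assumptions): every possible sequence is a node, and two nodes are joined by an edge when their Hamming distance is 1.

```python
from itertools import product

def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def hamming_graph(alphabet, length):
    """Return (nodes, edges) of the genotype space for the given
    alphabet and sequence length. Brute force: only for tiny spaces."""
    nodes = ["".join(letters) for letters in product(alphabet, repeat=length)]
    edges = [
        (a, b)
        for i, a in enumerate(nodes)
        for b in nodes[i + 1:]
        if hamming_distance(a, b) == 1
    ]
    return nodes, edges

nodes, edges = hamming_graph("ACGT", 3)
print(len(nodes), len(edges))  # 64 nodes, 288 edges
```

For an alphabet of k letters and sequences of length L, the graph has k^L nodes and each node has L*(k-1) neighbors, which is why the space grows too fast to enumerate for realistic sequence lengths.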

Chapter 4: Novel Molecules

This chapter describes the relationship between protein/RNA sequence and tertiary structure

Many RNAs/proteins share the same fold even though their sequences are different

Novel Molecules: definitions (1)

Genotype:

def 1: the amino acid sequence of a protein (or, in simplified models, its sequence of hydrophobic/hydrophilic residues)

def 2: the nucleotide sequence of an RNA

A genotype space of sequences

A genotype space of sequences (simplified)

O = any hydrophobic amino acid

Y = any hydrophilic amino acid

Novel Molecules: definitions (2)

Phenotype:

The fold of a protein sequence

The secondary structure of an RNA molecule

Protein Structures

It is possible to predict the fold of a protein from its sequence

But this is difficult, so here we focus on “lattice models”

In a lattice model, each amino acid is classified only as hydrophobic or hydrophilic
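To make the lattice idea concrete, here is a minimal sketch of a common scoring scheme for such models, under several assumptions not stated in the slides: a 2D square lattice, 'H' for hydrophobic and 'P' for hydrophilic residues (the slides use O and Y), and an energy of -1 for every pair of hydrophobic residues that touch on the lattice without being adjacent in the chain. The function name is hypothetical.

```python
def hp_energy(sequence, path):
    """Energy of one conformation in a toy 2D hydrophobic/polar lattice model.

    sequence: string of 'H' (hydrophobic) and 'P' (hydrophilic)
    path:     list of (x, y) lattice points, one per residue, in chain order
    """
    if len(sequence) != len(path):
        raise ValueError("need one lattice point per residue")
    if len(set(path)) != len(path):
        raise ValueError("the chain must not visit a lattice point twice")
    for (x1, y1), (x2, y2) in zip(path, path[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbors")

    index_at = {point: i for i, point in enumerate(path)}
    contacts = 0
    for i, (x, y) in enumerate(path):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each contact is counted once.
        for point in ((x + 1, y), (x, y + 1)):
            j = index_at.get(point)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts

folded = [(0, 0), (1, 0), (1, 1), (0, 1)]    # chain bent into a square
extended = [(0, 0), (1, 0), (2, 0), (3, 0)]  # straight chain
print(hp_energy("HPPH", folded), hp_energy("HPPH", extended))  # -1 0
```

In this kind of model the "fold" of a sequence is usually taken to be its lowest-energy conformation, so the phenotype can be computed exhaustively for short chains.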

A Genotype network

In this example, all orange sequences have the same fold:

More sequences than folds

Li et al, 1996: study on lattice protein models:

There are many more protein sequences than folds

Some phenotypes are formed by more sequences than others

Sequences that produce the same fold can be very different

Rost, 1997: study on 272 proteins with similar folds. They shared only 8.5% of their amino acid sequence

There are many more protein sequences than protein folds

Globins are a very common protein domain

Most globins have different sequences, but the same fold

Among some hemoglobins, only 12.4% of amino acid residues are identical
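Percentages such as the 8.5% and 12.4% quoted above are sequence identities between aligned sequences. A minimal sketch of one way to compute such a number follows. It assumes the two sequences are already aligned, uses '-' for gaps, and ignores gapped columns; other conventions for the denominator exist. The example strings are invented, not real globin sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of ungapped alignment columns with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

# Invented toy sequences: 7 of 10 columns are identical.
print(percent_identity("ACDEFGHIKL", "ACDQFGHLKV"))  # 70.0
```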

Do globins have a common origin?

Bailly, X., Chabasse, C., Hourdez, S., Dewilde, S., Martial, S., Moens, L. and Zal, F. (2007), Globin gene family evolution and functional diversification in annelids. FEBS Journal, 274: 2641–2652. doi: 10.1111/j.1742-4658.2007.05799.x

Goodman M, Pedwaydon J, Czelusniak J, Suzuki T, Gotoh T, Moens L, Shishikura F, Walz D, Vinogradov S. An evolutionary tree for invertebrate globin sequences. J Mol Evol. 1988;27(3):236-49. PubMed PMID: 3138426.

Some folds are more common than others

Some folds can be formed by a higher number of sequences than others

Number of protein sequences per structure (Ferrada & Wagner, 2010):

Ferrada, E. & Wagner, A., 2010. Evolutionary innovations and the organization of protein functions in genotype space. PloS one, 5(11), p.e14172. Available at: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2994758&tool=pmcentrez&rendertype=abstract

The 10 most structurally promiscuous functions

Promiscuity of a function: when the function can be obtained by different structures/sequences

Ferrada, E. & Wagner, A., 2010. Evolutionary innovations and the organization of protein functions in genotype space. PloS one, 5(11), p.e14172. Available at: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2994758&tool=pmcentrez&rendertype=abstract

Genotype networks of protein sequences

Sequences that have the same fold tend to be connected in a genotype network (from Li et al, 1996)

The situation resembles figure 1 (above) more than figure 2 (below)

RNA structures

RNA secondary structures can be predicted in silico

http://rna.ucsc.edu/rnacenter/ribosome_images.html
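As an illustration of in silico prediction, here is a minimal sketch of the classic Nussinov base-pair maximization algorithm. It is a teaching example, not a method described in these slides: real predictors minimize a thermodynamic free energy. The minimum hairpin loop of 3 unpaired bases and the inclusion of G-U pairs are assumptions.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq, min_loop=3):
    """Return (dot-bracket structure, number of pairs) that maximizes the
    number of nested base pairs, with at least `min_loop` unpaired bases
    inside every hairpin."""
    n = len(seq)
    if n == 0:
        return "", 0
    best = [[0] * n for _ in range(n)]  # best[i][j]: max pairs in seq[i..j]

    def option(i, k, j):
        # Score when base i pairs with base k inside the interval [i, j].
        inside = best[i + 1][k - 1] if k - 1 >= i + 1 else 0
        outside = best[k + 1][j] if k + 1 <= j else 0
        return 1 + inside + outside

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case: base i stays unpaired
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in PAIRS:
                    score = max(score, option(i, k, j))
            best[i][j] = score

    # Trace back one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS and best[i][j] == option(i, k, j):
                structure[i], structure[k] = "(", ")"
                stack.append((i + 1, k - 1))
                stack.append((k + 1, j))
                break
    return "".join(structure), best[0][n - 1]

structure, n_pairs = nussinov("GGGAAAUCCC")
print(structure, n_pairs)
```

Several structures can reach the same maximum, so the traceback returns just one of them; this is one reason real tools rely on energy models rather than pair counts.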

RNA structure videogame

There is even a videogame on predicting RNA structure:

http://eterna.cmu.edu/

So, predicting RNA structures is (relatively) easy

Innovations in RNA folds

All the observations made for protein sequences are also valid for RNA, on a larger scale:

On average, 400 million RNA sequences per fold

Very long RNA sequences tend to have similar folds

There are many more RNA sequences than RNA folds

Size rank of genotype set by frequency

Wagner, A., 2008. Robustness and evolvability: a paradox resolved. Proceedings. Biological sciences / The Royal Society, 275(1630), pp.91-100. Available at: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2562401&tool=pmcentrez&rendertype=abstract

Frequent RNA structures

def. frequent RNA structure: an RNA structure that can be formed by more than 5000 sequences

Only 10% of RNA structures are frequent

93% of RNA sequences belong to frequent RNA structures
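The two percentages above answer different questions: what share of structures is frequent, and what share of sequences folds into a frequent structure. A small sketch with invented data shows how both are computed from a sequence-to-structure map. The dictionary, the labels, and the toy threshold are all hypothetical; the slide's threshold is 5000.

```python
from collections import Counter

def frequency_summary(structure_of, threshold):
    """Given a dict {sequence: structure}, return
    (fraction of structures that are frequent,
     fraction of sequences that fold into a frequent structure),
    where frequent means formed by more than `threshold` sequences."""
    sizes = Counter(structure_of.values())
    frequent = {s for s, count in sizes.items() if count > threshold}
    frac_structures = len(frequent) / len(sizes)
    frac_sequences = sum(sizes[s] for s in frequent) / len(structure_of)
    return frac_structures, frac_sequences

# Invented toy map: 10 sequences, 4 structures, one of them dominant.
toy_map = {f"seq{i}": "A" for i in range(7)}
toy_map.update({"seq7": "B", "seq8": "C", "seq9": "D"})

print(frequency_summary(toy_map, threshold=5))  # (0.25, 0.7)
```

Even in this tiny example a minority of structures (1 of 4) accounts for most sequences (7 of 10), which is the pattern the slide reports for RNA.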

RNA sequences can withstand a lot of changes, without modifying the fold

Maximal genotype distance within an RNA genotype network:

A. Wagner, The Origins of Evolutionary Innovations. Figure 4.6

RNA sequences can withstand a lot of changes, without modifying the fold

Different sequence, same fold:

http://eterna.cmu.edu/

Neighbors of points in the genotype network

Most neighbors of sequences in the space have the same fold

A. Wagner, The Origins of Evolutionary Innovations. Figure 4.7


This means that the genotype network of an RNA fold is usually dense

An RNA genotype network is more likely to look like Fig 1 than Fig 2:
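The statement that most neighbors have the same fold can be quantified as the fraction of neutral neighbors of a sequence. The sketch below assumes a `fold` function is supplied by the caller; `toy_fold` is an invented placeholder, not a real structure predictor, and it is deliberately not very robust.

```python
RNA_ALPHABET = "ACGU"

def one_mutant_neighbors(seq, alphabet=RNA_ALPHABET):
    """Yield all sequences that differ from `seq` at exactly one position."""
    for i, current in enumerate(seq):
        for letter in alphabet:
            if letter != current:
                yield seq[:i] + letter + seq[i + 1:]

def neutral_fraction(seq, fold):
    """Fraction of one-mutant neighbors whose fold equals that of `seq`."""
    reference = fold(seq)
    results = [fold(n) == reference for n in one_mutant_neighbors(seq)]
    return sum(results) / len(results)

def toy_fold(seq):
    # Hypothetical phenotype for illustration only:
    # "stem" if the sequence has at least three G's, otherwise "open".
    return "stem" if seq.count("G") >= 3 else "open"

print(neutral_fraction("GGGA", toy_fold))  # 0.25
```

Replacing `toy_fold` with a real secondary-structure predictor gives the quantity the slide refers to.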

Fig 1 Fig 2

Neighbors of genotypes in a genotype network

Two sequences on a genotype network have, by definition, the same fold.

But what about their neighbors?

A. Wagner, The Origins of Evolutionary Innovations. Figure 2.6

Phenotype of neighbors of genotype network

The neighbors of genotypes on a genotype network can have very different phenotypes

Novel RNA phenotypes

Schultes and Bartel: designed a new ribozyme from two existing ones

The existing ribozymes had <25% sequence similarity and no common structure

Few mutations were needed to obtain the hybrid

Schultes, E.A. & Bartel, D.P., 2000. One sequence, two ribozymes: implications for the emergence of new ribozyme folds. Science (New York, N.Y.), 289(5478), pp.448-52. Available at: http://www.ncbi.nlm.nih.gov/pubmed/10903205

Take Home messages

There are many more sequences than protein/RNA folds

Some folds correspond to more sequences than others

Sequences that produce the same fold can be very different

New folds can be reached by changing only a few bases

A Genotype network

All blue sequences have the same fold
