
Prediction of functional/structural sites in a protein using conservation and hyper-variation (ConSeq, ConSurf, Selecton)


Empirical findings: variation among genes

“Important” proteins evolve slower than “unimportant” ones.

Histone H4 protein

Empirical findings: variation among genes

Functional regions evolve slower than nonfunctional regions.

Conservation = functional/structural importance

Alignment: preproinsulin (Xenopus vs. Bos; the column spacing of the consensus symbols was lost in extraction)

Xenopus  MALWMQCLP-LVLVLLFSTPNTEALANQHL
Bos      MALWTRLRPLLALLALWPPPPARAFVNQHL
         **** : * *.*: *:..* :. *:****

Xenopus  CGSHLVEALYLVCGDRGFFYYPKIKRDIEQ
Bos      CGSHLVEALYLVCGERGFFYTPKARREVEG
         **************:******** :*::*

Xenopus  AQVNGPQDNELDG-MQFQPQEYQKMKRGIV
Bos      PQVG---ALELAGGPGAGGLEGPPQKRGIV
         .**. ** * * *****

Xenopus  EQCCHSTCSLFQLENYCN
Bos      EQCCASVCSLYQLENYCN
         **** *.***:*******


Conservation-based inference

Conserved sites:
• Important for the function or structure
• Not allowed to mutate
• “Slow evolving” sites
• Low rate of evolution

Variable sites:
• Less important (usually)
• Change more easily
• “Fast evolving” sites
• High rate of evolution

Detecting conservation: evolutionary rates

$$r = \frac{d}{2T}$$

• Rate (~speed) = distance / time
• Distance (d) = number of substitutions per site
• Time = 2 × T, where T is the number of years since the two sequences diverged (doubled because the two sequences evolved independently)
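The rate formula above can be sketched in a few lines of Python. The function name `substitution_rate` and the input numbers (0.16 substitutions per site, a split 80 million years ago) are illustrative assumptions, not values taken from the slides.

```python
def substitution_rate(distance, divergence_years):
    """Return r = d / (2T) in substitutions per site per year.

    distance: substitutions per site between the two sequences (d)
    divergence_years: years since the two lineages split (T)
    """
    if divergence_years <= 0:
        raise ValueError("divergence time must be positive")
    # Both lineages accumulate changes independently, hence the factor 2.
    return distance / (2.0 * divergence_years)


# Hypothetical input: 0.16 substitutions/site, lineages split 80 million years ago.
rate = substitution_rate(0.16, 80e6)
print(f"{rate:.1e} substitutions/site/year")
```

With these made-up inputs the rate comes out on the order of 10^-9 substitutions per site per year.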

Mean rate of nucleotide substitution in mammalian genomes

Evolution is a very slow process at the molecular level (“Nothing happens…”)

~10^-9 substitutions/site/year

Rate computation

Position        1  2  3  4  5  6  7
Human           D  M  A  A  H  A  M
Chimp           D  E  A  A  G  G  C
Cow             D  Q  A  A  W  A  P
Fish            D  L  A  A  C  A  L
S. cerevisiae   D  D  G  A  F  A  A
S. pombe        D  D  G  A  L  G  E
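A rough way to see which columns of this toy alignment vary is to count the distinct residues per position. This is only a crude proxy sketched for illustration; it is not the ConSeq rate computation, which is Bayesian or maximum-likelihood based. The names `ALIGNMENT` and `distinct_residues_per_column` are hypothetical.

```python
# Toy alignment copied from the slide (6 species, 7 columns).
ALIGNMENT = {
    "Human": "DMAAHAM",
    "Chimp": "DEAAGGC",
    "Cow": "DQAAWAP",
    "Fish": "DLAACAL",
    "S. cerevisiae": "DDGAFAA",
    "S. pombe": "DDGALGE",
}


def distinct_residues_per_column(sequences):
    """Count the distinct residues in each alignment column."""
    sequences = list(sequences)
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*sequences) walks the alignment column by column.
    return [len(set(column)) for column in zip(*sequences)]


counts = distinct_residues_per_column(ALIGNMENT.values())
for position, n in enumerate(counts, start=1):
    print(position, n)
```

Positions 1 and 4 show a single residue in every species (fully conserved), while positions 5 and 7 differ in all six.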

http://conseq.tau.ac.il

Site-specific rate computation method

Using the ConSeq server

ConSeq results:

Crash course in protein structure

Why protein structure?

• Each protein has a particular 3D structure that determines its function
• Protein structure is better conserved than protein sequence and more closely related to function
• Analyzing a protein structure is more informative than analyzing its sequence for function inference

PDB: Protein Data Bank
http://www.rcsb.org

• Holds 3D models of biological macromolecules (protein, RNA, DNA, small molecules)
• All data are available to the public
• X-ray crystal structures (84%), NMR models (16%)
• Submitted by biologists and biochemists from around the world

PDB model

• Defines the 3D coordinates (x, y, z) of each of the atoms in one or more molecules (i.e., a complex)
• There are models of proteins, protein complexes, proteins and DNA, protein segments, etc.
• The models also include the positions of ligand molecules, solvent molecules, metal ions, etc.
• PDB code: one digit followed by three digits/letters (e.g., 1a14)

The PDB file – text format

Record types:
• ATOM: usually protein or DNA
• HETATM: usually ligand, ion, water

Fields in each record: atom number, atom identity, residue identity, chain, residue number, the coordinates (X, Y, Z) of each atom in the structure, temperature factor
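The fields listed for ATOM/HETATM records sit in fixed character columns, so a record can be read by slicing. Below is a minimal sketch assuming the standard fixed-width PDB column layout; `parse_atom_record` and the sample record are hypothetical and not taken from any real entry.

```python
def parse_atom_record(line):
    """Parse one ATOM/HETATM line using the fixed PDB column layout."""
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError("not an ATOM/HETATM record")
    return {
        "record": record,
        "atom_number": int(line[6:11]),
        "atom_name": line[12:16].strip(),
        "residue_name": line[17:20].strip(),
        "chain": line[21],
        "residue_number": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "temperature_factor": float(line[60:66]),
    }


# Hypothetical record, built with explicit field widths so the columns line up.
example = "{:<6}{:>5} {:<4} {:>3} {}{:>4}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}".format(
    "ATOM", 1, " N", "MET", "A", 1, 27.340, 24.430, 2.614, 1.00, 9.67
)
atom = parse_atom_record(example)
print(atom["residue_name"], atom["chain"], atom["residue_number"], atom["x"], atom["y"], atom["z"])
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in real files can touch, for example a four-character atom name or a large negative coordinate.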

Viewing structures

Wireframe, Spacefill, Backbone

Conservation in the structure

• Protein core (hydrophobic core): structurally constrained - usually conserved
• Active site: functionally constrained - usually conserved
• Surface loops: tolerant to mutations - usually variable

http://consurf.tau.ac.il

Same algorithm as ConSeq, but here the results are projected onto the 3D structure of the protein

Using the ConSurf server

ConSurf example: potassium channel

• An integral membrane protein with sequence similarity to all known K+ channels, particularly in the pore region.
• PDB code: 1bl8, chain A

ConSurf results:

ConSurf example: potassium channel
Alignment of homologs found by PSI-BLAST:

ConSurf results:

ConSurf example: potassium channel
Neighbor-Joining reconstructed phylogenetic tree:

ConSurf results:

Conservation scores:

• The scores are standardized: the average score for all residues is zero, and the standard deviation is one
• The lowest score represents the most conserved site in the protein
    negative values: slowly evolving (= low evolutionary rate), conserved sites
• The highest score represents the most variable site in the protein
    positive values: rapidly evolving (= high evolutionary rate), variable sites
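The standardization described above (mean zero, standard deviation one) can be sketched as follows. The raw per-site rates are made up, and the use of the population standard deviation is an assumption; the slides do not say which estimator the server uses.

```python
import statistics


def standardize(rates):
    """Rescale per-site rates to mean 0 and standard deviation 1."""
    mean = statistics.mean(rates)
    sd = statistics.pstdev(rates)  # population SD: an assumption for this sketch
    if sd == 0:
        raise ValueError("all rates are identical; scores are undefined")
    return [(r - mean) / sd for r in rates]


# Hypothetical raw evolutionary rates for five sites.
raw_rates = [0.2, 0.5, 1.0, 1.3, 2.0]
scores = standardize(raw_rates)
for site, score in enumerate(scores, start=1):
    label = "conserved" if score < 0 else "variable"
    print(site, round(score, 2), label)
```

After rescaling, the slowest site gets the most negative score and the fastest site the most positive one, which is the reading the slide describes.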

ConSurf results: amino-acid conservation scores

ConSurf result with FirstGlance in Jmol:

ConSeq/ConSurf user intervention (advanced options)

1. Method of calculating the amino acid conservation scores: Bayesian/Max Likelihood
2. Enter your own MSA file
3. Multiply Align Sequences using: MUSCLE/CLUSTALW
4. Collect the Homologues from: SWISS-PROT/UniProt
5. Max. Number of Homologues (default = 50)
6. No. of PSI-BLAST Iterations (default = 1)
7. PSI-BLAST E-value Cutoff (default = 0.001)
8. Model of substitution for proteins: JTT/Dayhoff/mtREV/cpREV/WAG
9. Enter your own PDB file
10. Enter your own TREE file

Codon-level selection

• ConSeq/ConSurf: compute the evolutionary rate of amino-acid sites → the data are amino acids.
• But codons encode amino acids…
• 61 codons vs. 20 amino acids!
• Aren’t we losing information?
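The "61 codons vs. 20 amino acids" count can be checked directly from the standard genetic code. The sketch below assumes the standard nuclear code (written with T instead of U); the variable names are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

sense_codons = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]
amino_acids = set(CODON_TABLE.values()) - {"*"}

print(len(CODON_TABLE), "codons in total")
print(len(sense_codons), "sense codons")
print(len(amino_acids), "amino acids")
```

Because several codons map to the same amino acid, an amino-acid alignment cannot distinguish changes that alter the codon but not the residue.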

Darwin – the theory of natural selection

Adaptive evolution: favorable traits will become more frequent in the population

M. Kimura – the neutral theory of molecular evolution

• Most of the DNA variation between species is neutral with regard to the phenotype
• Selection operates to preserve a trait


Synonymous (silent) and non-synonymous (non-silent) substitutions

Silent Non-silent…


Synonymous vs. nonsynonymous substitutions

UUU → UUC (Phe → Phe): synonymous

UUU → CUU (Phe → Leu): non-synonymous

synonymous substitutions = silent substitutions

non-synonymous substitutions = non-silent or amino-acid altering substitutions
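The two examples above can be reproduced by translating both codons and comparing the amino acids. This is a minimal sketch assuming the standard genetic code; `classify_substitution` is a hypothetical helper name.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_before, codon_after):
    """Label a codon change as synonymous or non-synonymous."""
    # Accept RNA (U) or DNA (T) spelling.
    before = codon_before.upper().replace("U", "T")
    after = codon_after.upper().replace("U", "T")
    if before == after:
        raise ValueError("the two codons are identical")
    if CODON_TABLE[before] == CODON_TABLE[after]:
        return "synonymous"
    return "non-synonymous"


print(classify_substitution("UUU", "UUC"))  # Phe -> Phe
print(classify_substitution("UUU", "CUU"))  # Phe -> Leu
```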

Synonymous vs. non-synonymous substitutions

For most proteins, it is observed that the rate of synonymous substitutions is much higher than the non-synonymous rate.

This is called purifying selection (= conservation; this is what ConSeq/ConSurf are computing).


Synonymous vs. non-synonymous substitutions

Structural proteins


Saturation of synonymous substitutions

Histone H4 between human and wheat: saturation of synonymous substitutions

Synonymous vs. nonsynonymous substitutions

There are rare cases where the non-synonymous rate is much larger than the synonymous rate.

This is called positive selection.

Positive selection

The hypothesis: promotes the fitness of the organism

Examples:
• Proteins of the immune system
• Pathogen proteins evading the host immune system
• Pathogen proteins that are drug targets
• Proteins that are products of gene duplication
• Proteins involved in the reproductive system

Computing synonymous and non-synonymous rates

• Codon-based MSA: translate DNA to amino acids, align, backtrack to the DNA but keep the alignment
• Phylogenetic tree: 5 replacements in 10 positions between human and chimp is a lot, but between human and cucumber is nothing
• Different replacement probabilities between two amino acids: Lys→Arg ≠ Lys→Cys

Positive selection occurs at only a few sites!
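The "backtrack to the DNA but keep the alignment" step can be sketched as threading each sequence's codons onto its aligned protein sequence, writing three gap characters wherever the protein alignment has a gap. `thread_codons` is a hypothetical helper written for illustration, and the sequences are made up.

```python
def thread_codons(aligned_protein, dna, gap="-"):
    """Map an unaligned coding sequence onto its aligned protein sequence."""
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be divisible by 3")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    residues = sum(1 for aa in aligned_protein if aa != gap)
    if residues != len(codons):
        raise ValueError("number of codons does not match number of residues")
    codon_iter = iter(codons)
    # One protein gap becomes three nucleotide gaps; every residue takes its codon.
    return "".join(gap * 3 if aa == gap else next(codon_iter) for aa in aligned_protein)


# Hypothetical example: protein alignment row "MA-L" for the coding sequence ATG GCT CTG.
print(thread_codons("MA-L", "ATGGCTCTG"))
```

Gaps therefore never split a codon, which is what makes the result usable as a codon alignment.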

Inferring positive selection

Divide the rate of non-silent substitutions (Ka) by the rate of silent substitutions (Ks):

$$\frac{K_a}{K_s}$$

Inferring positive selection

Basic assumptions:

Selection score (Ka/Ks) > 1
↓
positive selection

Selection score (Ka/Ks) < 1
↓
purifying selection
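The decision rule above can be written as a small function. The input rates are made up, and the label returned for a ratio of exactly 1 ("neutral") is an assumption added here, since the slide only covers the two inequalities.

```python
def selection_score(ka, ks):
    """Return Ka/Ks: non-synonymous rate divided by synonymous rate."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    return ka / ks


def classify_selection(ka, ks):
    """Apply the slide's rule to a pair of rates."""
    score = selection_score(ka, ks)
    if score > 1:
        return "positive selection"
    if score < 1:
        return "purifying selection"
    return "neutral"  # assumption: the slide does not name this case


# Hypothetical per-site rates.
print(classify_selection(0.50, 0.10))
print(classify_selection(0.02, 0.40))
```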

Not so fast!!!

• Our computational model assumes there is positive selection in the data
• There is a good chance our model will find a few positively selected sites whatever the case
• Is this really indicative of positive selection or plain randomness?

So, maybe there’s no positive selection after all

Statistics helps us to compare between hypotheses

H0: There’s no positive selection
H1: There is positive selection

• H0: compute the probability (likelihood) of the data using a model that does not account for positive selection
• H1: compute the probability (likelihood) of the data using a model that does account for positive selection
• Perform a likelihood ratio test (LRT):

$$2\ln\left(\frac{L(\mathrm{Data}\mid M(H_1))}{L(\mathrm{Data}\mid M(H_0))}\right) \sim \chi^2$$
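A minimal sketch of the likelihood ratio test, working from log-likelihoods. It assumes the two models differ by a single free parameter (one degree of freedom), which the slides do not state; under that assumption the chi-square tail probability has a closed form via `math.erfc`. The log-likelihood values are made up.

```python
import math


def likelihood_ratio_test(log_l_h0, log_l_h1):
    """Return (statistic, p_value) for nested models, assuming 1 degree of freedom."""
    # 2 * ln(L1 / L0) equals 2 * (ln L1 - ln L0).
    statistic = max(2.0 * (log_l_h1 - log_l_h0), 0.0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods of the data under each model.
statistic, p_value = likelihood_ratio_test(log_l_h0=-1000.0, log_l_h1=-995.0)
print(f"statistic = {statistic:.2f}, p = {p_value:.4f}")
```

A small p-value means the model that allows positive selection fits significantly better than the one that does not, which addresses the "plain randomness" worry.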

http://selecton.tau.ac.il

Using the Selecton server

Input = a coding sequence at the codon level

• The user must provide the sequences – no PSI-BLAST option
• The sequences’ lengths must be divisible by 3 (ORF) and must not include any stop codons
• The alignment should be a codon alignment (RevTrans)
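The two input requirements (length divisible by 3, no stop codons) are easy to check before submitting. The sketch below assumes the standard nuclear stop codons TAA, TAG and TGA; other genetic codes differ. `check_coding_sequence` is a hypothetical helper, not part of Selecton.

```python
# Stop codons of the standard nuclear genetic code (an assumption for this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_coding_sequence(sequence):
    """Return a list of problems found in an unaligned coding sequence."""
    seq = sequence.upper().replace("U", "T")
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length is not divisible by 3")
    # Scan complete in-frame codons only.
    for start in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[start:start + 3]
        if codon in STOP_CODONS:
            problems.append(f"stop codon {codon} at codon {start // 3 + 1}")
    return problems


print(check_coding_sequence("ATGGCTCTG"))
print(check_coding_sequence("ATGGCTTAA"))
print(check_coding_sequence("ATGGC"))
```

An empty list means the sequence passes both checks; a terminal stop codon is reported like any other and would need to be trimmed.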


Similar to ConSurf

optional

Nuclear/mitochondria different species

Default run: M8 (H1) and M8a (H0)

Selecton example: HIV protease

• The protease is an essential enzyme for viral infectivity
• PDB ID: 1hxw

Selecton results:

Selection scores (Ka/Ks):

• The scores are normalized
• Ka/Ks > 1: positively selected site
• Ka/Ks < 1: site under purifying selection

Color coding scheme of Selecton

• The coloring scheme used for visualization is based on the continuous Ka/Ks scores.
• The color grades (1-7): 1 for positively selected sites (blue), 7 for sites under purifying selection (bordeaux)
