Functional prediction in proteins (purifying and positive selection)


1. Introduction: evolution & sequence analysis

Darwin – the theory of natural selection

Adaptive evolution:

Favorable traits will become more frequent in the population.

Adaptive evolution

Adaptive evolution occurs when natural selection favors a single allele, so that the allele's frequency shifts continuously in one direction.
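The continuous one-directional shift can be illustrated with the standard one-locus haploid selection recurrence. This is a minimal sketch, assuming a haploid population with no drift or mutation and a hypothetical constant selection coefficient s = 0.05 favoring one allele; the numbers are illustrative only.

```python
def next_frequency(p: float, s: float) -> float:
    """One generation of selection on a haploid locus.

    The favored allele (frequency p) has relative fitness 1 + s; the other has fitness 1.
    """
    mean_fitness = p * (1 + s) + (1 - p)
    return p * (1 + s) / mean_fitness


def trajectory(p0: float, s: float, generations: int) -> list[float]:
    """Frequencies of the favored allele over successive generations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


# Hypothetical values: the favored allele starts at 1% with a 5% fitness advantage.
traj = trajectory(p0=0.01, s=0.05, generations=200)
for gen in (0, 50, 100, 150, 200):
    print(f"generation {gen:3d}: p = {traj[gen]:.3f}")
```

Because s > 0, p rises in every generation and never reverses, which is the directional shift described above.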

Kimura – the theory of neutral evolution

Neutral evolution:

Most molecular changes have no effect on the phenotype (neutral).

Selection operates to preserve a trait (no change).

Purifying selection

Stabilizes a trait in a population:

Small babies → more illness

Large babies → more difficult birth

Baby weight is stabilized at around 3–4 kg.
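Stabilizing selection on baby weight can be sketched by weighting each newborn with a fitness curve that peaks inside the 3–4 kg range. This is an illustrative model only: the Gaussian fitness function, the 3.5 kg optimum, its width and the simulated birth weights are all hypothetical assumptions, not data from the slide.

```python
import math
import random

OPTIMUM_KG = 3.5  # hypothetical optimum inside the 3-4 kg range
WIDTH_KG = 0.5    # hypothetical width of the fitness curve


def fitness(weight_kg: float) -> float:
    """Gaussian fitness: highest at the optimum, lower for both small and large babies."""
    return math.exp(-((weight_kg - OPTIMUM_KG) ** 2) / (2 * WIDTH_KG ** 2))


def weighted_mean_var(values, weights):
    """Mean and variance of values, each weighted by its fitness."""
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, var


random.seed(1)
births = [random.gauss(3.5, 0.7) for _ in range(10_000)]  # simulated, not real data
before = weighted_mean_var(births, [1.0] * len(births))
after = weighted_mean_var(births, [fitness(w) for w in births])
print("before selection: mean %.2f kg, variance %.3f" % before)
print("after selection:  mean %.2f kg, variance %.3f" % after)
```

The mean stays near the optimum while the variance shrinks: selection removes extremes on both sides instead of pushing the trait in one direction.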

Purifying selection (conservation) – the molecular level

Histone 3

Synonymous vs. non-synonymous substitutions

Purifying selection: excess of synonymous substitutions relative to non-synonymous substitutions

Synonymous substitution: GUU → GUC (Val → Val)

Non-synonymous substitution: GUU → GCU (Val → Ala)
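Whether a single-nucleotide change is synonymous can be checked by translating both codons with the genetic code. A minimal sketch, assuming the standard nuclear genetic code and accepting RNA or DNA codons; `translate` and `classify` are hypothetical helper names.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(codon: str) -> str:
    """Translate one codon; RNA input (U) is converted to DNA (T)."""
    return CODON_TABLE[codon.upper().replace("U", "T")]


def classify(codon_a: str, codon_b: str) -> str:
    """Label a codon substitution as synonymous or non-synonymous."""
    return "synonymous" if translate(codon_a) == translate(codon_b) else "non-synonymous"


print(classify("GUU", "GUC"))  # synonymous (Val -> Val)
print(classify("GUU", "GCU"))  # non-synonymous (Val -> Ala)
```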

Synonymous vs. non-synonymous substitutions

Histone 3

[Figure: non-synonymous (Non-syn.) vs. synonymous (Syn.) substitutions in histone 3]

Conservation as a means of predicting function

Infer the rate of evolution at each site.

Conservation as a means of predicting function

A low rate of evolution reflects constraints on the site that prevent disruption of function or structure: active sites, protein-protein interactions, the protein core, etc.

Site            1 2 3 4 5 6 7
Human           D M A A H A M
Chimp           D E A A G G C
Cow             D Q A A W A P
Fish            D L A A C A L
S. cerevisiae   D D G A F A A
S. pombe        D D G A L G E

Which site is more conserved?

(Same alignment as above.)
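A frequency-only score cannot separate every pair of sites in this toy alignment: sites 3 and 6 both contain four A and two G. The sketch below scores each column by Shannon entropy, a simple stand-in chosen here for illustration and not the rate-based scoring ConSurf uses; the alignment is transcribed from the slide as 7 sites across 6 species.

```python
import math
from collections import Counter

# Toy alignment from the slide (sites 1-7).
ALIGNMENT = {
    "Human": "DMAAHAM",
    "Chimp": "DEAAGGC",
    "Cow": "DQAAWAP",
    "Fish": "DLAACAL",
    "S. cerevisiae": "DDGAFAA",
    "S. pombe": "DDGALGE",
}


def column_entropy(column: str) -> float:
    """Shannon entropy in bits; 0 means the column is fully conserved."""
    n = len(column)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(column).values())


columns = ["".join(seq[i] for seq in ALIGNMENT.values()) for i in range(7)]
scores = [column_entropy(col) for col in columns]
for site, (col, h) in enumerate(zip(columns, scores), start=1):
    print(f"site {site}: {col}  entropy = {h:.2f} bits")
```

Sites 1 and 4 score 0 (fully conserved), while sites 3 and 6 receive the same score (about 0.92 bits) even though their patterns of change on the species tree differ.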

Use phylogenetic information

(Same alignment as above.)

[Figure: species tree with the residues of two sites mapped onto the leaves: site 6 (Human A, Chimp G, Cow A, Fish A, S. cerevisiae A, S. pombe G) and site 3 (Human A, Chimp A, Cow A, Fish A, S. cerevisiae G, S. pombe G)]
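Mapping the columns onto the tree separates them. The sketch below counts the minimum number of changes per site with Fitch parsimony. This is a simpler technique swapped in for illustration (ConSurf itself infers site rates with likelihood/Bayesian methods), and the tree topology is an assumption: human and chimp as sister species, then cow, then fish, with the two yeasts as a separate clade.

```python
# Toy alignment from the slide (sites 1-7).
ALIGNMENT = {
    "Human": "DMAAHAM",
    "Chimp": "DEAAGGC",
    "Cow": "DQAAWAP",
    "Fish": "DLAACAL",
    "S. cerevisiae": "DDGAFAA",
    "S. pombe": "DDGALGE",
}

# Assumed species tree as nested pairs (leaves are species names).
TREE = (((("Human", "Chimp"), "Cow"), "Fish"), ("S. cerevisiae", "S. pombe"))


def fitch(node, states):
    """Fitch parsimony: return (candidate state set, minimum number of changes)."""
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def changes(site: int) -> int:
    """Minimum number of substitutions at a 1-based alignment site."""
    states = {species: seq[site - 1] for species, seq in ALIGNMENT.items()}
    return fitch(TREE, states)[1]


for site in (3, 6):
    print(f"site {site}: at least {changes(site)} change(s) on the tree")
```

Under this assumed tree, site 3 needs a single change (A to G on the branch leading to the yeasts), while site 6 needs two independent changes (in chimp and in S. pombe), so site 3 is the more conserved site once phylogeny is taken into account.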

ConSurf/ConSeq web servers: prediction of conserved residues by estimating evolutionary rates at each site

Working process:

Input a protein with a known 3D structure (PDB ID or a file provided by the user)

Find homologous protein sequences (PSI-BLAST)

Perform a multiple sequence alignment (removing duplicate sequences)

Construct an evolutionary tree

Calculate the conservation score for each site

Project the results onto the 3D structure
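The "removing duplicate sequences" step can be sketched as a greedy filter over aligned homologs. This is an illustration only: the identity measure, the 95% cut-off and the function names are hypothetical choices, not ConSurf's actual filtering criteria.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def remove_redundant(seqs: dict[str, str], max_identity: float = 95.0) -> dict[str, str]:
    """Keep a sequence only if it is below max_identity to every sequence kept so far.

    Input order matters: the first sequence (e.g. the query) is always kept.
    """
    kept: dict[str, str] = {}
    for name, seq in seqs.items():
        if all(percent_identity(seq, other) < max_identity for other in kept.values()):
            kept[name] = seq
    return kept


homologs = {"query": "DMAAHAM", "hit_1": "DMAAHAM", "hit_2": "DEAAGGC"}
print(list(remove_redundant(homologs)))  # ['query', 'hit_2']
```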

ConSurf example: potassium channel

An integral membrane protein with sequence similarity to all known K+ channels, particularly in the pore region.

PDB ID: 1bl8, chain A

ConSurf results

http://conseq.bioinfo.tau.ac.il/

ConSeq performs the same analysis as ConSurf but presents the results on the sequence.

It also predicts whether each residue is buried or exposed:

exposed & conserved → functionally important sites

buried & conserved → structurally important sites
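The buried/exposed rule can be written as a small decision function. A minimal sketch, assuming ConSurf-style conservation grades from 1 (most variable) to 9 (most conserved) and an already-computed buried/exposed call for each residue; the cut-off of 8, the function name and the example residues are hypothetical.

```python
CONSERVED_MIN_GRADE = 8  # hypothetical cut-off on a 1 (variable) .. 9 (conserved) scale


def predict_role(grade: int, exposed: bool) -> str:
    """Combine conservation with solvent exposure, following the rule on the slide."""
    if not 1 <= grade <= 9:
        raise ValueError("grade must be between 1 and 9")
    if grade < CONSERVED_MIN_GRADE:
        return "no prediction"
    return "functionally important" if exposed else "structurally important"


# Hypothetical residues: (label, conservation grade, exposed?)
residues = [("D1", 9, True), ("A4", 9, False), ("H5", 2, True)]
for label, grade, exposed in residues:
    print(label, predict_role(grade, exposed))
```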

2. Positive selection & drug resistance

Darwin – the theory of natural selection

Adaptive evolution:

Favorable traits will become more frequent in the population.

Adaptive evolution at the molecular level

Look for changes that confer an advantage.

Naïve detection

Observe a multiple sequence alignment: do variable regions indicate adaptive evolution?

Naïve detection

The problem: how do we know which sites are not under any selection pressure ("non-important" sites) and which are under adaptive evolution?

Solution – we look at the DNA

[Figure: synonymous vs. non-synonymous substitutions]

Solution – we look at the DNA

Purifying selection: Syn > Non-syn

Adaptive evolution (positive selection): Non-syn > Syn

Neutral evolution: Syn = Non-syn

Also known as the Ka/Ks (or dN/dS, or ω) ratio, where Ka is the non-synonymous substitution rate and Ks is the synonymous substitution rate:

Purifying selection: Ka < Ks (Ka/Ks < 1)

Neutral evolution: Ka = Ks (Ka/Ks = 1)

Positive selection: Ka > Ks (Ka/Ks > 1)
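The counting behind Ka/Ks can be sketched with a simplified Nei-Gojobori-style procedure: count synonymous and non-synonymous sites, count the observed differences of each kind, and take the ratio. This is a teaching sketch and not how Selecton works (Selecton fits codon models by likelihood on a tree). It assumes the standard genetic code and sequences without stop codons, handles only codon pairs that differ at a single position, skips mutations to stop codons (one common convention), and returns raw proportions pN and pS without a multiple-hit correction. All function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon: str) -> float:
    """Synonymous sites in one codon (0 to 3); changes to stop codons are skipped."""
    aa = CODON_TABLE[codon]
    sites = 0.0
    for pos in range(3):
        syn = allowed = 0
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant_aa = CODON_TABLE[codon[:pos] + base + codon[pos + 1:]]
            if mutant_aa == "*":
                continue
            allowed += 1
            syn += mutant_aa == aa
        if allowed:
            sites += syn / allowed
    return sites


def pn_ps(seq_a: str, seq_b: str) -> tuple[float, float]:
    """Raw proportions of non-synonymous (pN) and synonymous (pS) differences."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned, in frame and of equal length")
    s_sites = n_sites = s_diffs = n_diffs = 0.0
    for i in range(0, len(seq_a), 3):
        ca, cb = seq_a[i:i + 3], seq_b[i:i + 3]
        s = (synonymous_sites(ca) + synonymous_sites(cb)) / 2
        s_sites += s
        n_sites += 3 - s
        n_changed = sum(x != y for x, y in zip(ca, cb))
        if n_changed > 1:
            raise ValueError(f"codon {i // 3 + 1} differs at more than one position")
        if n_changed == 1:
            if CODON_TABLE[ca] == CODON_TABLE[cb]:
                s_diffs += 1
            else:
                n_diffs += 1
    if s_sites == 0 or n_sites == 0:
        raise ValueError("no synonymous or no non-synonymous sites to compare")
    return n_diffs / n_sites, s_diffs / s_sites


def classify_omega(omega: float) -> str:
    """Map a Ka/Ks-style ratio onto the three regimes listed above."""
    if omega < 1:
        return "purifying selection (Ka/Ks < 1)"
    if omega > 1:
        return "positive selection (Ka/Ks > 1)"
    return "neutral (Ka/Ks = 1)"


pn, ps = pn_ps("GTTGCTAAAGAT", "GTCGCTAAGGAT")  # two synonymous differences
print(classify_omega(pn / ps))  # purifying selection (Ka/Ks < 1)
```

In practice an observed ratio is never exactly 1, so tools test whether ω differs significantly from 1 rather than comparing it directly.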

Examples of positive selection

Proteins involved in the immune system

Proteins involved in host-pathogen interaction ("arms race")

Proteins following gene duplication

Proteins involved in reproductive systems

Synonymous vs. non-synonymous substitutions

The accumulation of substitutions (synonymous or non-synonymous) depends on the evolutionary time that has elapsed since the analyzed species diverged.

When distant species are analyzed, saturation of synonymous substitutions is often encountered: sites are hit repeatedly, so the observed number of differences stops growing with divergence time.
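Saturation shows up clearly in the Jukes-Cantor correction, one standard way to turn an observed proportion of differing sites p into an estimated number of substitutions per site d (for example when converting pS into Ks): d = -(3/4) ln(1 - 4p/3). A minimal sketch; the p values are arbitrary examples.

```python
import math


def jukes_cantor(p: float) -> float:
    """Estimated substitutions per site from an observed proportion p of differing sites."""
    if p < 0:
        raise ValueError("p must be non-negative")
    if p >= 0.75:
        return math.inf  # at or beyond saturation the formula has no finite value
    return -0.75 * math.log(1 - 4 * p / 3)


for p in (0.05, 0.30, 0.60, 0.70, 0.74):
    print(f"observed p = {p:.2f} -> corrected d = {jukes_cantor(p):.2f}")
```

Near p = 0.75 a small change in the observed proportion maps to a large change in d, so synonymous distances between distant species become unreliable.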

Selecton – a server for the detection of purifying and positive selection

http://selecton.bioinfo.tau.ac.il

Stern et al., Nucleic Acids Res 35, W506 (2007).

Detecting drug resistance using Selecton

HIV: molecular evolution paradigm

Rapidly evolving virus:

1. High mutation rate (low fidelity of reverse transcriptase)

2. High replication rate

Drug resistance

[Figure: no drug vs. drug]

Adaptive evolution (positive selection)

HIV protease

Protease is an essential enzyme for viral replication.

Drugs against protease are always part of the "cocktail".

Ritonavir inhibitor

Ritonavir (RTV) is a specific protease inhibitor (drug).

C₃₇H₄₈N₆O₅S₂

Selecton was used to analyze HIV-1 protease gene sequences from patients who were treated with RTV only.

Example: HIV protease

[Figure: predicted sites labeled as primary mutations, secondary mutations, and novel predictions (experimental validation)]

Rate shifts and HIV subtypes

Rate shifts

V Chimp

V Rhesus

A Squirrel

K Rat

M Mouse

V Human

Rate shifts

[Figure: the same site mapped onto the tree. Chimp, Rhesus and Human (V, V, V): low evolutionary rate. Squirrel, Rat and Mouse (A, K, M): high evolutionary rate.]

Rate shifts

Specificity determinants: different phylogenetic groups

V Chimp

V Rhesus

A Squirrel

A Rat

A Mouse

V Human

Gain of function?
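The two patterns on these slides can be told apart by comparing one site within and between two groups. A minimal sketch, assuming that "conserved" means an identical residue in every member of a group (real methods model rates along the tree) and that Chimp, Rhesus and Human form one group and Squirrel, Rat and Mouse the other; the function names are hypothetical.

```python
def is_conserved(residues: list[str]) -> bool:
    """Strict criterion: every sequence in the group has the same residue."""
    return len(set(residues)) == 1


def rate_shift(group_a: list[str], group_b: list[str]) -> str:
    """Classify one alignment site given its residues in two groups."""
    cons_a, cons_b = is_conserved(group_a), is_conserved(group_b)
    if cons_a and cons_b:
        if group_a[0] == group_b[0]:
            return "no shift"
        return "specificity determinant (conserved within each group, different between groups)"
    if cons_a or cons_b:
        return "rate shift (conserved in one group, variable in the other)"
    return "variable in both groups"


primates = ["V", "V", "V"]  # Chimp, Rhesus, Human
print(rate_shift(primates, ["A", "K", "M"]))  # Squirrel, Rat, Mouse on the first slide
print(rate_shift(primates, ["A", "A", "A"]))  # Squirrel, Rat, Mouse on the specificity slide
```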

Rate shifts

Specificity determinants: following gene duplication

V S. paradoxus

V S. mikatae

A S. cerevisiae

A S. paradoxus

A S. mikatae

V S. cerevisiae

Tropomyosin 1

Tropomyosin 2

Rate shifts in HIV subtypes

HIV subtypes

Which sites are responsible for the differences between the subtypes?

Detection of rate shifts in all 9 subtypes

Significant rate shifts in all HIV genes

Gene   # rate-shift sites   Proportion
Env    84                   0.1
Gag    20                   0.04
Nef    21                   0.17
Pol    33                   0.03
Rev    29                   0.25
Tat    13                   0.15
Vif    13                   0.07
Vpr    4                    0.05
Vpu    29                   0.35

Gag position 12

Wild type: E. The K variant at this site contributes to drug resistance against the protease inhibitor amprenavir.

[Figure: HIV subtype tree (labels include A, C, D, F, G and J) showing the residue at Gag position 12 (E, K, Q, N or R) in each subtype]

Summary

Sequence analysis can provide valuable information about protein function.

The basic signal, conservation: http://consurf.tau.ac.il

Positive “Darwinian” selection: http://selecton.bioinfo.tau.ac.il

Rate shifts (specificity determinants)
