18 Apr 2006 1
Introduction to Bioinformatics
Lecture 14: Protein Folding
Centre for Integrative Bioinformatics VU (IBIVU)
Introduction to Protein Structure
• Great book covering basics of Protein Structure:
– Short Introduction to Molecular Structures
– “Introduction to Protein Structure”
• Chapters 1 to 5
• Carl Branden & John Tooze
• ISBN: 0-8153-2305-0
Prelude: molecular structures
• John Dalton (1810): A new system of chemistry
• Elements, but no structures yet
• Mendeleev (1869)
Johannes van ’t Hoff
• Chimie dans l’Espace: “Proposal for the development of three-dimensional chemical structural formulae” (1875)
• Tetrahedral carbon atom
Linus Pauling (1951)
• Atomic Coordinates and Structure Factors for Two Helical Configurations of Polypeptide Chains
• Alpha-helix
James Watson & Francis Crick (1953)
• Molecular structure of nucleic acids
DNA/Protein structure-function analysis and prediction
The building blocks:
•Chains of amino acids
•Three-dimensional Structures
•Four levels of protein architecture
•Amino acids: classes
•Disulphide bridges
•Histidine
•Proline
•Ramachandran plot: mainchain dihedral angles
•Rotamers: sidechain dihedral angles
The Building Blocks (proteins)
• Proteins consist of chains of amino acids
• Bound together through the peptide bond
• Special folding of the chain yields structure
• Structure determines the function
Chains of amino acids
Three-dimensional Structures
• Four hierarchical levels of protein architecture
Amino acids: physicochemical classes
• Hydrophobic amino acids
Alanine Ala A
Valine Val V
Phenylalanine Phe F
Isoleucine Ile I
Leucine Leu L
Proline Pro P
Methionine Met M
• Charged amino acids
Aspartate (-) Asp D
Glutamate (-) Glu E
Lysine (+) Lys K
Arginine (+) Arg R
• Polar amino acids
Serine Ser S
Threonine Thr T
Tyrosine Tyr Y
Cysteine Cys C
Asparagine Asn N
Glutamine Gln Q
Histidine His H
Tryptophan Trp W
• Glycine (sidechain is only a hydrogen)
Glycine Gly G
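The classification on this slide can be written as a small lookup table. The following is a minimal Python sketch; the names `CLASSES`, `CLASS_OF` and `classify` are illustrative and do not come from the lecture.

```python
# One-letter codes grouped by the physicochemical classes listed on the slide.
CLASSES = {
    "hydrophobic": "AVFILPM",
    "charged": "DEKR",
    "polar": "STYCNQHW",
    "glycine": "G",
}

# Invert the table: one-letter code -> class name.
CLASS_OF = {aa: name for name, letters in CLASSES.items() for aa in letters}


def classify(sequence):
    """Return the class of every residue in a one-letter-code sequence."""
    return [CLASS_OF[aa] for aa in sequence.upper()]
```

For example, `classify("GAK")` labels glycine, alanine and lysine in turn. Other textbooks draw the class boundaries differently, so the table reflects this slide only.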
Disulphide bridges
• Two cysteines can form disulphide bridges
• Anchoring of secondary structure elements
Ramachandran plot
• Only certain combinations of values of the phi (φ) and psi (ψ) angles are observed
[Figure: backbone dihedral angles phi, psi and omega]
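A backbone dihedral angle is defined by four consecutive atoms: φ by C(i-1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). The sketch below shows one common way to compute such a signed angle from Cartesian coordinates. The function name `dihedral` and the helper names are illustrative, and the coordinates in any call are assumed to be plain (x, y, z) tuples.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees around the p1-p2 bond."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    b1_len = math.sqrt(_dot(b1, b1))
    y = _dot(_cross(n1, n2), b1) / b1_len
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

With this convention a cis arrangement gives 0 degrees and a trans arrangement gives 180 degrees. Plotting the (φ, ψ) pair of every residue gives the Ramachandran plot.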
Rotamers: highly populated combinations of side-chain dihedral angles
• Side-chain dihedral angles (χ) are numbered χ1, χ2, χ3, ... going outward from the Cα atom
• The number of χ angles depends on the amino acid type
• Rotamers are usually defined as low-energy side-chain conformations
• Using a library of rotamers allows a structure to be modelled by trying only the most likely side-chain conformations, which saves time and produces a structure that is more likely to be correct
DNA/Protein structure-function analysis and prediction
Motifs of protein structure
• Secondary structure elements
• Renderings of proteins
• Alpha helix
• Beta-strands & sheets
• Turns and motifs
• Domains formed by motifs
Motifs of protein structure
• Global structural characteristics:
– Outside hydrophilic, inside hydrophobic (unless…)
– Often globular form (unless…)
Artymiuk et al, Structure of Hen Egg White Lysozyme (1981)
Secondary structure elements
Alpha-helix Beta-strand
Renderings of proteins
• Irving Geis:
Renderings of proteins
• Jane Richardson:
Alpha helix
• Hydrogen bond: from N-H at position n, to C=O at position n-4 (‘n-n+4’)
Other helices
• Alternative helices are also possible
– 3₁₀-helix: hydrogen bond from N-H at position n to C=O at position n-3
• Bigger chance of bad contacts
– α-helix: hydrogen bond from N-H at position n to C=O at position n-4
– π-helix: hydrogen bond from N-H at position n to C=O at position n-5
• Structure more open: no contacts
• Hollow in the middle too small for e.g. water
• At the edge of the Ramachandran plot
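The three helix types differ only in the offset between hydrogen-bond donor and acceptor. The following minimal sketch enumerates the ideal backbone hydrogen bonds for a given offset. The names `HELIX_OFFSETS` and `backbone_hbonds` are illustrative, and the residues are assumed to be numbered from 1.

```python
# Offset k means: the N-H of residue n donates to the C=O of residue n - k.
HELIX_OFFSETS = {"3-10 helix": 3, "alpha helix": 4, "pi helix": 5}


def backbone_hbonds(n_residues, offset):
    """List (donor, acceptor) residue pairs for an ideal helix, 1-based."""
    return [(n, n - offset) for n in range(offset + 1, n_residues + 1)]
```

For a helix of L residues this gives L minus the offset hydrogen bonds, so the first few N-H groups and the last few C=O groups of any helix remain unpaired.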
Helices
• Backbone hydrogen bonds form the structure
– Often covers the hydrophobic centre of the protein
• Sidechains point outwards (‘Xmas tree’)
– Possibly one side hydrophobic, one side hydrophilic (amphipathic helices)
Beta-strands: beta-sheets
• Beta-strands next to each other form hydrogen bonds
Parallel or Antiparallel sheets
Anti-parallel
Parallel
• Usually only parallel or anti-parallel
• Occasionally mixed
• Sidechains alternating (up-down)
Turns and motifs
• Between the secondary structure elements are loops
• Very short loops between two β-strands: turn
• Different secondary structure elements often appear together: motifs
– Helix-turn-helix
– Calcium binding motif
– Hairpin
– Greek key motif
– β-α-β motif
Helix-turn-helix motif
• Helix-turn-helix important for DNA recognition by proteins
• EF-hand: calcium binding motif
Hairpin / Greek key motif
• Different possible hairpins: type I/II
• Greek key: anti-parallel beta-sheets
β-α-β motif
• Most common way to obtain parallel β-sheets
• Usually the motif is ‘right-handed’
Domains formed by motifs
• Within a protein different domains can be identified
– For example:
• Ligand binding domain
• DNA binding domain
• Catalytic domain
• Domains are built from motifs of secondary structure elements
Alpha/beta barrels
• TIM barrel, named after triosephosphate isomerase
• Usually 8 β-strands, at least 200 amino acids
• Often hydrophobic interior
– alternating amino acids in the strands
Alpha/beta barrels
• Active site formed by (variable) loop regions at the top of the barrel
• Exception: active site in the core of methylmalonyl-coenzyme A mutase
Summary
• Amino acids form polypeptide chains
• Chains fold into a three-dimensional structure
• Specific backbone angles are permitted or not: Ramachandran plot
• Secondary structure elements: α-helix, β-sheet
• Common structural motifs: Helix-turn-helix, Calcium binding motif, Hairpin, Greek key motif, β-α-β motif
• Combination of elements and motifs: tertiary structure
• Many protein structures available: PDB
Sequence-Structure-Function: what can we do with bioinformatics?
[Figure: diagram linking Sequence, Structure and Function, labelled with inverse folding, threading, ab initio, BLAST and knowledge-based methods. Folding: impossible but for the smallest structures. Function prediction from structure: very difficult.]
• Ab initio prediction (based on first principles) is still not generally successful (red)
• Many bioinformatics methods are therefore knowledge-based (green)
Active protein conformation
• The active conformation of a protein is the native state
• Unfolded, denatured state
– high temperature
– high pressure
– high concentrations of urea (8 M)
• Equilibrium between the two forms
Denatured state ⇌ Native state
Anfinsen’s Theorem (1950’s)
• Primary structure determines tertiary structure.
“In the mid 1950’s Anfinsen began to concentrate on the problem of the relationship between structure and function in enzymes. […] He proposed that the information determining the tertiary structure of a protein resides in the chemistry of its amino acid sequence. […] It was demonstrated that, after cleavage of disulfide bonds and disruption of tertiary structure, many proteins could spontaneously refold to their native forms. This work resulted in general acceptance of the ‘thermodynamic hypothesis’ (Nobel Prize Chemistry 1972).”
www.nobel.se/chemistry/laureates/1972/anfinsen-bio.html
• Anfinsen performed un-folding/re-folding experiments
Dimensions: Sequence Space
• How many sequences of length n are possible?
$N_{seq} = 20 \cdot 20 \cdot 20 \cdots = 20^{n}$
e.g. for n = 100, $N = 20^{100} \approx 10^{130}$, which is nearly infinite
– Only a subset of these will fold into a stable conformation
• The probability p of finding the same sequence twice is $p = 1/N$, e.g. $1/10^{130}$, which is nearly zero.
• Evolution: divergent or convergent
– sequences are dissimilar, in divergent and particularly in convergent evolution
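The size of sequence space is easy to check numerically. The sketch below is a minimal illustration. The function names are hypothetical, and the alphabet size of 20 is the number of standard amino acids used on the slide.

```python
import math


def sequence_space(n, alphabet=20):
    """Exact number of distinct sequences of length n."""
    return alphabet ** n


def log10_sequence_space(n, alphabet=20):
    """Order of magnitude (base-10 logarithm) of the number of sequences."""
    return n * math.log10(alphabet)
```

Python integers have arbitrary precision, so `sequence_space(100)` is computed exactly. The logarithmic form is more convenient for comparing orders of magnitude.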
Dimensions: Fold Space
• How many folds exist?
– Sequences cluster into sequence families and fold families
– some have many members, some few or only one
• Using Zipf’s law:
$n(r) = a / r^{b}$
• For sequence families:
$b \approx 0.64$, $n_{total} \approx 60000$
• For fold families:
$b \approx 0.8$, $n_{total} \approx 14000$
Here r is the rank of a family, n(r) is the number of proteins in the r-th family, and a is a scaling constant that depends on the number of proteins in the dataset. The constant b does not depend on the size of the dataset.
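Zipf's law as stated on this slide can be evaluated directly. In the minimal sketch below the function name is hypothetical. Any value passed for the scaling constant `a` is an assumption, because the slide gives only the exponents.

```python
def zipf_family_size(rank, a, b):
    """Expected number of proteins in the family of a given rank: a / rank**b."""
    return a / rank ** b
```

With b = 0.64 the second-largest family is predicted to be about 64% of the size of the largest, whatever the value of a. With b = 0.8 the ratio is about 57%.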
Levinthal’s paradox (1969)
• A denatured protein re-folds in ~0.1 – 1000 seconds
• A protein with e.g. 100 amino acids, each with 2 torsions (φ and ψ)
Each torsion can assume 3 conformations (1 trans, 2 gauche)
$3^{100 \times 2} \approx 10^{95}$ possible conformations!
• Or: 100 amino acids with 3 possibilities in the Ramachandran plot (α, β, L): $3^{100} \approx 10^{47}$ conformations
• If the protein can visit one conformation in one ps ($10^{-12}$ s), an exhaustive search costs $10^{47} \times 10^{-12}$ s $= 10^{35}$ s $\approx 10^{27}$ years! (The lifetime of the universe is about $10^{10}$ years…)
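The Levinthal estimate can be reproduced in a few lines. This is a minimal sketch: the function name is hypothetical and the value of about 3.15 × 10^7 seconds per year is an approximation. Working in base-10 logarithms avoids very large floating-point numbers.

```python
import math

SECONDS_PER_YEAR = 3.15e7  # approximate length of a year in seconds


def log10_search_time_years(n_residues, states_per_residue=3,
                            seconds_per_state=1e-12):
    """Base-10 logarithm of the exhaustive search time in years."""
    log10_conformations = n_residues * math.log10(states_per_residue)
    log10_seconds = log10_conformations + math.log10(seconds_per_state)
    return log10_seconds - math.log10(SECONDS_PER_YEAR)
```

For 100 residues with 3 states each this gives an exponent of roughly 28. The slide's $10^{27}$ comes from rounding $3^{100}$ down to $10^{47}$ first. Either way the search time is far beyond the age of the universe.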
Levinthal’s paradox
Protein folding problem:
– Predict the 3D structure from sequence
– Understand the folding process
From 1D to 3D…
What to fold? … fastest folders
[Figure: the fastest-folding systems (PPA, alpha helix, beta hairpin, BBA5, villin) on a logarithmic time axis from 1 to 100000 nanoseconds, with the corresponding simulation cost in CPU-days to CPU-years]
Pande et al. “Atomistic Protein Folding Simulations on the Submillisecond Time Scale Using Worldwide Distributed Computing” Biopolymers (2003) 68, 91–109
Rates: predicted vs experiment
[Figure: predicted folding time (nanoseconds) plotted against experimental measurement (nanoseconds), both on logarithmic axes from 1 to 100000, for PPA, alpha helix, beta hairpin, villin and BBAW]
Experiments:
villin: Raleigh, et al, SUNY, Stony Brook
BBAW: Gruebele, et al, UIUC
beta hairpin: Eaton, et al, NIH
alpha helix: Eaton, et al, NIH
PPA: Gruebele, et al, UIUC
Predictions: Pande, et al, Stanford
Molten globule
• First step: hydrophobic collapse
• Molten globule: globular structure, not yet correctly folded
• Local minimum on the free energy surface
Folded state
• Native state = lowest point on the free energy landscape
• Many possible routes
• Many possible local minima (misfolded structures)
Folding energy
• Each protein conformation has a certain energy and a certain flexibility (entropy)
• Corresponds to a point on a multidimensional free energy surface
Three coordinates per atom: 3N-6 dimensions possible
$\Delta G = \Delta H - T \Delta S$
In very rough generalities:
ΔH relates to bond formation/breaking
ΔS relates to configurational freedom and water ordering
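The free energy relation connects directly to the equilibrium between the denatured and native states. The sketch below assumes a simple two-state model, which is an assumption made here and not stated on the slide. ΔG is taken as G(native) minus G(denatured), and the function names and any numerical inputs are illustrative.

```python
import math

R = 8.314  # gas constant in J / (mol K)


def gibbs(delta_h, delta_s, temperature):
    """Delta G = Delta H - T * Delta S (J/mol, J/(mol K), K)."""
    return delta_h - temperature * delta_s


def fraction_native(delta_g, temperature):
    """Two-state model: K = [native]/[denatured] = exp(-Delta G / RT)."""
    k = math.exp(-delta_g / (R * temperature))
    return k / (1.0 + k)
```

At ΔG = 0 the two states are equally populated. A folding free energy of a few tens of kJ/mol already puts almost all molecules in the native state at room temperature.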
Hydrophobic Effect
Fundamental: the hydrophobic effect is a solvent effect
[Figure: oil + water]
How is the interfacial water layer ordered?
Hydrophobic Effect in Protein Folding
Unfolded: more hydrocarbon-water interfacial area, more water ordered
Folded: less hydrocarbon-water interfacial area, less water ordered
[Figure: unfolded chain going to folded chain plus released water molecules (HOH); ΔS = +]
Helper proteins
• Forming and breaking disulfide bridges
– Disulfide bridge forming enzymes: Dsb
– Protein disulfide isomerase: PDI
• “Isomerization” of proline residues
– Peptidyl prolyl isomerases
• Chaperones
– Heat shock proteins
– GroEL/GroES complex
– Preventing or breaking ‘undesirable interactions’…
Disulfide bridges
• Equilibria during the folding process
Proline: two conformations
• Peptide bond nearly always trans (1000:1)
• For proline the cis conformation is also possible (trans:cis equilibrium = 4:1)
• For folding, all prolines need to be in trans conformation
– Isomerization is the bottleneck; cyclophilin catalyses it
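The 4:1 equilibrium shows why proline isomerization becomes a bottleneck as the number of prolines grows. The sketch below assumes that each proline equilibrates independently in the unfolded chain. That is a simplifying assumption made here, and the function name is hypothetical.

```python
def fraction_all_trans(n_prolines, trans_to_cis=4.0):
    """Fraction of unfolded chains in which every proline is trans."""
    p_trans = trans_to_cis / (trans_to_cis + 1.0)
    return p_trans ** n_prolines
```

With a single proline, 80% of the chains are ready to fold. With three prolines only about half are, and the remainder must wait for the slow cis to trans isomerization.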
Chaperones
• During the folding process hydrophobic parts outside?
– Risk of aggregation of proteins
• Chaperones offer protection
– Are mainly formed at high temperatures (when needed)
– Heat-shock proteins: Hsp70, Hsp60 (GroEL), Hsp10 (GroES)
GroEL/GroES complex
• GroEL:
– 2 x seven subunits in a ring
– Each subunit has an equatorial, intermediate and apical domain
– ATP hydrolysis; ATP/ADP diffuse through the intermediate domain
• GroES:
– Also seven subunits
– Closes the cavity of GroEL
GroEL/GroES mechanism
• GroES binding changes both sides of GroEL
– closed cavity
– open cavity
• Cycle
– protein binds side 1
– GroES covers, ATP binds
– ATP → ADP + Pi
– ATP binds side 2
– ATP → ADP + Pi
• GroES opens
• folded protein exits
• ADP exits
– New protein binds
Alternative folding: prions
• Prion proteins are found in the brain
• Function unknown
• Two forms
– normal alpha-structure
– harmful beta-structure
• The beta-structure can aggregate and form ‘plaques’
– Blocks certain tissues and functions in the brain
Protein flexibility
• A correctly folded protein is also dynamic
– Crystal structure yields the average position of the atoms
– ‘Breathing’ overall motion possible
B-factors
• The average motion of an atom around the average position
[Figure: B-factors shown for alpha helices and beta-sheet]
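For an isotropic B-factor, the standard crystallographic relation is B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean square displacement of the atom. This relation is not given on the slide and is added here for illustration. The sketch converts a B-factor in Å² into a root-mean-square displacement in Å, using a hypothetical function name.

```python
import math


def rms_displacement(b_factor):
    """RMS displacement (angstrom) from an isotropic B-factor (angstrom^2)."""
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))
```

A B-factor of 20 Å² corresponds to an RMS displacement of about 0.5 Å.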
Protein Tertiary Structure Tied to Function
Conformational changes
• Often conformational changes play an important role for the function of the protein
• Estrogen receptor
– With activator (agonist) bound: active
– With inactivator (antagonist) bound: not active
[Figure: active and inactive conformations]
Main points
• Anfinsen: proteins fold reversibly!
• Levinthal: too many conformations for fast folding?
– First hydrophobic collapse, then local rearrangement
• Protein folding funnel
– Assistance with protein folding
• Sulphur bridge formation
• Proline isomerization
• Chaperonins
• Intrinsic flexibility: breathing / conformational change
– Conformational changes for
• Activation / deactivation