CS612 - Algorithms in Bioinformatics
Protein Structure
February 5, 2019
Introduction to Protein Structure
A protein is a linear chain of organic molecular building blocks called amino acids.
Nurit Haspel CS612 - Algorithms in Bioinformatics
Introduction to Protein Structure
The amine (NH3+), the carboxyl (COO−), the C-α, and the hydrogen attached to it are called the backbone.
All amino acids have the same backbone.
R (residue) can be anything...
It is called the side chain, and is different for different amino acids.
In nature, proteins are built from 20 standard amino acids.
[Figure: an amino acid, with the amine (NH3+), R, H, C-α, and carboxyl (COO−) labeled]
D vs. L Amino Acids
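[Figure: an L amino acid (left) and its mirror image, a D amino acid (right); each shows a central carbon bonded to COO−, NH3+, R, and H]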
The dashed lines are pointing away from you, and the bold lines are pointing towards you.
Amino acids in nature are L (the left side of the image).
D amino acids (right side) can be synthesized, but they almost never occur in nature.
Amino Acid Names
Name           3-letter  1-letter    Name           3-letter  1-letter
Glycine        GLY       G           Glutamine      GLN       Q
Alanine        ALA       A           Asparagine     ASN       N
Phenylalanine  PHE       F           Lysine         LYS       K
Leucine        LEU       L           Arginine       ARG       R
Isoleucine     ILE       I           Serine         SER       S
Valine         VAL       V           Threonine      THR       T
Proline        PRO       P           Tyrosine       TYR       Y
Methionine     MET       M           Glutamic acid  GLU       E
Tryptophan     TRP       W           Aspartic acid  ASP       D
Histidine      HIS       H           Cysteine       CYS       C
[Side-chain column: structure drawing of each amino acid's side chain]
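The naming table maps naturally onto a lookup structure. As a minimal sketch (the names `THREE_TO_ONE` and `to_one_letter` are illustrative and not part of the lecture), the following Python converts a sequence written in 3-letter codes into the 1-letter form:

```python
# 3-letter to 1-letter codes, copied from the naming table.
THREE_TO_ONE = {
    "GLY": "G", "ALA": "A", "PHE": "F", "LEU": "L", "ILE": "I",
    "VAL": "V", "PRO": "P", "MET": "M", "TRP": "W", "HIS": "H",
    "GLN": "Q", "ASN": "N", "LYS": "K", "ARG": "R", "SER": "S",
    "THR": "T", "TYR": "Y", "GLU": "E", "ASP": "D", "CYS": "C",
}


def to_one_letter(three_letter_codes):
    """Convert a list of 3-letter codes into a 1-letter string.

    Hypothetical helper: raises KeyError for codes outside the table.
    """
    return "".join(THREE_TO_ONE[code.upper()] for code in three_letter_codes)


print(to_one_letter(["MET", "ALA", "GLY", "TRP"]))  # MAGW
```

A dictionary is used here because the mapping is small, fixed, and looked up by key; unknown codes fail loudly rather than being silently skipped.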
Hydrophobic Amino Acids
[Table: the amino acids listed under "Amino Acid Names", with the hydrophobic ones highlighted]
Polar Amino Acids
[Table: the amino acids listed under "Amino Acid Names", with the polar ones highlighted]
Charged Amino Acids
[Table: the amino acids listed under "Amino Acid Names", with the charged ones highlighted. Legend: positive charge, negative charge]
Aromatic Amino Acids
[Table: the amino acids listed under "Amino Acid Names", with the aromatic ones highlighted]
Formation of Proteins
During the translation of a gene into a protein, the protein is formed by the sequential joining of amino acids end-to-end to form a long chain-like molecule (polymer).
A polymer of amino acids is often referred to as a polypeptide.
The genome is capable of coding for 20 different amino acids whose chemical properties depend on the composition of their side chains ("R").
Thus, to a first approximation, a protein is a sequence of these amino acids.
This sequence is called the primary structure of the protein.
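Treating the primary structure as a string over a 20-letter alphabet can be sketched as follows. The function names are hypothetical, and the 1-letter codes are the ones from the naming table; the count of possible sequences gives a feel for how large the sequence space is.

```python
# The 20 one-letter amino acid codes from the naming table.
AMINO_ACIDS = set("GAFLIVPMWHQNKRSTYEDC")


def is_valid_primary_structure(sequence):
    """Return True if the string is non-empty and uses only the 20 codes."""
    return len(sequence) > 0 and all(aa in AMINO_ACIDS for aa in sequence.upper())


def count_possible_sequences(length):
    """Number of distinct primary structures of the given length (20**length)."""
    return len(AMINO_ACIDS) ** length


print(is_valid_primary_structure("MAGW"))  # True
print(is_valid_primary_structure("MAGX"))  # False ("X" is not one of the 20 codes)
print(count_possible_sequences(3))         # 8000
```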
Formation of Proteins
[Figure: amino acid 1 and amino acid 2 joined by a peptide bond]
Polymerization of Amino Acids
[Figure: a peptide chain running from the N-terminus to the C-terminus, with a peptide bond marked]
Peptide: amino acids polymerized in a chain.
Interactions Between Atoms
[Figure: a covalent bond, a planar angle, and a dihedral angle]
Dihedral Angles
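[Figure: four atoms A, B, C, D. Plane Π1 passes through A, B, C and plane Π2 passes through B, C, D; the dihedral angle θ is the angle between the two planes.]
Viewed along the B-C axis, θ is positive when measured clockwise and negative when measured counterclockwise.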
Dihedral Angles
[Figure: example dihedral angles: (a) 0 degrees, (b) 180 degrees, (c) 90 degrees, (d) -90 degrees]
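The four example configurations (0, 180, 90, and -90 degrees) can be reproduced numerically. The sketch below is one common way to compute a signed dihedral angle from the coordinates of four atoms A, B, C, D, using the normals of the two planes and `atan2`; it is not necessarily the formulation used in the course. The function name `dihedral_degrees` and the test coordinates are illustrative assumptions.

```python
import math


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral_degrees(a, b, c, d):
    """Signed dihedral angle A-B-C-D in degrees, in the range [-180, 180].

    Positive values correspond to a clockwise rotation when viewed
    along the B-C axis.
    """
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1 = _cross(b1, b2)  # normal to the plane through A, B, C
    n2 = _cross(b2, b3)  # normal to the plane through B, C, D
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)  # proportional to sin(theta)
    x = _dot(n1, n2)           # proportional to cos(theta)
    return math.degrees(math.atan2(y, x))


# Illustrative coordinates: the B-C axis lies along z, A sits on the x axis.
A, B, C = (1, 0, 0), (0, 0, 0), (0, 0, 1)
for D in [(1, 0, 1), (-1, 0, 1), (0, 1, 1), (0, -1, 1)]:
    print(D, round(dihedral_degrees(A, B, C, D), 1))
```

Using `atan2` rather than `acos` keeps the sign of the angle and avoids numerical trouble near 0 and 180 degrees.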
Backbone Dihedral Angles
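[Figure: the backbone atoms N, C-α (with side chain R), C, and the next residue's N′, with the dihedral angles phi (φ), psi (ψ), and omega (ω) marked]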
φ is defined by C(i−1), N(i), C-α(i), C(i): rotation about the N - C-α axis.
It is undefined for the first amino acid.
ψ is defined by N(i), C-α(i), C(i), N(i+1): rotation about the C-α - C axis.
It is undefined for the last amino acid.
ω is almost always close to 180 degrees.
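The atom quadruples behind each backbone angle can be written down explicitly. In this sketch, residues are numbered from 0 and atoms are identified by hypothetical (name, residue index) pairs, with "CA" standing for C-α. The quadruple for ω is not given above; the code assumes the usual convention C-α(i), C(i), N(i+1), C-α(i+1), which is a rotation about the peptide bond.

```python
def backbone_dihedral_atoms(i, n_residues):
    """Atoms defining phi, psi, and omega for residue i (0-based).

    Each angle is a list of four (atom name, residue index) pairs,
    or None where the angle is undefined.
    """
    if not 0 <= i < n_residues:
        raise IndexError("residue index out of range")
    first = (i == 0)
    last = (i == n_residues - 1)
    # phi needs the C of the previous residue, so it is undefined for the first.
    phi = None if first else [("C", i - 1), ("N", i), ("CA", i), ("C", i)]
    # psi needs the N of the next residue, so it is undefined for the last.
    psi = None if last else [("N", i), ("CA", i), ("C", i), ("N", i + 1)]
    # Assumption: omega is defined by CA(i), C(i), N(i+1), CA(i+1).
    omega = None if last else [("CA", i), ("C", i), ("N", i + 1), ("CA", i + 1)]
    return {"phi": phi, "psi": psi, "omega": omega}


for residue in range(3):
    print(residue, backbone_dihedral_atoms(residue, 3))
```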
Protein Secondary Structures
In order to work properly, a protein must fold to form a specific three-dimensional shape called the native conformation/structure.
Secondary structure refers to folding in a small part of the protein that forms a characteristic shape.
The most common secondary structure elements are α-helices and β-sheets.
[Figure: left: α helix; right: β sheet]
Secondary Structure Elements
Repeating values of φ and ψ along the peptide chain result in regular structures.
These structures are stabilized by interactions between atoms along the chain.
Left: α helix. Right: β sheet
α Helices
α helices are characterized by repeating values of φ around -57 degrees and ψ around -47 degrees.
Ramachandran plot: a scatterplot showing the φ and ψ values of amino acids. Areas in green are "allowed" (energetically favorable).
β Strands
β strands are characterized by repeating values of φ around -110 to -140 degrees and ψ around 110 to 135 degrees.
These relatively extended strands interact to form β sheets.
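The typical φ and ψ values quoted for α helices and β strands suggest a crude per-residue classifier. The following is only a sketch: the function name and the 30-degree tolerance are assumptions chosen for illustration, not values from the lecture, and real secondary-structure assignment also relies on hydrogen-bonding patterns along the chain.

```python
def classify_phi_psi(phi, psi, tolerance=30.0):
    """Roughly label a (phi, psi) pair, both given in degrees.

    The tolerance is a hypothetical window around the typical values;
    angle wrap-around at +/-180 degrees is not handled.
    """
    # Alpha helix: phi near -57 and psi near -47.
    if abs(phi + 57.0) <= tolerance and abs(psi + 47.0) <= tolerance:
        return "alpha-helix"
    # Beta strand: phi in [-140, -110] and psi in [110, 135], widened by tolerance.
    if (-140.0 - tolerance <= phi <= -110.0 + tolerance
            and 110.0 - tolerance <= psi <= 135.0 + tolerance):
        return "beta-strand"
    return "other"


print(classify_phi_psi(-57, -47))    # alpha-helix
print(classify_phi_psi(-120, 120))   # beta-strand
print(classify_phi_psi(60, 45))      # other
```

A run of consecutive residues with the same label would then indicate a candidate helix or strand; a single residue on its own says little.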
Parallel and Anti-Parallel β Sheets
Coils and Loops
The sections of the peptide chain that link the α-helices and β-sheets are referred to as turns and loops.
Other secondary substructure classifications exist, but are rarely seen in practice.
Sub-units that do not fit into any other classification are known as random coils.
Protein 3-D Structure
The 3-dimensional fold of a protein is called its tertiary structure.
Many proteins consist of more than one polypeptide chain folded together.
The spatial relationship between these separate polypeptide chains is called the quaternary structure.
Protein Folds – Mainly α
Protein Folds – Mainly β
Protein Folds – α/β
Protein Structure Determination
How does a protein know its fold?
It is believed that all the information is encoded in its primary structure (amino acid sequence).
Yet, no algorithm exists as of today to successfully predict this structure: the protein folding problem.
There are experimental methods to determine protein structure.
X-ray Crystallography
Crystallize the protein.
Pass an X-ray beam through the crystal to create a diffraction pattern.
Reconstruct an atomic model from the electron density map.
NMR (Nuclear Magnetic Resonance) Spectroscopy
A magnetic field is applied to a solution containing the protein.
Atomic nuclei are aligned by the field.
When unaligned, the nuclei give off a characteristic signal.
Inter-atomic distances can be inferred.
The 3-D structure can be modeled.
Done in solution, so atoms are free to move.
Usually produces an ensemble of structures.
Cryo-Electron Microscopy (Cryo-EM)
Samples are frozen to cryogenic temperatures (those of liquid nitrogen).
EM is used to obtain an image.
Multiple 2D images are used to construct a 3D image.
Especially useful for very large macromolecules (entire viruses, ribosomes...).
Started at about 1 nm resolution; nowadays approaching 2-3 Å.
http://blogs.sciencemag.org/