Protein Structure Databases


Databases of three-dimensional structures of proteins, where the structure has been solved using X-ray crystallography or nuclear magnetic resonance (NMR) techniques.

Protein databases: PDB (Protein Data Bank), Swiss-Prot, PIR (Protein Information Resource), SCOP (Structural Classification of Proteins)


Fibrous proteins have a structural role

Source: http://www.prideofindia.net/images/nails.jpg http://opbs.okstate.edu/~petracek/2002%20protein%20structure%20function/CH06/Fig%2006-12.GIF http://my.webmd.com/hw/health_guide_atoz/zm2662.asp?printing=true

• Collagen is the most abundant protein in vertebrates. Collagen fibers are a major portion of tendons, bone and skin. Three helical polypeptide chains wind around each other to form a triple helix, giving collagen its tough and flexible properties.

• Fibroin fibers make the silk spun by spiders and silkworms stronger, weight for weight, than steel! The soft and flexible properties come from the beta structure.

• Keratin is a tough, insoluble protein that makes up the quills of the echidna, your hair and nails, and the rattle of a rattlesnake. The structure comes from alpha helices that are cross-linked by disulfide bonds.


The globular proteins

The globular proteins have a number of biologically important roles. They include:

Cell motility – proteins link together to form filaments which make movement possible.

Organic catalysts in biochemical reactions – enzymes

Regulatory proteins – hormones, transcription factors

Membrane proteins – MHC markers, protein channels, gap junctions

Defense against pathogens – poisons/toxins, antibodies, complement

Transport and storage – hemoglobin and myoglobin


Proteins for cell motility

Source: http://www.ebsa.org/npbsn41/maf_home.html http://sun0.mpimf-

Above: Myosin (red) and actin filaments (green) in coordinated muscle contraction.

Right: Actin bound to the myosin binding site (groove in red part of myosin protein).

Add energy (ATP) and myosin moves, moving actin with it.


Eukaryote cells have a cytoskeleton made up of straight hollow cylinders called microtubules (bottom left). They help cells maintain their shape, they act like conveyor belts moving organelles around in the cytoplasm, and they participate in forming spindle fibres in cell division. Microtubules are composed of filaments of the protein tubulin (top left). These filaments are compressed like springs, allowing microtubules to ‘stretch and contract’. Thirteen of these filaments attach side to side, a little like the slats in a barrel, to form a microtubule. This barrel-shaped structure gives strength to the microtubule.

Tubulin forms helical filaments

Source: heidelberg.mpg.de/shared/docs/staff/user/0001/24.php3?department=01&LANG=en http://www.fz-juelich.de/ibi/ibi-1/Cellular_signaling/ http://cpmcnet.columbia.edu/dept/gsas/anatomy/Faculty/Gundersen/main.html

Proteins in the Cell Cytoskeleton


Catalase speeds up the breakdown of hydrogen peroxide (H2O2), a toxic by-product of metabolic reactions, to the harmless substances water and oxygen.

The reaction is extremely rapid because the enzyme lowers the energy needed to kick-start the reaction (the activation energy).

[Figure: energy plotted against progress of reaction, from substrate to product. Activation energy with no catalyst = input of 71 kJ required; with catalase = input of 8 kJ required.]
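To get a feel for what lowering the activation energy from 71 kJ to 8 kJ means, the Arrhenius relation can be used to estimate the speed-up. The sketch below assumes that the two figures are per mole, that the temperature is about 310 K, and that the catalyst changes nothing except the activation energy. All three are simplifying assumptions of this example, not statements from the slide, and the function name is hypothetical.

```python
import math

GAS_CONSTANT = 8.314  # J / (mol K)


def rate_enhancement(ea_uncatalysed_kj, ea_catalysed_kj, temperature_k=310.0):
    """Ratio of catalysed to uncatalysed rate from k = A * exp(-Ea / (R * T)).

    Assumes both reactions share the same pre-exponential factor A,
    so only the difference in activation energy matters.
    """
    delta_ea_j = (ea_uncatalysed_kj - ea_catalysed_kj) * 1000.0
    return math.exp(delta_ea_j / (GAS_CONSTANT * temperature_k))


if __name__ == "__main__":
    # Values from the energy diagram: 71 kJ without catalyst, 8 kJ with catalase.
    print(f"{rate_enhancement(71.0, 8.0):.2e}")
```

Under these assumptions the estimated speed-up is of the order of 10^10, which is why a modest-looking change in the energy barrier makes the reaction "extremely rapid".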

Proteins speed up reactions - Enzymes

2 H2O2 → 2 H2O + O2


Proteins can regulate metabolism – hormones

When your body detects an increase in the sugar content of blood after a meal, the hormone insulin is released from cells in the pancreas.

Insulin binds to receptors on cell membranes, and this triggers the cells to absorb glucose for use or for storage as glycogen in the liver.

Proteins span membranes – protein channels

Source: http://www.biology.arizona.edu/biochemistry/tutorials/chemistry/page2.html http://www.cbp.pitt.edu/bradbury/projects.htm

The CFTR membrane protein is an ion channel that regulates the flow of chloride ions.

Not enough of this protein gets inserted into the membranes of people with cystic fibrosis. As a result, secretions are not hydrated and become thick, and the lungs and secretory ducts become blocked.


Proteins defend us against pathogens – antibodies

Source: http://www.biology.arizona.edu/immunology/tutorials/antibody/FR.html http://tutor.lscf.ucsb.edu/instdev/sears/immunology/info/sears-ab.htm http://www.spilya.com/research/ http://www.umass.edu/microbio/chime/

Left: Antibodies like IgG, found in humans, recognise and bind to groups of molecules, or epitopes, found on foreign invaders.

Right: The binding site of an antibody protein (left) interacting with the epitope of a foreign antigen (green).


Making Proteins

How is such a diverse range of proteins possible? The code for making a protein is found in your genes (on your DNA). This genetic code is copied onto a messenger RNA (mRNA) molecule. The mRNA code is read three bases at a time (each group of three is a codon) by ribosomes, which join amino acids together to form a polypeptide. This is known as gene expression.

Source: http://genetics.nbii.gov/Basic1.html


Gene Expression

[Figure: a chromosome in the cell nucleus, with a gene on the DNA (bases G T A C T A shown).]

The order of bases in DNA is a code for making proteins. The code is read in groups of three.

Cell machinery copies the code, making an mRNA molecule. This moves into the cytoplasm. Ribosomes read the code and accurately join amino acids together to make a protein.

mRNA: AUGAGUAAAGGAGAAGAACUUUUCACUGGAUA
Protein: M S K G E E L F T G

The protein folds to form its working shape.
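The codon-by-codon reading described on this slide can be sketched in a few lines. This is a minimal illustration, assuming the standard genetic code and a reading frame that starts at the first base. The function name `translate` is hypothetical, and the input is the mRNA fragment shown on the slide.

```python
BASES = "UCAG"
# Standard genetic code, listed in U, C, A, G order for each codon position.
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna):
    """Read the mRNA three bases at a time and return one-letter amino acid codes.

    Translation stops at the first stop codon; an incomplete codon at the
    end of the sequence is ignored.
    """
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGAGUAAAGGAGAAGAACUUUUCACUGGAUA"))  # MSKGEELFTG
```

The last two bases of the fragment (UA) do not form a complete codon, so they contribute nothing to the protein.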


The building blocks

The amino acids for making new proteins come from the proteins that you eat and digest. Every time you eat a burger (vege or beef), you break the proteins down into single amino acids ready for use in building new proteins. And yes, proteins have the job of digesting proteins; they are known as proteases.

There are only 20 different amino acids, but they can be joined together in many different combinations to form the diverse range of proteins that exist on this planet.
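How many combinations is "many"? If each position in a chain can hold any of the 20 amino acids, a chain of n residues has 20^n possible sequences. The short sketch below just evaluates that power for a few lengths; the function name and the chosen lengths are illustrative only.

```python
def count_sequences(length, alphabet_size=20):
    """Number of distinct chains of the given length built from 20 amino acids."""
    return alphabet_size ** length


for length in (2, 10, 100):
    print(length, count_sequences(length))
```

Even a short chain of 100 residues has more possible sequences than could ever be made, so real proteins are a tiny sample of what is possible.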


Amino Acids

An amino acid is a relatively small molecule with characteristic groups of atoms that determine its chemical behaviour.

The structural formula of an amino acid is shown at the end of the animation below. The R group is the only part that differs between the 20 amino acids.

[Figure: structural formula of an amino acid, with the R groups of glycine, alanine, valine, cysteine and phenylalanine shown as examples.]


The 20 Amino Acids

The amino acids each have their own shape and charge due to their specific R group.

View the molecular shape of amino acids by clicking on the URL link below:

http://sosnick.uchicago.edu/amino_acids.html

Would the shape of a protein be affected if the wrong amino acid were added to a growing protein chain?


Making a Polypeptide

[Figure: polypeptide growth. Amino acids are joined one at a time by peptide bonds, and a water molecule is released as each bond forms.]

Polypeptide production = Condensation Reaction
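Because each peptide bond forms by condensation, a chain of n amino acids weighs less than its n free amino acids by n - 1 water molecules. The sketch below works this out for the five amino acids named on the earlier slide. The molar masses are approximate average values in g/mol supplied for this example, and the function name is hypothetical.

```python
WATER_MASS = 18.02  # g/mol, approximate

# Approximate molar masses of the free amino acids, in g/mol.
FREE_AMINO_ACID_MASS = {
    "G": 75.07,   # glycine
    "A": 89.09,   # alanine
    "V": 117.15,  # valine
    "C": 121.16,  # cysteine
    "F": 165.19,  # phenylalanine
}


def polypeptide_mass(sequence):
    """Mass of a linear chain: sum of free amino acids minus one water per peptide bond."""
    if not sequence:
        return 0.0
    total = sum(FREE_AMINO_ACID_MASS[residue] for residue in sequence)
    peptide_bonds = len(sequence) - 1
    return total - peptide_bonds * WATER_MASS


print(round(polypeptide_mass("GAVCF"), 2))
```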


Protein structure


Why Investigate Protein Structure?

Proteins are complex molecules whose structure can be discussed in terms of:

• primary structure
• secondary structure
• tertiary structure
• quaternary structure

The structure of proteins is important because the shape of a protein allows it to perform its particular role or function.

Four levels of protein structure


Protein Primary Structure

The primary structure is the sequence of amino acids that are linked together. The linear structure is called a polypeptide.

http://www.mywiseowl.com/articles/Image:Protein-primary-structure.png


Protein Secondary Structure

The secondary structure of proteins consists of:

• alpha helices
• beta sheets
• random coils – usually form the binding and active sites of proteins

Source: http://www.rothamsted.bbsrc.ac.uk/notebook/courses/guide/prot.htm#I


Protein Tertiary Structure

Involves the way the random coils, alpha helices and beta sheets fold with respect to each other.

This shape is held in place by bonds such as:

• weak hydrogen bonds between amino acids that lie close to each other,
• strong ionic bonds between R groups with positive and negative charges, and
• disulfide bridges (strong covalent S-S bonds).

Amino acids that were distant in the primary structure may now become very close to each other after the folding has taken place.

The subunit of a more complex protein has now been formed. It may be globular or fibrous. It now has its functional shape or conformation.

Source: io.uwinnipeg.ca/~simmons/cm1503/proteins.htm


Protein Quaternary Structure

This is the packing of the protein subunits to form the final protein complex. For example, the human hemoglobin molecule is a tetramer made up of two alpha and two beta polypeptide chains (right).

Source: www.cem.msu.edu/~parrill/movies/neuram.GIF

This is also when the protein associates with non-protein groups. For example, carbohydrates can be added to form a glycoprotein.

Source: www.ibri.org/Books/Pun_Evolution/Chapter2/2.6.htm

Protein Structure Prediction

• Why?
• Types of protein structure prediction
  - Secondary structure prediction
  - Homology modelling
  - Fold recognition
  - Ab initio
• Secondary structure prediction
  - Why
  - History
  - Performance
  - Usefulness

Why do we need structure prediction?

• 3D structures give clues to function: active sites, binding sites, conformational changes...
• Structure and function are conserved more than sequence
• 3D structure determination is difficult, slow and expensive
• Intellectual challenge, Nobel prizes etc...
• Engineering new proteins

The Use of Structure


It's not that simple...

• The amino acid sequence contains all the information for the 3D structure (experiments of Anfinsen, 1970s)
• But there are thousands of atoms, rotatable bonds, solvent and other molecules to deal with...
• Levinthal's paradox: a chain cannot find its native fold by trying every possible conformation, because there are far too many
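The scale of Levinthal's paradox is easy to reproduce with back-of-the-envelope numbers. The sketch below assumes 3 possible conformations per residue, a 100-residue chain, and 10^13 conformations sampled per second. These figures are commonly used illustrative values, not measurements and not taken from the slide, and the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_search_years(residues, states_per_residue=3, samples_per_second=1e13):
    """Years needed to visit every conformation once under the stated assumptions."""
    conformations = states_per_residue ** residues
    return conformations / samples_per_second / SECONDS_PER_YEAR


print(f"{exhaustive_search_years(100):.1e} years")
```

With these assumptions the search would take of the order of 10^27 years, yet real proteins fold in milliseconds to seconds, so folding cannot be a random exhaustive search.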

Structure prediction

Summary of the four main approaches to structure prediction. Note that there are overlaps between nearly all categories.

| Method | Knowledge | Approach | Difficulty | Usefulness |
|---|---|---|---|---|
| Comparative modelling (homology modelling) | Proteins of known structure | Identify related structure with sequence methods, copy 3D coords and modify where necessary | Relatively easy | Very, if sequence identity drug design |
| Fold recognition | Proteins of known structure | Same as above, but use more sophisticated methods to find related structure | Medium | Limited due to poor models |
| Secondary structure prediction | Sequence-structure statistics | Forget 3D arrangement and predict where the helices/strands are | Medium | Can improve alignments, fold recognition, ab initio |
| Ab initio tertiary structure prediction | Energy functions, statistics | Simulate folding, or generate lots of structures and try to pick the correct one | Very hard | Not really |

Secondary structure - Helix

Secondary structure - Sheet

Secondary structure - Turns

Secondary Structure Predictions

Some highlights in performance

1974 Chou and Fasman 50%
1978 Garnier 62%
1993 PhD 72%
2000 PsiPred 76%

Secondary structure prediction

1st generation methods

Chou and Fasman

1) Assign all residues the appropriate set of parameters.
2) Scan through the peptide and identify helical regions.
3) Repeat this procedure to locate all of the helical regions in the sequence.
4) Scan through the peptide and identify sheet regions.
5) Solve conflicts between helical and sheet assignments.
6) Identify turns.

Claims of around 70-80% - actual accuracy about 50-60%

| Class | Helix | Strand |
|---|---|---|
| Strong former | E A L | M V I |
| Former | H M Q W V F | C Y F Q L T W |
| Weak former | K I | A |
| Indifferent | D T S R C | R G D |
| Breaker | N Y | K S H N P |
| Strong breaker | P G | E |
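Step 2 of the procedure (scanning for helical regions) can be sketched using the helix classes listed above. This is a simplified illustration, not the full published method: the window length of 6, the threshold of 4, and the rule that weak formers count as half a former are assumptions chosen for this example, and the function name is hypothetical. Extension of the nucleus, sheet detection and conflict resolution are not shown.

```python
# Helix classes for each amino acid, one-letter codes (helix column of the table).
HELIX_CLASSES = {
    "strong former": "EAL",
    "former": "HMQWVF",
    "weak former": "KI",
    "indifferent": "DTSRC",
    "breaker": "NY",
    "strong breaker": "PG",
}

# Assumed weights for this sketch: formers count 1, weak formers 0.5, others 0.
CLASS_WEIGHT = {"strong former": 1.0, "former": 1.0, "weak former": 0.5}

HELIX_SCORE = {
    residue: CLASS_WEIGHT.get(name, 0.0)
    for name, residues in HELIX_CLASSES.items()
    for residue in residues
}


def helix_nucleation_sites(sequence, window=6, min_score=4.0):
    """Return start indices of windows that contain enough helix formers."""
    sites = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        if sum(HELIX_SCORE.get(residue, 0.0) for residue in segment) >= min_score:
            sites.append(start)
    return sites


print(helix_nucleation_sites("GGEALMGG"))  # [0, 1, 2]
```

Each of the three windows in the example contains the four formers E, A, L and M, so all three start positions are reported as possible helix nuclei.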

GOR III Garnier, Osguthorpe, Robson, 1990

• Secondary structure depends on amino acid propensities
  - As in Chou-Fasman
• Also influenced by neighbouring residues
  - Helix capping
  - Turns, etc.
• How to include distant information?
• Performance approximately 67%

GOR III Garnier, Osguthorpe, Robson, 1990

With a window of 17 residues (the central residue plus eight neighbours on each side), the helix propensity table has 20 x 17 entries. Assign the state with the highest propensity.
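The shape of such a window-based prediction can be sketched as follows. Each state (helix H, strand E, coil C) gets a table with one row per amino acid and 17 columns, one per window offset; the score of a state at a position is the sum of the table entries for the residues in the window, and the highest-scoring state is assigned. The table values below are toy numbers invented for this example so that the code runs. They are not the published GOR parameters, and all names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
STATES = "HEC"          # helix, strand, coil
HALF_WINDOW = 8         # 8 neighbours on each side -> 17 window positions
WINDOW = 2 * HALF_WINDOW + 1


def make_toy_tables():
    """Build 20 x 17 tables per state, filled with toy values (NOT real parameters)."""
    tables = {s: {aa: [0.0] * WINDOW for aa in AMINO_ACIDS} for s in STATES}
    for offset in range(WINDOW):
        tables["H"]["A"][offset] = 0.1  # toy: alanine nearby favours helix
        tables["E"]["V"][offset] = 0.1  # toy: valine nearby favours strand
        tables["C"]["G"][offset] = 0.1  # toy: glycine nearby favours coil
    return tables


def predict(sequence, tables):
    """Assign to each residue the state with the highest summed window score."""
    prediction = []
    for i in range(len(sequence)):
        totals = {}
        for state in STATES:
            total = 0.0
            for offset in range(-HALF_WINDOW, HALF_WINDOW + 1):
                j = i + offset
                if 0 <= j < len(sequence) and sequence[j] in tables[state]:
                    total += tables[state][sequence[j]][offset + HALF_WINDOW]
            totals[state] = total
        prediction.append(max(STATES, key=lambda s: totals[s]))
    return "".join(prediction)


print(predict("AAAAAAVVVVVV", make_toy_tables()))
```

Real tables are derived from statistics over proteins of known structure; only the bookkeeping (20 x 17 entries per state, sum over the window, pick the maximum) is illustrated here.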

Status of predictions in 1990

• Too short secondary structure segments
• About 65% accuracy
• Worse for beta-strands

Secondary structure prediction

2nd generation methods

• sequence-to-structure relationship modelled using more complex statistics, e.g. artificial neural networks (NNs) or hidden Markov models (HMMs)
• evolutionary information included (profiles)
• prediction accuracy >70% (PhD, Rost 1993)

PhD-predictions

Secondary structure "prediction" by homology

If a sequence of unknown secondary structure has a homologue of known structure, it is more accurate to make an alignment and copy the known secondary structure over to the unknown sequence than to do "ab initio" secondary structure prediction.

3rd generation methods

• enhanced evolutionary sequence information (PSI-BLAST profiles) and larger sequence databases take Q3 to > 75%
• PHD and PSIPRED are the best known methods
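Q3 is the three-state per-residue accuracy: the percentage of residues whose predicted state (helix H, strand E or coil C) matches the observed state. A minimal sketch, assuming both states are given as equal-length strings of H, E and C; the function name and the example strings are made up for illustration.

```python
def q3(predicted, observed):
    """Percentage of positions where the predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


print(q3("HHHHCCEEEC", "HHHCCCEEEE"))  # 80.0
```

In the example, 8 of the 10 positions agree, giving a Q3 of 80%.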

PSIPRED

• Similar to PhD
• PSI-BLAST to detect more remote homologs
• Only two layers
• SVM or NN gives similar performance

Alignment of Protein Structure

Compare 3D structure of one protein against 3D structure of second protein

Compare positions of atoms in three-dimensional structures

Look for positions of secondary structural elements (helices and strands) within a protein domain

Examine distances between carbon atoms to determine the degree to which structures may be superimposed

Side chain information can be incorporated (buried; visible)

Structural similarity between proteins does not necessarily mean evolutionary relationship

Alignment of Protein Structure


Simple case – two closely related proteins with the same number of amino acids.

Structure alignment

Find a transformation to achieve the best superposition

Types of Structure Comparison

• Sequence-dependent vs. sequence-independent structural alignment
• Global vs. local structural alignment
• Pairwise vs. multiple structural alignment

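Residue:   1234567
Protein 1: ASCRKLE
           |||||||
Protein 2: ASCRKLE

[Figure: the two structures, with residues 1 to 7 labelled in each.]

Minimize rmsd of distances 1-1,...,7-7:

$$\mathrm{rmsd} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl(x(i) - y(i)\bigr)^{2}}$$

where x(i) and y(i) are the positions of the i-th pair of corresponding residues and N is the number of pairs.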
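The rmsd over corresponding positions can be computed directly once the correspondence (1-1, ..., 7-7) is fixed. The sketch below evaluates it for two coordinate lists as given. It does not search for the transformation that minimizes the value; that step (for example with the Kabsch algorithm) is outside this example. The function name and the coordinates are hypothetical.

```python
import math


def rmsd(coords_x, coords_y):
    """Root-mean-square deviation between paired 3D points (x(i), y(i))."""
    if len(coords_x) != len(coords_y) or not coords_x:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_x, coords_y)
    )
    return math.sqrt(squared / len(coords_x))


# Two made-up three-residue traces; the second is the first shifted by 1 along x.
structure_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
structure_b = [(1.0, 0.0, 0.0), (4.8, 0.0, 0.0), (8.6, 0.0, 0.0)]
print(rmsd(structure_a, structure_b))
```

Because the loop visits each pair once, evaluating the rmsd for a fixed correspondence takes time proportional to the number of residues.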

Sequence-dependent Structure Comparison


Can be solved in O(n) time.

Useful in comparing structures of the same protein solved by different methods, in different conformations, or through dynamics.

Evaluating protein structure predictions.

Sequence-independent Structure Comparison

Given two configurations of points in three-dimensional space:

find T which produces “largest” superimpositions of corresponding 3-D points.


Evaluating Structural Alignments

1. Number of amino acid correspondences created
2. RMSD of corresponding amino acids
3. Percent identity in aligned residues
4. Number of gaps introduced
5. Size of the two proteins
6. Conservation of known active site environments
7. …

No universally agreed upon criteria. It depends on what you are using the alignment for.
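Several of these criteria can be read straight off a pairwise alignment. The sketch below computes the number of correspondences, the percent identity in aligned residues, and the number of gap columns. It assumes the alignment is given as two equal-length strings with "-" marking gaps; the function name, the returned keys and the example alignment are hypothetical.

```python
def alignment_summary(aligned_a, aligned_b, gap="-"):
    """Summarise a pairwise alignment given as two equal-length gapped strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Columns where both proteins contribute a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    identical = sum(a == b for a, b in pairs)
    gap_columns = sum(1 for a, b in zip(aligned_a, aligned_b) if a == gap or b == gap)
    percent_identity = 100.0 * identical / len(pairs) if pairs else 0.0
    return {
        "correspondences": len(pairs),
        "percent_identity": percent_identity,
        "gap_columns": gap_columns,
    }


print(alignment_summary("ASC-RKLE", "ASCTRK-E"))
```

Which of these numbers matters most depends on the purpose of the alignment, so the function reports them side by side rather than combining them into one score.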
