Protein Structure Databases: databases of three-dimensional structures of proteins, where the structure has been solved using X-ray crystallography or nuclear magnetic resonance (NMR) techniques. Protein databases: PDB (Protein Data Bank), Swiss-Prot, PIR (Protein Information Resource), SCOP (Structural Classification of Proteins).

Protein Structure Databases

Page 1: Protein Structure Databases

Protein Structure Databases

Databases of three-dimensional structures of proteins, where the structure has been solved using X-ray crystallography or nuclear magnetic resonance (NMR) techniques.

Protein databases:
• PDB (Protein Data Bank)
• Swiss-Prot
• PIR (Protein Information Resource)
• SCOP (Structural Classification of Proteins)

Page 2: Protein Structure Databases

Fibrous proteins have a structural role

Source:
http://www.prideofindia.net/images/nails.jpg
http://opbs.okstate.edu/~petracek/2002%20protein%20structure%20function/CH06/Fig%2006-12.GIF
http://my.webmd.com/hw/health_guide_atoz/zm2662.asp?printing=true

• Collagen is the most abundant protein in vertebrates. Collagen fibers are a major portion of tendons, bone and skin. Three helical collagen chains (called alpha chains, although they are not alpha helices) wind together into a triple helix, giving collagen its tough and flexible properties.

• Fibroin fibers make the silk spun by spiders and silkworms stronger, weight for weight, than steel! The soft and flexible properties come from the beta structure.

• Keratin is a tough, insoluble protein that makes up the quills of the echidna, your hair and nails, and the rattle of a rattlesnake. The structure comes from alpha helices that are cross-linked by disulfide bonds.

Page 3: Protein Structure Databases

The globular proteins

The globular proteins have a number of biologically important roles. They include:

Cell motility – proteins link together to form filaments which make movement possible.

Organic catalysts in biochemical reactions – enzymes

Regulatory proteins – hormones, transcription factors

Membrane proteins – MHC markers, protein channels, gap junctions

Defense against pathogens – poisons/toxins, antibodies, complement

Transport and storage – hemoglobin and myoglobin

Page 4: Protein Structure Databases

Proteins for cell motility

Source:
http://www.ebsa.org/npbsn41/maf_home.html
http://sun0.mpimf-

Above: Myosin (red) and actin filaments (green) in coordinated muscle contraction.

Right: Actin bound to the myosin binding site (groove in the red part of the myosin protein).

Add energy (ATP) and myosin moves, moving actin with it.

Page 5: Protein Structure Databases

Proteins in the Cell Cytoskeleton

Eukaryote cells have a cytoskeleton made up of straight hollow cylinders called microtubules (bottom left). They help cells maintain their shape, act like conveyor belts moving organelles around in the cytoplasm, and participate in forming spindle fibres in cell division. Microtubules are composed of filaments of the protein tubulin (top left). These filaments are compressed like springs, allowing microtubules to ‘stretch and contract’. Thirteen of these filaments attach side to side, a little like the slats in a barrel, to form a microtubule. This barrel-shaped structure gives strength to the microtubule.

Tubulin forms helical filaments

Source:
heidelberg.mpg.de/shared/docs/staff/user/0001/24.php3?department=01&LANG=en
http://www.fz-juelich.de/ibi/ibi-1/Cellular_signaling/
http://cpmcnet.columbia.edu/dept/gsas/anatomy/Faculty/Gundersen/main.html

Page 6: Protein Structure Databases

Proteins speed up reactions - Enzymes

Catalase speeds up the breakdown of hydrogen peroxide (H2O2), a toxic by-product of metabolic reactions, to the harmless substances water and oxygen:

2 H2O2 → 2 H2O + O2

The reaction is extremely rapid because the enzyme lowers the energy needed to kick-start the reaction (the activation energy).

[Figure: energy plotted against progress of reaction, from substrate to product, with the activation energy marked. No catalyst = input of 71 kJ energy required. With catalase = input of 8 kJ energy required.]
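To get a feel for what the drop from 71 kJ to 8 kJ means, the two activation energies can be plugged into the Arrhenius equation, k = A·exp(-Ea/(R·T)). The sketch below is only a rough estimate under assumptions that are not on the slide: the energies are taken to be per mole, the temperature is taken as 310 K (body temperature), and both pathways are given the same pre-exponential factor A. The function name `rate_ratio` is made up for this example.

```python
import math

R = 8.314  # gas constant in J/(mol*K)

def rate_ratio(ea_uncatalysed_kj, ea_catalysed_kj, temperature_k=310.0):
    """Ratio of catalysed to uncatalysed rate constants from the Arrhenius equation.

    Assumes both energies are in kJ/mol and that the pre-exponential
    factor A is the same for both pathways (a simplification).
    """
    delta_ea_j = (ea_uncatalysed_kj - ea_catalysed_kj) * 1000.0
    return math.exp(delta_ea_j / (R * temperature_k))

# Activation energies quoted on the slide: 71 kJ without catalyst, 8 kJ with catalase.
speedup = rate_ratio(71.0, 8.0)
print(f"Estimated speed-up at 310 K: {speedup:.1e}")
```

Under these assumptions the estimate comes out at roughly ten orders of magnitude, which is why a modest-looking change in activation energy makes the reaction "extremely rapid".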

Page 7: Protein Structure Databases

Proteins can regulate metabolism – hormones

When your body detects an increase in the sugar content of blood after a meal, the hormone insulin is released from cells in the pancreas.

Insulin binds to cell membranes and this triggers the cells to absorb glucose for use or for storage as glycogen in the liver.

Proteins span membranes – protein channels

Source:
http://www.biology.arizona.edu/biochemistry/tutorials/chemistry/page2.html
http://www.cbp.pitt.edu/bradbury/projects.htm

The CFTR membrane protein is an ion channel that regulates the flow of chloride ions.

In people with cystic fibrosis, not enough of this protein gets inserted into cell membranes. Secretions are then poorly hydrated and become thick, and the lungs and secretory ducts become blocked as a consequence.

Page 8: Protein Structure Databases

Proteins defend us against pathogens – antibodies

Source:
http://www.biology.arizona.edu/immunology/tutorials/antibody/FR.html
http://tutor.lscf.ucsb.edu/instdev/sears/immunology/info/sears-ab.htm
http://www.spilya.com/research/
http://www.umass.edu/microbio/chime/

Left: Antibodies such as IgG, found in humans, recognise and bind to groups of molecules, or epitopes, found on foreign invaders.

Right: The binding site of an antibody protein (left) interacting with the epitope of a foreign antigen (green).

Page 9: Protein Structure Databases

Making Proteins

How is such a diverse range of proteins possible? The code for making a protein is found in your genes (on your DNA). This genetic code is copied onto a messenger RNA (mRNA) molecule. Ribosomes read the mRNA code three bases at a time (each triplet is a codon) and join amino acids together to form a polypeptide. The whole process is known as gene expression.

Source: http://genetics.nbii.gov/Basic1.html

Page 10: Protein Structure Databases

Gene Expression

The order of bases in DNA is a code for making proteins. The code is read in groups of three.

Cell machinery copies the code, making an mRNA molecule. This moves into the cytoplasm.

Ribosomes read the code and accurately join amino acids together to make a protein.

mRNA:    AUG AGU AAA GGA GAA GAA CUU UUC ACU GGA UA
Protein: M   S   K   G   E   E   L   F   T   G

The protein folds to form its working shape.

[Figure: animation of gene expression in a cell. Labels: Cell, Nucleus, Chromosome, DNA (bases G T A C T A), Gene; the growing chain M-S-K-G-E-E-L-F-T-G is shown folding.]
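The reading of the mRNA in groups of three can be sketched in a few lines of Python. This is a minimal illustration, not a model of the ribosome: it uses the standard genetic code, starts at the first base it is given, stops at the first stop codon, and ignores any incomplete codon at the end. The names `CODON_TABLE` and `translate` are chosen for this example only.

```python
# Standard genetic code, written in the conventional U, C, A, G order
# for the first, second and third base of each codon ('*' marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

def translate(mrna):
    """Translate an mRNA string codon by codon into one-letter amino acid codes."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):  # complete codons only
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon ends the chain
            break
        protein.append(amino_acid)
    return "".join(protein)

# The mRNA fragment shown on the slide (the trailing 'UA' is an incomplete codon).
print(translate("AUGAGUAAAGGAGAAGAACUUUUCACUGGAUA"))  # MSKGEELFTG
```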

Page 11: Protein Structure Databases

The building blocks

The amino acids for making new proteins come from the proteins that you eat and digest. Every time you eat a burger (vege or beef), you break the proteins down into single amino acids ready for use in building new proteins. And yes, proteins have the job of digesting proteins; they are known as proteases.

There are only 20 different amino acids, but they can be joined together in many different combinations to form the diverse range of proteins that exist on this planet.
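How many "different combinations" is that? With 20 choices at every position, a chain of n amino acids can be written in 20^n ways. The short sketch below just does that arithmetic; the chain lengths are arbitrary examples, and the function name is made up for this illustration.

```python
NUMBER_OF_AMINO_ACIDS = 20  # as stated on the slide

def possible_sequences(chain_length):
    """Number of distinct amino acid sequences of the given length."""
    return NUMBER_OF_AMINO_ACIDS ** chain_length

for chain_length in (2, 10, 100):
    print(chain_length, possible_sequences(chain_length))
```

Even a short chain of 10 amino acids already has more than ten million million possible sequences.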

Page 12: Protein Structure Databases

Amino Acids

An amino acid is a relatively small molecule with characteristic groups of atoms that determine its chemical behaviour.

The structural formula of an amino acid is shown at the end of the animation below. The R group is the only part that differs between the 20 amino acids.

[Figure: animation showing glycine, alanine, valine, cysteine and phenylalanine, ending with the general structural formula of an amino acid: an amino group, a central carbon carrying a hydrogen and the R group, and a carboxyl group.]

Page 13: Protein Structure Databases

The 20 Amino Acids

The amino acids each have their own shape and charge due to their specific R group.

View the molecular shape of amino acids by clicking on the URL link below:

http://sosnick.uchicago.edu/amino_acids.html

Would the shape of a protein be affected if the wrong amino acid were added to a growing protein chain?

Page 14: Protein Structure Databases

Making a Polypeptide

[Figure: animation of polypeptide growth. Amino acids (each with an amino group, an R group and a carboxyl group) are added one at a time; each peptide bond forms between the carboxyl group of one amino acid and the amino group of the next, releasing a molecule of water.]

Polypeptide production = Condensation Reaction
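Because every peptide bond is formed by a condensation reaction, a chain of n amino acids contains n - 1 peptide bonds and has lost n - 1 water molecules. The sketch below does this bookkeeping for the five amino acids named on the earlier Amino Acids slide. The molar masses are rounded textbook values added here as assumptions, and the function names are made up for this example.

```python
# Approximate molar masses of the free amino acids in g/mol
# (rounded textbook values; an assumption of this sketch, not from the slides).
AMINO_ACID_MASS = {
    "Gly": 75.07,
    "Ala": 89.09,
    "Val": 117.15,
    "Cys": 121.16,
    "Phe": 165.19,
}
WATER_MASS = 18.02  # g/mol, one water is released per peptide bond

def peptide_bonds(residues):
    """A linear chain of n amino acids has n - 1 peptide bonds."""
    return max(len(residues) - 1, 0)

def polypeptide_mass(residues):
    """Mass of the chain: sum of the amino acids minus the water released."""
    total = sum(AMINO_ACID_MASS[name] for name in residues)
    return total - peptide_bonds(residues) * WATER_MASS

chain = ["Gly", "Ala", "Val"]
print(peptide_bonds(chain), round(polypeptide_mass(chain), 2))
```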

Page 15: Protein Structure Databases

Protein structure

Page 16: Protein Structure Databases

Why Investigate Protein Structure?

Proteins are complex molecules whose structure can be discussed in terms of:

• primary structure
• secondary structure
• tertiary structure
• quaternary structure

The structure of proteins is important as the shape of a protein allows it to perform its particular role or function

Page 17: Protein Structure Databases

Four levels of protein structure

Page 18: Protein Structure Databases

Protein Primary Structure

The primary structure is the sequence of amino acids that are linked together. The linear chain is called a polypeptide.

http://www.mywiseowl.com/articles/Image:Protein-primary-structure.png

Page 19: Protein Structure Databases

Protein Secondary Structure

The secondary structure of proteins consists of:

• alpha helices
• beta sheets
• random coils – usually form the binding and active sites of proteins

Source: http://www.rothamsted.bbsrc.ac.uk/notebook/courses/guide/prot.htm#I

Page 20: Protein Structure Databases

Protein Tertiary Structure

Involves the way the random coils, alpha helices and beta sheets fold with respect to each other.

This shape is held in place by bonds such as:
• weak hydrogen bonds between amino acids that lie close to each other,
• strong ionic bonds between R groups with positive and negative charges, and
• disulfide bridges (strong covalent S-S bonds).

Amino acids that were distant in the primary structure may now become very close to each other after the folding has taken place

The subunit of a more complex protein has now been formed. It may be globular or fibrous. It now has its functional shape or conformation.

Source: io.uwinnipeg.ca/~simmons/cm1503/proteins.htm

Page 21: Protein Structure Databases

Protein Quaternary Structure

This is the packing of the protein subunits to form the final protein complex. For example, the human hemoglobin molecule is a tetramer made up of two alpha and two beta polypeptide chains (right).

Source: www.cem.msu.edu/~parrill/movies/neuram.GIF

This is also when the protein associates with non-protein groups. For example, carbohydrates can be added to form a glycoprotein.

Source: www.ibri.org/Books/Pun_Evolution/Chapter2/2.6.htm

Page 22: Protein Structure Databases

Protein Structure Prediction

• Why?
• Types of protein structure prediction
    • Secondary structure prediction
    • Homology modelling
    • Fold recognition
    • Ab initio
• Secondary structure prediction
    • Why
    • History
    • Performance
    • Usefulness

Page 23: Protein Structure Databases

Why do we need structure prediction?

• 3D structure gives clues to function: active sites, binding sites, conformational changes...
• Structure and function are conserved more than sequence
• 3D structure determination is difficult, slow and expensive
• Intellectual challenge, Nobel prizes etc...
• Engineering new proteins

Page 24: Protein Structure Databases

The Use of Structure

Page 25: Protein Structure Databases

The Use of Structure

Page 26: Protein Structure Databases

The Use of Structure

Page 27: Protein Structure Databases

It's not that simple...

• The amino acid sequence contains all the information for the 3D structure (experiments of Anfinsen, early 1960s)
• But there are thousands of atoms, rotatable bonds, solvent and other molecules to deal with...
• Levinthal's paradox
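Levinthal's paradox is easiest to appreciate with numbers: a protein cannot possibly find its fold by trying every conformation in turn. The sketch below is a back-of-the-envelope version of that argument. The figures in it (3 conformations per residue, 10^13 conformations sampled per second) are commonly quoted illustrative assumptions and do not come from the slides; the function name is made up. The result is returned as a power of ten so that long chains do not overflow.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600

def log10_years_to_search(n_residues, states_per_residue=3, samples_per_second=1e13):
    """log10 of the years needed to visit every conformation once.

    Assumes each residue independently adopts one of a few states and that
    conformations are sampled at a fixed rate (both are rough assumptions).
    """
    log10_conformations = n_residues * math.log10(states_per_residue)
    log10_seconds = log10_conformations - math.log10(samples_per_second)
    return log10_seconds - math.log10(SECONDS_PER_YEAR)

exponent = log10_years_to_search(100)
print(f"A 100-residue chain would need about 10^{exponent:.0f} years")
```

Under these assumptions the search time for a 100-residue chain is vastly longer than the age of the universe, yet real proteins fold in seconds or less, so folding cannot be a random exhaustive search.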

Page 28: Protein Structure Databases

Structure prediction

Summary of the four main approaches to structure prediction. Note that there are overlaps between nearly all categories.

Method: Comparative modelling (homology modelling)
    Knowledge: Proteins of known structure
    Approach: Identify a related structure with sequence methods, copy the 3D coordinates and modify where necessary
    Difficulty: Relatively easy
    Usefulness: Very, if sequence identity [...]; drug design

Method: Fold recognition
    Knowledge: Proteins of known structure
    Approach: Same as above, but use more sophisticated methods to find a related structure
    Difficulty: Medium
    Usefulness: Limited due to poor models

Method: Secondary structure prediction
    Knowledge: Sequence-structure statistics
    Approach: Forget the 3D arrangement and predict where the helices/strands are
    Difficulty: Medium
    Usefulness: Can improve alignments, fold recognition, ab initio

Method: Ab initio tertiary structure prediction
    Knowledge: Energy functions, statistics
    Approach: Simulate folding, or generate lots of structures and try to pick the correct one
    Difficulty: Very hard
    Usefulness: Not really

Page 29: Protein Structure Databases

Secondary Structure - Helix

Page 30: Protein Structure Databases

Secondary Structure - Sheet

Page 31: Protein Structure Databases

Secondary structure - turns

Page 32: Protein Structure Databases

Secondary Structure Predictions

Some highlights in performance:

1974  Chou and Fasman  50%
1978  Garnier          62%
1993  PHD              72%
2000  PSIPRED          76%

Page 33: Protein Structure Databases

Secondary structure prediction: 1st generation methods

Chou and Fasman

1) Assign all residues the appropriate set of parameters.
2) Scan through the peptide and identify helical regions.
3) Repeat this procedure to locate all of the helical regions in the sequence.
4) Scan through the peptide and identify sheet regions.
5) Solve conflicts between helical and sheet assignments.
6) Identify turns.

Claims of around 70-80%; actual accuracy about 50-60%.

                 Helix          Strand
Strong former    E A L          M V I
Former           H M Q W V F    C Y F Q L T W
Weak former      K I            A
Indifferent      D T S R C      R G D
Breaker          N Y            K S H N P
Strong breaker   P G            E
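Step 2 of the procedure (scanning for helical regions) can be sketched as a sliding-window search. The version below is a heavily simplified illustration, not the published Chou and Fasman algorithm: it only uses the helix "strong former" and "former" classes as read from the table above, treats a window of 6 residues with at least 4 formers as a helix nucleus (an assumption of this sketch), and merges overlapping nuclei without any extension or conflict resolution. All names are made up for the example.

```python
# Helix formers as read from the class table above (strong formers + formers).
HELIX_FORMERS = set("EAL") | set("HMQWVF")

def helix_nuclei(sequence, window=6, min_formers=4):
    """Start positions of windows containing at least min_formers helix formers."""
    starts = []
    for i in range(len(sequence) - window + 1):
        segment = sequence[i:i + window]
        formers = sum(1 for residue in segment if residue in HELIX_FORMERS)
        if formers >= min_formers:
            starts.append(i)
    return starts

def helix_regions(sequence, window=6, min_formers=4):
    """Merge overlapping nuclei into (start, end) regions, end exclusive."""
    regions = []
    for start in helix_nuclei(sequence, window, min_formers):
        end = start + window
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], end)  # extend the previous region
        else:
            regions.append((start, end))
    return regions

print(helix_regions("GGGGEALMEALGGGG"))  # [(2, 13)]
```

Sheet regions (step 4) could be searched the same way with the strand column of the table; resolving overlaps between the two (step 5) needs the numeric propensities, which are not given on the slide.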

Page 34: Protein Structure Databases

GOR III (Garnier, Osguthorpe, Robson, 1990)

• Secondary structure depends on amino acid propensities, as in Chou-Fasman
• Also influenced by neighboring residues: helix capping, turns, etc.
• How to include distant information?
• Performance approximately 67%

Page 35: Protein Structure Databases

GOR III Garnier, Osguthorpe, Robson, 1990

The helix propensity tables thus have 20 x 17 entries (20 amino acids at each of 17 window positions). Assign the state with the highest propensity.
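The idea of a 20 x 17 table per state can be sketched as follows: for each residue, sum the table entries for every amino acid in a 17-residue window centred on it, once per state, and assign the state with the highest total. The code below only shows this structure. The table values in the usage example are toy numbers invented for illustration, not real GOR parameters, and the names `predict`, `tables` and `toy_tables` are hypothetical.

```python
WINDOW = 17
HALF = WINDOW // 2  # 8 neighbours on each side of the central residue
STATES = "HEC"      # helix, strand, coil

def predict(sequence, tables):
    """Assign each residue the state with the highest summed window score.

    tables[state] maps (amino_acid, offset) to a score, with offset in
    -8..8; entries missing from a table count as 0.
    """
    prediction = []
    for i in range(len(sequence)):
        scores = {}
        for state in STATES:
            total = 0.0
            for offset in range(-HALF, HALF + 1):
                j = i + offset
                if 0 <= j < len(sequence):  # ignore positions off the ends
                    total += tables[state].get((sequence[j], offset), 0.0)
            scores[state] = total
        prediction.append(max(STATES, key=lambda s: scores[s]))
    return "".join(prediction)

# Toy tables (invented values): A favours helix, V strand, G coil at every offset.
offsets = range(-HALF, HALF + 1)
toy_tables = {
    "H": {("A", o): 1.0 for o in offsets},
    "E": {("V", o): 1.0 for o in offsets},
    "C": {("G", o): 1.0 for o in offsets},
}
print(predict("AAAAAA", toy_tables))  # HHHHHH
```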

Page 36: Protein Structure Databases

Status of predictions in 1990

• Predicted secondary structure segments are too short
• About 65% accuracy
• Worse for beta-strands

Example:

Page 37: Protein Structure Databases

Secondary structure prediction: 2nd generation methods

• sequence-to-structure relationship modelled using more complex statistics, e.g. artificial neural networks (NNs) or hidden Markov models (HMMs)
• evolutionary information included (profiles)
• prediction accuracy >70% (PHD, Rost 1993)

Page 38: Protein Structure Databases

PHD predictions

Secondary structure "prediction" by homology

If a sequence of unknown secondary structure has a homologue of known structure, it is more accurate to align the two and copy the known secondary structure over to the unknown sequence than to do "ab initio" secondary structure prediction.
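Copying secondary structure across an alignment is simple to sketch. The function below assumes the pairwise alignment has already been made (by whatever tool) and is given as two equal-length strings with '-' for gaps, plus the template's secondary structure as one letter per template residue. Query residues that face a gap in the template get '?', since the template says nothing about them. The function name and the '?' convention are choices made for this example.

```python
def transfer_secondary_structure(aligned_query, aligned_template, template_ss):
    """Copy secondary structure states from a template onto a query sequence.

    aligned_query, aligned_template: equal-length aligned strings, '-' = gap.
    template_ss: one state (e.g. 'H', 'E', 'C') per ungapped template residue.
    Returns one state per ungapped query residue ('?' where unknown).
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    if len(template_ss) != len(aligned_template.replace("-", "")):
        raise ValueError("template_ss must cover every template residue")

    query_ss = []
    template_pos = 0
    for query_char, template_char in zip(aligned_query, aligned_template):
        state = None
        if template_char != "-":
            state = template_ss[template_pos]
            template_pos += 1
        if query_char != "-":
            query_ss.append(state if state is not None else "?")
    return "".join(query_ss)

print(transfer_secondary_structure("ASC-KLE", "ASCRK-E", "HHHHHC"))  # HHHH?C
```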

Page 39: Protein Structure Databases

3rd generation methods

• enhanced evolutionary sequence information (PSI-BLAST profiles) and larger sequence databases take Q3 to > 75%
• PHD and PSIPRED are the best known methods
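Q3, the accuracy figure quoted for these methods, is the percentage of residues whose predicted three-state label (helix, strand or coil) matches the observed one. A minimal sketch, assuming both are given as equal-length strings of 'H', 'E' and 'C' (the function name and the example strings are made up):

```python
def q3(predicted, observed):
    """Percentage of residues with a correctly predicted three-state label."""
    if len(predicted) != len(observed):
        raise ValueError("prediction and observation must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

print(q3("HHHHCCEE", "HHHCCCEE"))  # 87.5
```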

Page 40: Protein Structure Databases

PSIPRED

• Similar to PHD
• PSI-BLAST to detect more remote homologs
• Only two layers
• SVM or NN gives similar performance

Page 41: Protein Structure Databases

Alignment of Protein Structure

Compare 3D structure of one protein against 3D structure of second protein

Compare positions of atoms in three-dimensional structures

Look for positions of secondary structural elements (helices and strands) within a protein domain

Examine distances between carbon atoms to determine the degree to which structures may be superimposed

Side chain information can be incorporated
• Buried; visible

Structural similarity between proteins does not necessarily mean evolutionary relationship

Page 42: Protein Structure Databases

Alignment of Protein Structure

Page 43: Protein Structure Databases

Structure alignment

Simple case – two closely related proteins with the same number of amino acids.

Find a transformation T to achieve the best superposition.

Page 44: Protein Structure Databases

Types of Structure Comparison

Sequence-dependent vs. sequence-independent structural alignment

Global vs. local structural alignment

Pairwise vs. multiple structural alignment

Page 45: Protein Structure Databases

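Sequence-dependent Structure Comparison

1234567
ASCRKLE
¦¦¦¦¦¦¦
ASCRKLE

Minimize rmsd of distances 1-1,...,7-7:

$$\mathrm{rmsd} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl(x(i) - y(i)\bigr)^{2}}$$

[Figure: two chains with residues numbered 1 to 7, shown side by side and then superimposed.]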
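The rmsd formula translates directly into code once the correspondence 1-1,...,7-7 is fixed. The sketch below only evaluates the formula for two sets of 3-D coordinates as given; it does not search for the transformation that minimises it, so it assumes the structures have already been superimposed. The function name and the example coordinates are made up.

```python
import math

def rmsd(x, y):
    """Root-mean-square deviation between corresponding 3-D points.

    x and y are equal-length lists of (x, y, z) tuples, where point i of
    x corresponds to point i of y.
    """
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for point_x, point_y in zip(x, y):
        squared += sum((a - b) ** 2 for a, b in zip(point_x, point_y))
    return math.sqrt(squared / len(x))

chain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
chain_b = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)]
print(rmsd(chain_a, chain_b))  # 1.0
```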

Page 46: Protein Structure Databases

Sequence-dependent Structure Comparison

Can be solved in O(n) time.

Useful in comparing structures of the same protein solved by different methods, in different conformations, or through dynamics.

Evaluating protein structure predictions.

Page 47: Protein Structure Databases

Sequence-independent Structure Comparison

Given two configurations of points in three-dimensional space:

find the transformation T which produces the “largest” superimposition of corresponding 3-D points.
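One way to make "largest" concrete is to count how many points of one configuration land close to a point of the other after applying a candidate T. The sketch below scores a single candidate transformation in that way, using a greedy nearest-neighbour pairing and a distance cutoff. Both the cutoff value and the greedy pairing are assumptions of this illustration, and finding the best T (the hard part of the problem) is not attempted. All names are made up.

```python
import math

def count_superimposed(points_a, points_b, transform, cutoff=1.0):
    """Number of points of A that, after transform, pair with a point of B.

    Each transformed point of A is paired greedily with the closest unused
    point of B, provided the distance is within the cutoff.
    """
    used = set()
    matched = 0
    for point in points_a:
        moved = transform(point)
        best_index, best_distance = None, cutoff
        for index, candidate in enumerate(points_b):
            if index in used:
                continue
            distance = math.dist(moved, candidate)
            if distance <= best_distance:
                best_index, best_distance = index, distance
        if best_index is not None:
            used.add(best_index)
            matched += 1
    return matched

def shift_by(dx, dy, dz):
    """Build a simple translation; a full T would also include a rotation."""
    return lambda p: (p[0] + dx, p[1] + dy, p[2] + dz)

config_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
config_b = [(10.0, 0.0, 0.0), (11.0, 0.0, 0.0), (0.0, 0.0, 9.0)]
print(count_superimposed(config_a, config_b, shift_by(10.0, 0.0, 0.0)))  # 2
```

Comparing this count across many candidate transformations, and keeping the one with the highest count, is one simple reading of "find T which produces the largest superimposition".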

Page 48: Protein Structure Databases

Evaluating Structural Alignments

1. Number of amino acid correspondences created.
2. RMSD of corresponding amino acids
3. Percent identity in aligned residues
4. Number of gaps introduced
5. Size of the two proteins
6. Conservation of known active site environments
7. …

No universally agreed upon criteria. It depends on what you are using the alignment for.
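Several of the criteria above (1, 3 and 4) can be read straight off the alignment itself. The sketch below computes them for a pairwise alignment given as two equal-length strings with '-' for gaps. Counting each run of consecutive gap characters as one gap is a choice made for this example, and the function names are hypothetical.

```python
def gap_runs(aligned):
    """Number of runs of consecutive '-' characters in an aligned string."""
    runs = 0
    in_gap = False
    for char in aligned:
        if char == "-":
            if not in_gap:
                runs += 1
            in_gap = True
        else:
            in_gap = False
    return runs

def alignment_stats(aligned_a, aligned_b):
    """Correspondences, percent identity and gap count for a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    identical = sum(1 for a, b in pairs if a == b)
    percent_identity = 100.0 * identical / len(pairs) if pairs else 0.0
    return {
        "correspondences": len(pairs),
        "percent_identity": percent_identity,
        "gaps": gap_runs(aligned_a) + gap_runs(aligned_b),
    }

print(alignment_stats("ASC-KLE", "ASCRK-E"))
```

The RMSD of the corresponding residues (criterion 2) additionally needs the 3-D coordinates, so it is not part of this sequence-only sketch.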