The Protein Data Bank Europe (PDBe) (http://www.pdbe.org/) is one of the worldwide partners that manage the Protein Data Bank (PDB), a collection of all publicly available 3-dimensional structures of biological macromolecules. The Worldwide Protein Data Bank (wwPDB) consists of organizations that act as deposition, data processing and distribution centres for PDB data. The founding members are RCSB PDB (USA), PDBe (Europe) and PDBj (Japan). The mission of the wwPDB is to maintain a single Protein Data Bank Archive of macromolecular structural data that is freely and publicly available to the global community.

In addition to its role in data deposition, processing and distribution, the PDBe is also involved in the creation of a relational database that integrates data from experimentally determined protein structures with protein sequence information, textual information from scientific publications and a number of derived properties that augment the macromolecular structure information. The database also contains information on a variety of ligands, cofactors and smaller chemical entities that interact with a protein. The PDBe group has also developed several algorithms for the analysis of protein structures. In addition, information from 3D electron microscopy is stored and can be accessed from the database.

Introduction to Protein Structures

(Expected Time for completion: 1 hour)

This exercise will cover the basic types of protein structures as represented in the Protein Data Bank, and gives an introduction to the PDBe entry information pages and some search and analysis services. For a detailed explanation of protein structure components, please see this excellent introduction in Wikipedia. Fold refers to a global type of arrangement, like helix-bundle or beta-barrel. Although there are now over 65,000 experimentally determined structures in the PDB, the number of unique folds that these proteins adopt is limited, and all proteins can be classified into one or more fold categories, which are annotated in databases like CATH and SCOP. Similar functions are often associated with particular protein folds, and fold classification therefore serves as an important tool in understanding the possible function of a protein.

Alpha-helix proteins: There are many different families of proteins that are composed only of alpha-helices. Please see extra details here. Some examples to explore are given below.

PDB Entry: 1IRD

Start with the PDBe home page (http://www.pdbe.org/). In the space provided for “Get PDB by id”, type in 1ird and click on “Go”.

The browser will take you to the entry summary page for PDB entry 1IRD. Every entry in the Protein Data Bank is assigned a unique 4-character “IDCODE”. The summary pages provide information concerning various facets of the deposited structure, including links to external sites and other information derived from the structure itself. Underlined text on the summary page marks external links. To search for a particular item in the whole PDB, click on the lens icons.
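
For readers who would rather script this step than type into the search box, the sketch below checks that a string looks like a classic 4-character ID code and builds the short entry URL in the form used elsewhere in this exercise (for example http://www.pdbe.org/1w6o). The function names are hypothetical, and the pattern (a digit from 1 to 9 followed by three letters or digits) is an assumption based on the classic ID format, not something stated on the summary page.

```python
import re

# Assumed pattern for a classic PDB ID code: one digit (1-9) followed by
# three letters or digits. This is an illustrative assumption.
_PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")


def is_pdb_id(code: str) -> bool:
    """Return True if the string looks like a classic 4-character PDB ID."""
    return _PDB_ID.fullmatch(code.strip()) is not None


def entry_url(code: str) -> str:
    """Build the short PDBe entry URL in the form used in this exercise."""
    if not is_pdb_id(code):
        raise ValueError(f"not a 4-character PDB ID code: {code!r}")
    # ID codes are case-insensitive; the exercise writes them in lower case in URLs.
    return "http://www.pdbe.org/" + code.strip().lower()


if __name__ == "__main__":
    print(entry_url("1IRD"))
```

The check is deliberately strict about length, so a typo such as a missing character is caught before the browser is opened.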

Coming back to 1IRD, this is a structure of human haemoglobin bound to carbon monoxide. Choose the visualisation link from the sidebar on the left, then choose “View the PDB entry using Astex viewer” to view the structure interactively.

This will open up a graphics window as shown on the right. The display is interactive: you can rotate the structure, or center on any residue by clicking on the sequence in the bottom bar. Click on “Reset View” to zoom out. Looking at the structure, you will notice that this protein is composed only of alpha-helices. To see the bound heme, choose “Magic Lens” from the menu and move your mouse over the structure. To see which residues of the protein interact with the heme, choose “Chemistry” and click on either of the two HEMs shown. You may click on any of the residues shown in the chemistry popup to center the structure on that residue. The sliders at the top of the chemistry popup adjust the distance cutoff from the ligand between 0 and 4 Å, so that you can see specific interactions.
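
The distance sliders can be thought of as a simple cutoff filter: a residue is shown when at least one of its atoms lies within the chosen distance of a ligand atom. The sketch below illustrates that idea only. It is not the viewer's actual implementation, the function name is hypothetical, and the coordinates and residue labels are invented for the example rather than taken from 1IRD.

```python
import math


def residues_near_ligand(protein_atoms, ligand_atoms, cutoff=4.0):
    """Return sorted residue labels with at least one atom within
    `cutoff` angstroms of any ligand atom.

    protein_atoms: list of (residue_label, (x, y, z)) tuples
    ligand_atoms:  list of (x, y, z) tuples
    """
    near = set()
    for label, xyz in protein_atoms:
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_atoms):
            near.add(label)
    return sorted(near)


# Hypothetical coordinates in angstroms (illustrative only, not real data).
protein_atoms = [
    ("RES A", (1.0, 0.0, 0.0)),
    ("RES A", (2.5, 0.0, 0.0)),
    ("RES B", (0.0, 3.5, 0.0)),
    ("RES C", (9.0, 9.0, 9.0)),
]
ligand_atoms = [(0.0, 0.0, 0.0)]

if __name__ == "__main__":
    # Tightening the cutoff mimics moving the slider towards 0.
    print(residues_near_ligand(protein_atoms, ligand_atoms, cutoff=4.0))
    print(residues_near_ligand(protein_atoms, ligand_atoms, cutoff=2.0))
```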

To look at the sequence section on the atlas page for this entry, click 'Structure' and then 'Primary' in the sidebar, which will show you the sequence alignment with UniProt. You can also view the Pfam classification for this family of proteins (Globin).

Structural classification for this protein is available from the Tertiary Structure section of the sidebar. Both the SCOP and CATH databases classify this protein as a member of the all-alpha-helical globin family.

Links to all other cross-referenced external databases are listed under 'Cross references' in the sidebar. Look at the GO (Gene Ontology) reference here, which lists both the processes and the functions this protein is involved in. The “Ligands” section of the summary pages provides additional information on the compounds that this protein is associated with in this entry. Click on the “Ligands” link on the sidebar and then the “interactions” link to view all interactions between the heme group and the protein in this entry. This will open up a new browser window/tab to show this information.

This will take you to our PDBeMotif service and will show that HEM in this structure interacts with histidine, CMO (carbon monoxide), tyrosine and others. All interactions are colour-coded to indicate the nature of the interaction.

Go back to the summary pages now. To assess the quality of this structure (1ird), expand the “Links” section from the left-hand sidebar and choose “PDBe Validation”. This will open a new window/tab and show you the geometric quality of the structure (Ramachandran plot, bond angles, bond lengths etc.). Please see here for an explanation of the Ramachandran plot.
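
As background to the validation page, a Ramachandran plot is a scatter plot of the two backbone dihedral angles of each residue: phi, defined by the atoms C(i-1), N(i), CA(i), C(i), and psi, defined by N(i), CA(i), C(i), N(i+1). The sketch below shows one common way to compute a dihedral angle from four atomic positions. It is a minimal illustration in plain Python, not the method used by the PDBe validation service, and the function name is hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Return the dihedral angle in degrees defined by four points.

    The angle is measured about the central bond p1 -> p2.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)

    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)

    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))

    # The signed angle between the two projections is the dihedral.
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

To obtain phi for a residue, pass the coordinates of C(i-1), N(i), CA(i) and C(i) in that order; for psi, pass N(i), CA(i), C(i) and N(i+1).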

To see if there are any other structures in the PDB that are similar to this one, choose the “PDBe SSM” link from the same “Links” menu on the summary page. This will start the PDBeSSM service, which provides a rapid structure alignment and comparison tool. This job may take a few minutes to complete. Once the task is completed you will be shown a page containing the results. Scroll to the bottom of this page and re-sort the results by “%seq” as shown below.

The page will refresh. Now choose the “Last page” button from the top of this page.

Let us look at the last result on the last page. The columns of data tell us that this particular structure has only 7% sequence identity to our haemoglobin but shares 75% structural identity. Click on the left-hand link for this result to see the details page.

Click on the “View Superposed” button to show the two structures aligned with respect to each other. This will open up a graphical window.

You can see from the alignment that the two proteins are both made up of alpha-helices arranged in a similar orientation and yet have minimal sequence identity.
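
To make the idea of “sequence identity” concrete, the sketch below counts identical residues over the aligned (non-gap) positions of two sequences. This is only one of several conventions for the denominator, and it is not necessarily how the PDBeSSM service computes its “%seq” column. The function name and the two short aligned sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over aligned, non-gap columns.

    Both inputs must be rows of the same alignment (equal length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    # Keep only columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0

    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Hypothetical aligned fragments, not taken from any PDB entry.
    print(percent_identity("MKV-LSA", "MRVTL-A"))
```

A value as low as the 7% seen above means that very few aligned positions carry the same residue, even though the helices superpose well.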

Let us now move on and explore some other predominant secondary structure folds present in protein structures.

Beta-sheet proteins: These proteins are composed only of beta-sheets, the other characteristic secondary structure element in proteins. This group is fairly large and comprises proteins with widely varying functions, from sugar-binding to metabolic transport to antibodies. Some examples are given below for you to explore. In each example, look at the structure as above, and pay attention to the Pfam, CATH and SCOP classification for each entry to get a feel for the structure.

PDB entry: 1A0S

This protein is a beta-barrel protein and is involved in the transport of maltodextrin across the outer membrane of Gram-negative bacteria. Other proteins share a similar topology with this protein. More information about related proteins can be seen from the Pfam entry. Essentially, this family comprises proteins that are collectively called porins. Look at the GO and Pfam entries for this protein.

Explore the various pages for this entry. See the Primary, Secondary and Tertiary structures for this protein. You'll see that the secondary structure of this protein is predominantly beta sheets. You may also read the abstract of the paper where this structure was described (the “Citations” link from the sidebar).

Let's answer a few questions!

Question 1: What compound(s) is this protein associated with, and what are the interactions between the compound(s) and the protein? (HINT: Look at the ligands page!)

Answer:

Question 2: Which other entries in the PDB are of the same protein? (HINT: The lens symbol next to the UniProt identifier on the summary page will do a search for all other entries in the PDB that contain the same sequence.)

Answer:

Question 3: One of the authors for this entry is K. Diederichs (Authors section on the summary page). How many other structures in the PDB have this person as an author?

Answer:

PDB entry: 1BKZ

This is a structure of a protein called galectin-7. This protein belongs to a specific family of proteins called galectins (or S-lectins). The name derives from the fact that almost all members of this family bind to galactoside sugars. This protein belongs to a different family of all-beta-sheet proteins (SCOP entry). Look at the various links from the sidebar for this entry for more information and try to answer the questions below.

Question 1: What is the biological function of this protein as described by GO? (HINT: See the Cross-References section for the GO classification.)

Answer:

Question 2: This structure does not appear to be bound to any ligand, but are there any other PDB entries for the same protein that have bound ligands? Which sugars do the other entries for the same protein associate with? (HINT: Search for other entries that contain the same UniProt sequence and see their titles and summary pages!)

Answer:

Question 3: Look at the ligands page for PDB entry 1w6o (http://www.pdbe.org/1w6o). Is there anything in common between the ligand interactions of LAT (alpha-lactose) in 1w6o and GAL (beta-D-galactose) in 2gal (http://www.pdbe.org/2gal)? (HINT: Look at the interactions of GAL in PDB entry 2GAL and the interactions of LAT in 1w6o from the ligand sections for each of the entries.)

Answer:

Alpha-Beta proteins: This is the most populous category in protein fold classification (see the SCOP a/b and a+b classes). SCOP has a total of 415 classes of proteins that are composed of alpha-helices and beta-sheets in different topologies. Most enzymes fall into one of these families. Let us look at a few examples.

PDB entry: 1AFL

1AFL is the structure of pancreatic ribonuclease from Bos taurus (cattle). Ribonucleases make up a large family of proteins with similar enzymatic functions and structures, and include members that are implicated in angiogenesis (blood vessel growth in cancers). Ribonucleases essentially cleave RNA. Read more about the function of ribonucleases from the InterPro and Pfam entries for this structure, under “Cross references” on the sidebar.

There are over 150 structures of pancreatic ribonuclease determined in complex with various enzyme inhibitors. Look at the “Ligands” page for this entry. This protein is bound to a compound called ATR, which is a modified ribonucleotide that binds to the active site and inhibits the activity of the enzyme. View the interactions of ATR with 1AFL.

Under the “Links” section of the sidebar, click on “PDBe SSM” to compare the 1AFL fold with the rest of the PDB. Once the results are available, choose “Sort by Seq%” from the bottom of the page and wait for the page to reload. From the bottom right of the page, choose the “Last Page” button to go to the last page. Look for 2i5s in the results, which has 29% sequence identity and 78% identity in structure. Click on the link on the right side and, in the detailed results page, choose “View superposed”. This will open a graphics window and show the structural alignment of 1AFL with 2I5S (an onconase). You can rotate the aligned structures. The two structures are very similar in fold but share low sequence identity.

Now look at the atlas pages for 2I5S (http://www.pdbe.org/2i5s) and answer the following questions.

Question 1: What are the similarities in function between 1AFL and 2I5S? (HINT: Look at the GO entries for both structures!)

Answer:

Question 2: Which residues from the protein interact with ligand ATR by hydrogen-bond interactions in PDB entry 1AFL?

Answer:

Question 3: This protein is from Bos taurus. How many protein structures in the PDB have been determined from the same source? (HINT: Click on the lens symbol next to the organism on the summary page for PDB entry 1AFL.)

Answer:

Question 4: What is the EC number for this class of protein (HINT: Summary page), and how many structures of the same enzyme class have been determined? (HINT: Lens!)

Answer:

PDB entry: 1KWW

Look at the summary page for this entry (http://www.pdbe.org/1kww), the structure of a mannose-binding protein from rat. These sugar-binding proteins are characterized by binding mannoside sugars in the presence of metal ions such as calcium (hence they are called C-lectins). Look over the structure carefully and try to answer the questions below.

Question 1: How many structures for this protein are present in the PDB archive?

Answer:

Question 2: Which protein residues interact with the calcium (CA) ions present in the structure, and what is the predominant nature of this interaction?

Answer:

Question 3: Compare the binding site of MFU in PDB entry 1KWW (http://www.pdbe.org/1kww) with the binding site of FUC in PDB entry 3KMB (http://www.pdbe.org/3kmb). Can you identify the binding site for sugars by looking at the protein residues that interact with the ligand in both cases? (HINT: Look at the ligand interactions for both MFU in 1KWW and FUC in 3KMB by looking at the atlas pages for both these entries.)

Answer: