Biomolecular Structure Modelling

DKFZ, Department of Molecular Biophysics
Michaela Knapp-Mohammady

I) Structure of proteins, basics

- Primary structure
- Secondary structure
- Tertiary structure

II) Protein modelling, tools and techniques

- Primary structure analysis
- Secondary structure prediction
- Tertiary structure analysis and modelling
- Protein simulation


Below is the complete gene, shown as the complementary sequence:

GGATCCTGCC AGAGCCTCCT CCCACCTGGA GGGGTCCCAG CGTCCACCTT CCCTGCCCCA  60
GCCCCCCTCC TCGAGGTACT GGGAGGCTGG ATAAAGTCTT CGGCTGGGCC ACACCCCACC  120
CCAAATTCTC CCTGTCCCAC CCTAGTGCCC AGGCCACCCC GGCCTGCTCC CTTCCGCAAG  180
GCACCTCACC TTCTGTGCCC AGACCATTAG CCAACGCGGT GACCTTGACC CCGGCCCAGG  240
CCCTGCTAAT GAAGAGGAAA GCCCGTACGC ACTCGGCCTG ACCCACGGCG ACCCTCTGTG  300
ACCAATCATA CTACCAACCT CTTAAACAGA GCTCCACCGA CGCAATGCCC AGGCATAAAA  360
AGGCCAGGCC GAGAGACCGC CACCAGTCAC GGACCCTGGA CCCAGCGCAC CCGCACCATG  420
GCCGGCCCCA GCCTCGCTTG CTGTCTGCTC GGCCTCCTGG CGCTGACCTC CGCCTGCTAC  480
ATCCAGAACT GCCCCCTGGG AGGCAAGAGG GCCGCGCCGG ACCTCGACGT GCGCAAGGTG  540
AGTCCCCAGC CCTGGTCCCG CGGCGCTCCG GGGAGGGAGG GACCCGCAGC CACAGGGGCG  600
CGCCCCGCTC CGGCCTCGCC TGAGAACTCC AGGAGCTGAG CGGATTTTGA CGCCCCGCCC  660
TTGACCGCGG TCGAGGCCCC CACGGCGCCC CAGCGTCTCA GCCCCGCTGT CCCCGCCCGA  720
ACTCCGAACC CCGGACCCCA GCATCCTTGC CCGGCGCACC CCGGCCGGCC TCGCAGGGTC  780
CTCCGAGCGA GTCCCCAGCG CCGCCCCGCG TCCCGCTCAC CCCGCCCGTC CCCCGAGTGC  840
CTCCCCTGCG GCCCCGGGGG CAAAGGCCGC TGCTTCGGGC CCAATATCTG CTGCGCGGAA  900
GAGCTGGGCT GCTTCGTGGG CACCGCCGAA GCGCTGCGCT GCCAGGAGGA GAACTACCTG  960
CCGTCGCCCT GCCAGTCCGG CCAGAAGGCG TGCGGGAGCG GGGGCCGCTG CGCCTTGGGC  1020
CTCTGCTGCA GCCCGGGTGA GCGGGGCAAG GCGCTCCGGG GCCAGGGGGA GGCGGGCGGG  1080
GGTGCGGCCG GGATTCCCCT GACTCCACCT CTTCCTCCAG ACGGCTGCCA CGCCGACCCT  1140
GCCTGCGACG CGGAAGCCAC CTTCTCCCAG CGCTGAAACT TGATGGCTCC GAACACCCTC  1200
GAAGCGCGCC ACTCGCTTCC CCCATAGCCA CCCCAGAAAT GGTGAAAATA AAATAAAGCA  1260
GGTTTTTCTC CTCTACCTTG ACTCGTGTCT AAGTGCCAGA AATGGGACGG GGAGGGGGCA  1320
TTGTGGGACT GGAAGATC  1338
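As a small illustration of primary structure analysis on a sequence like this one, the sketch below computes a reverse complement and translates a stretch of DNA with the standard genetic code. The function names are hypothetical. The 27-base fragment is copied from positions 475-501 of the sequence above; reading it in this frame is an assumption made for the example (it is in frame with the ATG at positions 418-420).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order
# (first base, then second, then third).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A<->T, C<->G)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate DNA in frame 1; '*' marks a stop codon, trailing bases are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Fragment taken from the sequence above (positions 475-501, frame assumed).
fragment = "TGCTACATCCAGAACTGCCCCCTGGGA"

print(reverse_complement("GGATCC"))  # the first six bases are their own reverse complement
print(translate(fragment))           # prints CYIQNCPLG
```

Each amino acid in the translated string corresponds to one codon of three bases, which is the link between the gene and the primary structure of the protein.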

The 20 amino acids

The amino acids differ only in their side chains (functional groups).

Different amino acids

• Amino acids have different biochemical and physical properties that influence their relative replaceability in evolution.

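[Figure: Venn diagram grouping the 20 amino acids (one-letter codes) by property. Sets shown: hydrophobic, aliphatic, aromatic, polar, charged, positive, small, tiny. Cysteine appears twice, in its free (SH) and disulfide-bonded (S-S) forms.]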

With the release of one water molecule, two amino acids combine to form a dipeptide. The result is a peptide bond between a C atom and an N atom.

The figure shows the peptide bond in close-up (blue = nitrogen, red = oxygen, black = carbon, grey = hydrogen, green = side chain R). The bonds coloured dark red lie in one plane and are fairly rigid. The reason is that the C=O double bond is delocalized by resonance onto the adjacent C-N bond, which gives the peptide bond partial double-bond character. Around the other backbone bonds, by contrast, rotation is largely free. Tripeptides form when three amino acids (or a dipeptide and an amino acid) react with one another with loss of water; a reaction that releases water in this way is called a condensation.

In general, peptides made up of only a few amino acids are called oligopeptides, in contrast to polypeptides, which consist of many amino acids. Peptides composed of more than about 100 amino acids are usually referred to as proteins.
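The condensation bookkeeping can be sketched in a few lines: each peptide bond formed releases one water molecule, so a chain of n amino acids weighs the sum of the free amino acids minus (n - 1) water masses. The function name is hypothetical, and only glycine and alanine are included to keep the table short; the masses are rounded average molar masses.

```python
WATER_MASS = 18.015  # g/mol, released once per peptide bond

# Average molar masses of the free amino acids in g/mol.
# Only two entries are listed here; this table is deliberately incomplete.
FREE_AMINO_ACID_MASS = {
    "G": 75.07,  # glycine
    "A": 89.09,  # alanine
}


def peptide_mass(sequence: str) -> float:
    """Molar mass of a linear peptide built by condensation of free amino acids."""
    if not sequence:
        raise ValueError("sequence must contain at least one residue")
    total = sum(FREE_AMINO_ACID_MASS[aa] for aa in sequence)
    peptide_bonds = len(sequence) - 1
    return total - peptide_bonds * WATER_MASS


# Dipeptide Gly-Ala: two amino acids, one peptide bond, one water lost.
print(f"Gly-Ala: {peptide_mass('GA'):.1f} g/mol")
```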

Secondary structure - α-helix

Properties of the α-helix

The structure repeats itself every 5.4 Å along the helix axis, i.e. the α-helix has a pitch of 5.4 Å. α-helices have 3.6 amino acid residues per turn, i.e. a helix 36 amino acids long would form 10 turns.
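The arithmetic behind these numbers is short enough to write down. The sketch below (function name hypothetical) derives the rise per residue from the pitch and the number of residues per turn, and from that the number of turns and the length of an ideal helix.

```python
PITCH_ANGSTROM = 5.4      # advance along the helix axis per full turn
RESIDUES_PER_TURN = 3.6   # amino acid residues per full turn


def helix_geometry(n_residues: int) -> tuple[float, float]:
    """Return (number of turns, length along the axis in Å) for an ideal alpha-helix."""
    rise_per_residue = PITCH_ANGSTROM / RESIDUES_PER_TURN  # 1.5 Å per residue
    turns = n_residues / RESIDUES_PER_TURN
    length = n_residues * rise_per_residue
    return turns, length


turns, length = helix_geometry(36)
print(f"{turns:.1f} turns, {length:.1f} Å")  # prints 10.0 turns, 54.0 Å
```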

Helix structures

Secondary structure - β-sheet

The β-sheet structure

In a β-sheet two or more polypeptide chains run alongside each other and are linked in a regular manner by hydrogen bonds between the main chain C=O and N-H groups. Therefore all hydrogen bonds in a β-sheet are between different segments of polypeptide. This contrasts with the α-helix, where all hydrogen bonds involve the same element of secondary structure.


Secondary structure - reverse turns

A reverse turn is a region of the polypeptide having a hydrogen bond from one main chain carbonyl oxygen to the main chain N-H group 3 residues along the chain (i.e. O(i) to N(i+3)). Helical regions are excluded from this definition, and turns between β-strands form a special class of turn known as the β-hairpin.

Tertiary structure

Tertiary structure describes the packing of alpha-helices, beta-sheets and random coils with respect to each other at the level of one whole polypeptide chain. The figure shows the tertiary structure of chain B of the Protein Kinase C interacting protein.

Quaternary structure

Quaternary structure exists only if more than one polypeptide chain is present in a protein complex. It then describes the spatial organization of the chains. The figure shows the Protein Kinase C interacting protein.

Summary of I)

The wide variety of three-dimensional protein structures corresponds to the diversity of functions proteins fulfill.

Proteins fold in three dimensions. Protein structure is organized hierarchically from the so-called primary structure to the quaternary structure. Higher-level structures are motifs and domains.

The primary structure is the sequence of residues in the polypeptide chain.

II) Tasks of bioinformatics

Structure prediction methods are coarsely divided into three categories:

1. Comparative modelling
If the sequence to be modelled (the target) has a closely similar homologue of known structure in the PDB (Brookhaven Protein Data Bank), that homologue can be used as a template, and a structural model of the target is built on the basis of this template.

2. Fold recognition
In the absence of a significantly similar sequence with known structure, various methods grouped under the term "fold recognition" are used to find a known fold that is compatible with the sequence.

3. Ab initio prediction
In contrast to the above methods, the goal of ab initio prediction is to build a model for a given sequence without using a template, e.g. by minimizing knowledge-based energy functions. A potential energy function (PEF) assigns a potential energy to any protein conformation.

Secondary Structure Prediction

How can protein structures be predicted?

Protein structure database - PDB

Experimental structure determination by X-ray crystallography and NMR spectroscopy is essential. The Brookhaven Protein Data Bank (PDB) is the repository for these structures. Its files contain atom coordinates and can be visualized with molecular graphics viewers such as RasMol.
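To show how the atom coordinates are stored, the sketch below reads the fixed-width ATOM/HETATM records of the classic PDB text format using column slices. The function and class names are hypothetical, only a subset of the fields is extracted, and the sample records are made-up values laid out in PDB columns, not taken from a real entry.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    name: str
    residue: str
    chain: str
    residue_number: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record; slices are 0-based versions of the PDB columns."""
    return Atom(
        name=line[12:16].strip(),         # columns 13-16: atom name
        residue=line[17:20].strip(),      # columns 18-20: residue name
        chain=line[21],                   # column 22: chain identifier
        residue_number=int(line[22:26]),  # columns 23-26: residue sequence number
        x=float(line[30:38]),             # columns 31-38: x in Å
        y=float(line[38:46]),             # columns 39-46: y in Å
        z=float(line[46:54]),             # columns 47-54: z in Å
    )


def read_atoms(lines) -> list[Atom]:
    """Keep only coordinate records and parse them."""
    return [parse_atom_line(l) for l in lines if l.startswith(("ATOM", "HETATM"))]


# Made-up records for illustration only.
SAMPLE = [
    "HEADER    ILLUSTRATIVE EXAMPLE",
    "ATOM      1  N   ALA A   1      11.104  12.250  13.500  1.00 20.00           N",
    "ATOM      2  CA  ALA A   1      12.560  12.300  13.900  1.00 20.00           C",
    "END",
]

for atom in read_atoms(SAMPLE):
    print(atom.name, atom.residue, atom.chain, atom.residue_number, atom.x, atom.y, atom.z)
```

Slicing by column rather than splitting on whitespace matters here, because adjacent fields in a PDB record can touch each other without a separating space.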

Atom coordinates

Sequences (NRL3D)

1.

How are secondary structures detected in a PDB file?

The figure below shows the three main chain torsion angles of a polypeptide. These are phi (φ), psi (ψ), and omega (ω).

Omega is essentially fixed because the peptide bond is planar.

[Figure: phi/psi diagram with the alpha and beta regions labelled.]
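The torsion angles can be computed directly from atom coordinates. The sketch below is one common way to calculate a dihedral angle from four points; for phi the four atoms are C(i-1), N(i), CA(i), C(i), and for psi they are N(i), CA(i), C(i), N(i+1). The function names are hypothetical, and the rectangular phi/psi boxes in `rough_region` are illustrative assumptions only, not a published assignment rule.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Torsion angle in degrees (-180..180) about the bond p1-p2."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def rough_region(phi: float, psi: float) -> str:
    """Very crude phi/psi classification; the box limits are illustrative assumptions."""
    if -160 <= phi <= -20 and -120 <= psi <= 50:
        return "alpha"
    if -180 <= phi <= -45 and (psi >= 90 or psi <= -150):
        return "beta"
    return "other"


# Typical textbook values: alpha-helix near (-57, -47), antiparallel beta near (-139, 135).
print(rough_region(-57, -47), rough_region(-139, 135))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Real secondary structure assignment programs usually also take the hydrogen bonding pattern into account rather than relying on torsion angles alone.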

Sequence Analysis on the Web

2.

Sequence Databases

SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.

TrEMBL is a computer-annotated supplement of SWISS-PROT that contains all the translations of EMBL nucleotide sequence entries not yet integrated in SWISS-PROT.

These databases are developed by the SWISS-PROT groups at SIB and at EBI.

SWISS-PROT: Release 40 and updates up to 15-Nov-2001: 102164 entries
TrEMBL (Nov. 2001): 557388 entries

Homology modelling

Quick and easy! Use the SWISS-MODEL server:
HTTP://www.expasy.ch/swissmod/SWISS-MODEL.html

SWISS-MODEL is an automated protein modelling server running at GlaxoWellcome Experimental Research in Geneva, Switzerland.

Disclaimer
The result of any modelling procedure is NON-EXPERIMENTAL and MUST be considered with care. This is especially true since there is no human intervention during model building.

New 3D modelling server Geno3d:
HTTP://geno3d-pbil.ibcp.fr/

TASK DESIGN

DomainSweep compares a protein sequence with a range of protein family databases.

The output of DomainSweep comprises an overview of the different database search results as well as a graphical report on the location of family patterns found in the sequence.

PROBLEM

Determine function for an uncharacterised protein sequence

Protein Domain Databases Evaluation

Each database has different strengths and weaknesses:

• PFAM, PRODOM:
– identification of members of highly divergent superfamilies
– but less likely to give specific sub-family diagnoses, and
– quality is low

• PRINTS, BLOCKS:
– give specific sub-family diagnoses
– but less coverage

• Pattern part of PROSITE:
– good detection of very short motifs
– but least coverage, and
– unreliable in the identification of highly divergent superfamilies
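To make the idea of a short PROSITE-style motif concrete, the sketch below converts a pattern written in PROSITE syntax into a Python regular expression and scans a sequence with it. The function names are hypothetical and the converter handles only the basic elements (single residues, `x`, `[..]` sets, `{..}` exclusions and repeat counts); terminal anchors are not supported. The pattern used is the well-known N-glycosylation site motif N-{P}-[ST]-{P}, and the test sequences are made up.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a basic PROSITE pattern such as 'N-{P}-[ST]-{P}.' to a regex string."""
    tokens = []
    for element in pattern.rstrip(".").split("-"):
        # Separate an optional repeat count, e.g. x(2,4) or A(3).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            token = "."                        # any residue
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"    # any residue except those listed
        else:
            token = core                       # single residue or [..] set
        if repeat:
            token += "{" + repeat + "}"
        tokens.append(token)
    return "".join(tokens)


def find_motif(pattern: str, sequence: str) -> list[int]:
    """Return 0-based start positions of non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [m.start() for m in regex.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}."))         # prints N[^P][ST][^P]
print(find_motif("N-{P}-[ST]-{P}.", "AANGSAAAA"))  # prints [2]
print(find_motif("N-{P}-[ST]-{P}.", "AANPSAAAA"))  # prints []
```

A four-residue pattern like this matches by chance in many unrelated proteins, which is one way to see why very short motifs detect sites well but are unreliable for recognising divergent superfamilies.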

Protein Analysis

Fold classes

all alpha

all beta

alpha+beta

Fold class prediction - FoldClass

FoldClass (HUSAR) predicts protein fold classes and protein domains from sequence data.

The predictions are generated by artificial neural networks (Reczko, M. and Bohr, H. Nucl. Ac. Res. 22: 3616-3619 (1994)).

This program predicts:
• a specific overall fold class,
• a super fold class with respect to secondary structure content and spatial distribution,
• optionally, a profile of possible fold classes along the sequence.

Fold class prediction - (Gen)Threader

Algorithm:
• A library of unique protein domain folds is derived from the PDB.
• The test sequence is optimally fitted to all folds (allowing insertions/deletions).
• The energy of each possible fit is calculated by summing interaction and solvation parameters.
• The lowest-energy fold is taken.
• Unlike most threading methods, such as the original THREADER, GenTHREADER attempts to make inferences about possible evolutionary relationships.


• The number of analysis programs is huge. Which one should be used for what purpose?

• It is difficult to feed results from one program as input into the next program.

• Users need compact presentable reports on analysis results

3.

Energy Minimisation - Start

Calculate the potential energy for a given molecule (atom coordinates):

R = set of nuclear positions of all atoms

Energy Minimisation - Method

We move the atoms of the molecule so as to reduce its potential energy. There are several routines to do this:
- Steepest descent
- Conjugate gradient
- and more

Unfortunately, no technique can guarantee finding the global energy minimum of a complex problem (although simulated annealing is a partial solution).
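A minimal sketch of steepest descent, assuming a toy system: two atoms interacting through a Lennard-Jones potential, so that the only coordinate is their distance r. The potential, the fixed step size and the function names are assumptions chosen for illustration; real modelling programs minimize a full force field over all atom coordinates and choose the step size more carefully.

```python
def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones energy E(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Derivative dE/dr of the Lennard-Jones energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def steepest_descent(r: float, step: float = 0.01, tol: float = 1e-8,
                     max_iter: int = 10_000) -> float:
    """Repeatedly move downhill along the negative gradient until it nearly vanishes."""
    for _ in range(max_iter):
        gradient = lj_gradient(r)
        if abs(gradient) < tol:
            break
        r -= step * gradient  # fixed step size: simple, but not robust in general
    return r


r_min = steepest_descent(1.5)
print(f"r_min = {r_min:.4f}, E = {lj_energy(r_min):.4f}")
print(f"analytic minimum at 2**(1/6) = {2 ** (1 / 6):.4f}")
```

This toy potential has a single minimum, so the method finds it. On a protein energy surface with many minima the same procedure stops in whichever local minimum lies downhill from the starting structure.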

Modelling Programs

WHATIF, INSIGHTII, GAUSSIAN, SCC-DFTB, ...

GROMOS, DISCOVER, ...

Model

SWISS-3DIMAGE is an image database which strives to provide high quality pictures of biological macromolecules with known three-dimensional structure. The database contains mostly images of experimentally elucidated structures, but also provides views of well accepted theoretical protein models. The images are provided in several useful formats; both mono and stereo pictures are generally available.

Viewers: Rasmol, Kinemage, Molden, Gaussview, Sybyl, MSViewer, Insight, WebLab, Swiss...

Molecule Simulation - Molecular Dynamics

- The starting point for most simulations is the experimental crystal or NMR structure.

- This structure is energy-minimized and solvated in a box of water.

- The system is heated (high-energy state).

- Equilibration and simulation follow for about 1 nanosecond; only short times are possible.

The detailed atomic motions are usually unimportant. What really matters are the "ensemble average" properties, i.e. what happens on average (MD is in fact chaotic, with sensitive dependence on initial conditions, like the weather!).
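The core of a molecular dynamics program is a time-stepping loop. The sketch below uses the velocity Verlet scheme on a deliberately tiny system, a single particle on a harmonic spring in reduced units, and then computes an average over the trajectory. This stands in for one bond vibration only; it is an illustrative assumption and not the AMBER force field or any particular MD package, and all names are hypothetical.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equations and return the list of (position, velocity) pairs."""
    trajectory = []
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append((x, v))
    return trajectory


K = 1.0     # spring constant (reduced units)
MASS = 1.0  # particle mass (reduced units)


def spring_force(x):
    return -K * x


def total_energy(x, v):
    return 0.5 * MASS * v * v + 0.5 * K * x * x


traj = velocity_verlet(x=1.0, v=0.0, force=spring_force, mass=MASS, dt=0.01, n_steps=10_000)

# Average over the trajectory instead of inspecting individual time steps.
mean_kinetic = sum(0.5 * MASS * v * v for _, v in traj) / len(traj)
drift = total_energy(*traj[-1]) - total_energy(1.0, 0.0)

print(f"mean kinetic energy: {mean_kinetic:.3f}")
print(f"total energy drift:  {drift:.2e}")
```

For this oscillator the mean kinetic energy comes out close to half of the total energy, independent of where on the trajectory one looks; that kind of averaged quantity is what a simulation is usually run to obtain.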

Proteins are not the static structures that X-ray crystallography can suggest, but are continuously moving. This is a short simulation of crambin, calculated using the AMBER force field.

Molecular Dynamics

DNA is not static either. This simulation was calculated using AMBER and a continuum model for water.

MD-Simulation
