Structural Bioinfo

  • 8/6/2019 Structural Bioinfo

    1/76

    STRUCTURAL BIOINFORMATICS (Toward a High-Resolution Understanding of Biology)

    Objectives of Lecture

    Structural Bioinformatics
    What is 3D Structure Prediction?
    Significance of 3D Structure Prediction
    Central Dogma
    Fundamentals of Protein Structure
    Protein Data Bank (PDB)
    To be aware of a number of structure prediction methods:
      Homology Modeling
      Fold Recognition/Threading
      Ab initio Protein Folding Approaches
    Applications of Structural Bioinformatics:
      Analog-Based Design
      Structure-Based Design

    Structural Bioinformatics

    Structural bioinformatics is a subset of bioinformatics concerned with the use of biological structures (proteins, DNA, RNA, ligands and complexes thereof) to further our understanding of biological systems.

    What is protein structure prediction?

    A prediction of the (relative) spatial position of each atom in the tertiary structure, generated from knowledge of the primary structure (sequence) only.

    Significance of Protein Structure Prediction

    In evolutionarily related proteins, structure is much better preserved than sequence.

    A 3D protein structure offers much more information than the amino acid sequence alone. By comparison with known structures we can infer probable biological functions of new proteins.

    By mapping residue conservation onto the structure we can infer active sites and possibly the molecular function.

    We can also identify regions involved in protein-protein interactions.

    We can reconstruct (at least partially) the structure of protein complexes identified by other experimental methods.

    We can build homology models.
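    Mapping conservation onto a structure first requires a per-position conservation score. The slides do not name a measure. The sketch below assumes Shannon entropy over the residues in each column of a multiple alignment, with gaps ignored. Lower entropy means a more conserved position. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional


def column_entropy(column: str) -> Optional[float]:
    """Shannon entropy (bits) of one alignment column; gaps ('-') are ignored."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return None  # all-gap column carries no information
    n = len(residues)
    freqs = (k / n for k in Counter(residues).values())
    return sum(-p * math.log2(p) for p in freqs)


def conservation_profile(alignment: List[str]) -> List[Optional[float]]:
    """Entropy for every column of an equal-length multiple alignment."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy("".join(col)) for col in zip(*alignment)]


if __name__ == "__main__":
    aln = ["ACD", "ACE", "ACF", "ACG"]
    print(conservation_profile(aln))  # [0.0, 0.0, 2.0]: columns 0 and 1 fully conserved
```

    Low-entropy columns would then be highlighted on the corresponding residues of the 3D model.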

    The central dogma

    DNA ----------> RNA ----------> Protein
    {A, C, G, T}    {A, C, G, U}    {A, C, D, ..., Y}

    A = Adenine, C = Cytosine, G = Guanine, T = Thymine; in RNA, thymine (T) is replaced by uracil (U).
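    The DNA -> RNA -> protein flow can be sketched in a few lines. This is a minimal illustration under several assumptions. It uses the standard genetic code. It treats the input as the coding strand, with no introns or splicing. It reads from position 0 until the first stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order
# (first base varies slowest); "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """DNA coding strand -> mRNA: thymine (T) becomes uracil (U)."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> str:
    """mRNA -> protein, reading codons from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGAGTCTTCTACCCATTAA")
    print(mrna)             # AUGGAGUCUUCUACCCAUUAA
    print(translate(mrna))  # MESSTH
```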

    Fundamentals of Protein Structure

    Terminology

    Primary Structure: the sequence of amino acid residues in the protein.

    Example: MESSTHEDRKVLDL

    Amino acids and the peptide bond

    Cβ: the first side-chain carbon (absent in glycine).

    Cα atoms

    Secondary Structure

    A first-level description of 3D structure. The peptide backbone of a protein has areas of partial positive and partial negative charge (the N-H and C=O groups). These areas can interact with one another to form hydrogen bonds.

    These hydrogen bonds give rise to two main types of structure:
    alpha helices
    beta pleated sheets
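    The slide does not give a model for how the backbone's opposite partial charges produce a hydrogen bond. One widely used quantitative version is the electrostatic energy from DSSP (Kabsch and Sander). It places partial charges of 0.42e on C=O and 0.20e on N-H, and counts an H-bond when the energy is below -0.5 kcal/mol. The sketch below implements that formula. The coordinates in the example are a made-up, idealized linear geometry, not data from a real protein.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]

# DSSP electrostatic model: q1 = 0.42e (C=O), q2 = 0.20e (N-H),
# f = 332 kcal*Angstrom/(mol*e^2).
Q1_Q2_F = 0.42 * 0.20 * 332.0   # about 27.9 kcal*Angstrom/mol
HBOND_CUTOFF = -0.5             # kcal/mol; lower energies count as H-bonds


def hbond_energy(n: Point, h: Point, c: Point, o: Point) -> float:
    """Energy (kcal/mol) of a backbone N-H ... O=C pair; distances in Angstrom."""
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1_Q2_F * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)


def is_hbond(n: Point, h: Point, c: Point, o: Point) -> bool:
    return hbond_energy(n, h, c, o) < HBOND_CUTOFF


if __name__ == "__main__":
    # Idealized, hypothetical geometry: N-H 1.0 A, H...O 1.9 A, C=O 1.23 A.
    n, h = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    o, c = (2.9, 0.0, 0.0), (4.13, 0.0, 0.0)
    print(round(hbond_energy(n, h, c, o), 2), is_hbond(n, h, c, o))
```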

    Secondary Structure I: The α-Helix

    Secondary Structure II: The β-Strand

    Several β-strands assemble into a β-sheet (a tertiary structural element).

    (About 3.4 Å)

    Antiparallel β-Sheets

    Parallel β-Sheets

    Mixed β-Sheets

    Tertiary Structure: The Global Three-Dimensional Structure

    Secondary structure elements pack together to form a structural core.

    Tertiary structure results from the folding of alpha helices and beta pleated sheets.

    Factors influencing tertiary structure include:
    Hydrophobic/hydrophilic interactions
    Hydrogen bonding
    Disulfide linkages
    Folding by chaperone proteins

    Tertiary Structure: Different Representations

    Ribbon Diagrams (Richardson style)

    Ribbon diagrams are traces of the protein backbone emphasizing the 3D arrangement of α-helices and β-strands. This arrangement is called the protein fold or the protein folding topology.

    Tertiary Structure: Different Representations

    This is a representation of the molecular surface (van der Waals surface) of a hemagglutinin domain with bound sialic acid. It is much closer to what other molecules see when they encounter a protein!

    Supersecondary Structures: Between Secondary and Tertiary Structure

    For example: β-α-β motif (above); β-hairpin (left)

    Quaternary Structure

    Association of multiple polypeptide chains: quaternary structure results from the interaction of independent polypeptide chains.

    Factors influencing quaternary structure include:
    Hydrophobic/hydrophilic interactions
    Hydrogen bonding
    The shape and charge distribution of the associating polypeptides

    Side Chain Properties

    Hydrophobic amino acids tend to be buried in the interior of a protein.

    Hydrophilic ones tend to stay on the exterior of a protein.

    Oppositely charged amino acids can form salt bridges.

    Polar amino acids can participate in hydrogen bonding.
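    The buried/exposed tendency above is often quantified with a hydropathy scale. The slides do not name one. The sketch below swaps in the Kyte-Doolittle scale and computes both a whole-sequence average (GRAVY) and a sliding-window profile. The window size of 9 is an assumed default, not something the slides specify.

```python
from typing import List

# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq: str) -> float:
    """Average hydropathy of a sequence (the GRAVY score)."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq.upper()) / len(seq)


def hydropathy_profile(seq: str, window: int = 9) -> List[float]:
    """Sliding-window average; high values suggest buried (hydrophobic) segments."""
    if window > len(seq):
        raise ValueError("window longer than sequence")
    return [mean_hydropathy(seq[i:i + window])
            for i in range(len(seq) - window + 1)]


if __name__ == "__main__":
    # A negative average indicates an overall hydrophilic peptide.
    print(round(mean_hydropathy("MESSTHEDRKVLDL"), 2))
```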

    Domain, Motif, Fold

    Domain: a discrete portion of a protein assumed to fold independently of the rest of the protein and possessing its own function. Most proteins have multiple domains. The overall shape of a domain is called a fold; there are only a few thousand possible folds.

    Super-secondary structure (motif): frequently occurring structural patterns found in multiple proteins, which do not necessarily have similar folds.

    Determination of Protein Structures

    X-ray Crystallography

    NMR (Nuclear Magnetic Resonance)

    EM (Electron microscopy)

    Protein Data Bank (PDB)

    A repository for 3D biological macromolecular structures. Established in 1971 at Brookhaven National Laboratory (with 7 structures). It includes proteins, nucleic acids and viruses. Structures are obtained by X-ray crystallography (80%) or NMR spectroscopy (16%) and are submitted by biologists and biochemists from around the world.

    Other sites: MSD (EBI): msd.ebi.ac.uk; MMDB (NCBI): www.ncbi.nlm.nih.gov/Structure/
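    Entries in the Protein Data Bank have long been distributed in a fixed-column text format. The sketch below reads only ATOM records from that legacy format. In the column layout it uses, the atom name is in columns 13-16, the residue name in 18-20, the chain in 22, the residue number in 23-26 and x/y/z in 31-54. It ignores alternate locations, multiple models and the newer mmCIF format. The Atom class and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Atom:
    name: str      # atom name, e.g. "CA"
    res_name: str  # residue name, e.g. "MET"
    chain: str     # chain identifier
    res_seq: int   # residue number
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Read ATOM records using the fixed columns of the legacy PDB format."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM  "):
            continue  # skip HETATM, HEADER, REMARK, ...
        atoms.append(Atom(
            name=line[12:16].strip(),
            res_name=line[17:20].strip(),
            chain=line[21],
            res_seq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
        ))
    return atoms


def ca_trace(atoms: List[Atom]) -> List[Atom]:
    """Keep only alpha-carbon atoms (one per residue in a clean file)."""
    return [a for a in atoms if a.name == "CA"]
```

    In use, one would read a downloaded entry (for example a hypothetical local file `model.pdb`) as text and call `ca_trace(parse_atoms(text))` to get the backbone trace.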

    Growth of the Protein Data Bank (PDB): The Motivation

    The number of unique folds in nature is fairly small (possibly a few thousand).

    90% of new structures submitted to the PDB in the past three years have folds similar to ones already in the PDB.

    (Chart legend: new fold vs. old fold)

    Protein Structure Prediction Methods

    Comparative Modeling Method:

    Homology Modeling Method

    Threading Method

    Ab initio folding Method

    Protein structure prediction flowchart

    Sequence -> database searching -> homolog of known structure?
    YES -> homology modeling -> structure
    NO -> protein threading or ab initio method -> structure

    Homology Modeling

    Predicts the three-dimensional structure of a given protein sequence (TARGET) based on an alignment to one or more known protein structures (TEMPLATES).

    If similarity between the TARGET sequence and the TEMPLATE sequence is detected, structural similarity can be assumed.

    In general, at least about 30% sequence identity is required to generate useful models.
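    The ~30% rule of thumb can be applied once the target and template are aligned. Percent identity has several definitions in use. This sketch assumes one of them: identical positions divided by the aligned columns where neither sequence has a gap. The decision helper only mirrors the YES/NO branch of the flowchart and is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identity (%) over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def suggest_method(identity: float, threshold: float = 30.0) -> str:
    """Rule of thumb: roughly 30% identity or more -> homology modeling."""
    if identity >= threshold:
        return "homology modeling"
    return "threading or ab initio"


if __name__ == "__main__":
    pid = percent_identity("ACDE-G", "ACDQAG")
    print(pid, suggest_method(pid))  # 80.0 homology modeling
```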

    7 Steps In Homology Modeling

    Step 1: ID Homologues in PDB

    (Figure: the query sequence is searched against the sequences in the PDB.)

    Step 1: ID Homologues in PDB

    (Figure: the query sequence matches two PDB entries, Hit #1 and Hit #2.)
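    Database search tools such as BLAST start by finding short words that the query shares with each database sequence. The toy below keeps only that first idea. It ranks entries by the number of shared 3-letter words and performs no alignment extension, scoring matrix or E-value. It is a stand-in for illustration, not BLAST, and the mini-database is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int = 3) -> Set[str]:
    """All overlapping words of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_hits(query: str, database: Dict[str, str],
              k: int = 3) -> List[Tuple[int, str]]:
    """Rank database entries by the number of k-letter words shared with the query."""
    q = kmers(query, k)
    scores = [(len(q & kmers(seq, k)), name) for name, seq in database.items()]
    return sorted(scores, key=lambda item: (-item[0], item[1]))


if __name__ == "__main__":
    pdb_like = {  # hypothetical mini-database
        "hit1": "PRTEINSEQENCE",
        "hit2": "QWERTYASDF",
        "hit3": "XXPRTEXX",
    }
    print(rank_hits("PRTEINSEQ", pdb_like))  # [(7, 'hit1'), (2, 'hit3'), (0, 'hit2')]
```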

    Step 2: Align Sequences

    Scoring matrix (10 for identical residues, 0 otherwise):

          G   E   N   E   T   I   C   S
      G  10   0   0   0   0   0   0   0
      E   0  10   0  10   0   0   0   0
      N   0   0  10   0   0   0   0   0
      E   0  10   0  10   0   0   0   0
      S   0   0   0   0   0   0   0  10
      I   0   0   0   0   0  10   0   0
      S   0   0   0   0   0   0   0  10

    (Figure: the dynamic programming matrix filled in from these scores.)
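    Dynamic programming fills a matrix of best partial scores and then traces back one optimal path. The sketch below is a plain Needleman-Wunsch global alignment using the slide's scores: 10 for identical residues and 0 otherwise. The slide does not state a gap penalty, so gap = 0 is an assumption. With that choice the score is simply 10 times the number of identical aligned residues.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 10, mismatch: int = 0,
                     gap: int = 0) -> Tuple[int, str, str]:
    """Global alignment score plus one optimal alignment ('-' marks gaps)."""
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # pair a[i-1] with b[j-1]
                          F[i - 1][j] + gap,    # gap in b
                          F[i][j - 1] + gap)    # gap in a
    # Traceback from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    score, top, bottom = needleman_wunsch("GENETICS", "GENESIS")
    print(score)  # 60: six identical residues aligned
    print(top)
    print(bottom)
```

    With a negative gap penalty, the same code prefers fewer gaps, and the optimal alignment may change.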

    Alignment

    Key step in homology modeling.

    Global (Needleman-Wunsch) alignment is absolutely required.

    A small error in the alignment can lead to a big error in the structural model.

    Multiple alignments are usually better than pairwise alignments.

    The alignment is prepared by superimposing all template structures.
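    Superimposing template structures means finding the rotation and translation that best overlay matched atoms. The slides do not name an algorithm. The sketch below uses the Kabsch method, an SVD of the 3x3 covariance matrix with a reflection guard, and returns the optimal rotation and the RMSD after fitting. It assumes the atom correspondence (for example matched Cα atoms) is already known.

```python
import numpy as np


def kabsch_superpose(P, Q):
    """Optimal rotation R and RMSD after superposing P onto Q (N x 3, matched atoms)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both be N x 3 with matching atoms")
    Pc = P - P.mean(axis=0)                   # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                             # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # avoid an improper rotation
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    rmsd = float(np.sqrt((diff ** 2).sum() / len(P)))
    return R, rmsd


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    template = rng.normal(size=(10, 3))       # made-up coordinates
    t = np.radians(30)
    Rz = np.array([[np.cos(t), -np.sin(t), 0.0],
                   [np.sin(t),  np.cos(t), 0.0],
                   [0.0, 0.0, 1.0]])
    moved = template @ Rz.T + np.array([1.0, 2.0, 3.0])
    R, rmsd = kabsch_superpose(template, moved)
    print(round(rmsd, 6))  # essentially 0: same shape, only rotated and shifted
```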

    Two zones of sequence alignment

    Step 3: Find SCRs

    Query   ACDEFGHIKLMNPQRST--FGHQWERT-----TYREWYEG
    Hit #1  ASDEYAHLRILDPQRSTVAYAYE--KSFAPPGSFKWEYEA
    Hit #2  MCDEYAHIRLMNPERSTVAGGHQWERT----GSFKEWYAA

    (SCR #1 and SCR #2 are marked on this alignment.)

    Structurally Conserved Regions (SCRs)

    Correspond to the most stable structures or regions (usually in the interior) of the protein.

    Correspond to sequence regions with the lowest level of gapping and the highest level of sequence conservation.

    Usually correspond to secondary structures.
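    The "lowest level of gapping" criterion suggests a crude first pass: find runs of alignment columns with no gap in any sequence. Everything between those runs is a candidate structurally variable region (loop). This sketch uses only gaps. It ignores conservation and structure superposition, which real tools also use, and the minimum block length of 5 is an arbitrary assumption.

```python
from typing import List, Tuple


def ungapped_blocks(alignment: List[str], min_len: int = 5) -> List[Tuple[int, int]]:
    """Half-open column ranges [start, end) with no gap in any sequence."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("aligned sequences must have equal length")
    blocks, start = [], None
    for col in range(length + 1):
        # Treat the position just past the end as a gap to close the last run.
        gapped = col == length or any(s[col] == "-" for s in alignment)
        if not gapped and start is None:
            start = col
        elif gapped and start is not None:
            if col - start >= min_len:
                blocks.append((start, col))
            start = None
    return blocks


def variable_regions(length: int, scrs: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Columns between conserved blocks, i.e. candidate SVRs (loops)."""
    regions, prev_end = [], 0
    for start, end in scrs:
        if start > prev_end:
            regions.append((prev_end, start))
        prev_end = end
    if prev_end < length:
        regions.append((prev_end, length))
    return regions


if __name__ == "__main__":
    aln = ["ACDEFGHIKLMNPQRST--FGHQWERT-----TYREWYEG",   # Query
           "ASDEYAHLRILDPQRSTVAYAYE--KSFAPPGSFKWEYEA",   # Hit #1
           "MCDEYAHIRLMNPERSTVAGGHQWERT----GSFKEWYAA"]   # Hit #2
    scrs = ungapped_blocks(aln)
    print(scrs)                                # [(0, 17), (32, 40)]
    print(variable_regions(len(aln[0]), scrs))  # [(17, 32)]
```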

    Step 4: Find SVRs

    Query   ACDEFGHIKLMNPQRST--FGHQWERT-----TYREWYEG
    Hit #1  ASDEYAHLRILDPQRSTVAYAYE--KSFAPPGSFKWEYEA
    Hit #2  MCDEYAHIRLMNPERSTVAGGHQWERT----GSFKEWYAA
            HHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCBBBBBBBBB

    (Bottom line: secondary structure, H = helix, C = coil, B = beta. The SVR is the loop region.)

    Step 5: Side Chain Modeling

    Side chains are placed and positioned by superimposing rotamers (preferred side-chain conformations) onto the model backbone.
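    Rotamers are usually described by side-chain dihedral angles (χ1, χ2, ...). The helper below computes a dihedral angle from four atom positions and snaps it to the nearest entry of a rotamer set. The set (60, 180, -60) is a hypothetical, simplified list of staggered χ1 values. Real rotamer libraries are larger and often backbone-dependent.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Dihedral angle in degrees, in [-180, 180], about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of b0 and b2 perpendicular to the central bond.
    p = _dot(b0, b1)
    q = _dot(b2, b1)
    v = tuple(bi - p * ui for bi, ui in zip(b0, b1))
    w = tuple(bi - q * ui for bi, ui in zip(b2, b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def nearest_rotamer(chi: float, library=(60.0, 180.0, -60.0)) -> float:
    """Closest library angle, measuring differences around the circle."""
    def circular_diff(a: float, b: float) -> float:
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)
    return min(library, key=lambda r: circular_diff(chi, r))
```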

    Step 6: Model Optimization

    An efficient way of polishing your protein model:
    Removes atomic overlaps and unnatural strains in the structure
    Stabilizes or reinforces strong hydrogen bonds and breaks weak ones
    Brings the protein to a nearby low-energy (local minimum) conformation in about 1-2 minutes of CPU time

    Several freeware options to choose from:
    XPLOR (Axel Brunger, Yale)
    GROMACS (Groningen, The Netherlands)
    AMBER (Peter Kollman, UCSF)
    CHARMM (Martin Karplus, Harvard)
    TINKER (Jay Ponder, Wash U)
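    The packages listed above minimize full force fields. The toy below keeps only one ingredient: a pairwise Lennard-Jones term minimized by plain steepest descent. It shows how minimization relieves an atomic overlap and settles into a local minimum. Epsilon, sigma, the step size and the starting geometry are all hypothetical, in reduced units.

```python
import math
from typing import List, Tuple

Coords = List[List[float]]


def lj_energy_and_gradient(coords: Coords, epsilon: float = 1.0,
                           sigma: float = 1.0) -> Tuple[float, Coords]:
    """Total Lennard-Jones energy and its gradient over all atom pairs."""
    n = len(coords)
    energy = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[i][k] - coords[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            sr6 = (sigma / r) ** 6
            energy += 4 * epsilon * (sr6 * sr6 - sr6)
            de_dr = 4 * epsilon * (-12 * sr6 * sr6 + 6 * sr6) / r
            for k in range(3):
                g = de_dr * d[k] / r  # chain rule: dr/dx_i = d/r
                grad[i][k] += g
                grad[j][k] -= g
    return energy, grad


def steepest_descent(coords, step: float = 1e-3, max_iter: int = 10000,
                     tol: float = 1e-6) -> Tuple[Coords, float]:
    """Move atoms against the gradient until the forces (nearly) vanish."""
    x = [list(map(float, c)) for c in coords]
    for _ in range(max_iter):
        _, grad = lj_energy_and_gradient(x)
        if math.sqrt(sum(g * g for row in grad for g in row)) < tol:
            break
        for i, row in enumerate(grad):
            for k in range(3):
                x[i][k] -= step * row[k]
    return x, lj_energy_and_gradient(x)[0]


if __name__ == "__main__":
    # Two atoms start overlapping (r = 0.95 sigma, positive energy).
    final, e = steepest_descent([(0.0, 0.0, 0.0), (0.95, 0.0, 0.0)])
    # Expect r close to 2**(1/6) sigma and energy close to -epsilon.
    print(round(math.dist(final[0], final[1]), 4), round(e, 4))
```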

    Step 7: Model Validation

    PROCHECK: http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html

    PROSA II: http://lore.came.sbg.ac.at/People/mo/Prosa/prosa.html

    VADAR: http://www.pence.ualberta.ca/ftp/vadar/

    DSSP: http://www.embl-heidelberg.de/dssp/
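    Validation programs such as PROCHECK run many stereochemical checks. The sketch below is a much smaller geometric sanity check on a Cα trace, not a reimplementation of any tool listed above. Consecutive Cα atoms in a trans peptide are about 3.8 Å apart, so links far from that value are flagged (a cis peptide, near 2.9 Å, would also be flagged). Non-neighbouring Cα pairs that are very close are reported as clashes. The tolerance and clash cutoff are hypothetical choices.

```python
import math
from typing import List, Sequence, Tuple


def ca_geometry_report(ca: Sequence[Tuple[float, float, float]],
                       ideal: float = 3.8, tol: float = 0.3,
                       clash: float = 3.0) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Flag odd consecutive CA-CA distances and close non-neighbouring CA pairs."""
    bad_links = [i for i in range(len(ca) - 1)
                 if abs(math.dist(ca[i], ca[i + 1]) - ideal) > tol]
    clashes = [(i, j) for i in range(len(ca)) for j in range(i + 2, len(ca))
               if math.dist(ca[i], ca[j]) < clash]
    return bad_links, clashes


if __name__ == "__main__":
    # Hypothetical 4-residue trace whose last residue folds back onto residue 1.
    model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 1.0, 0.0)]
    print(ca_geometry_report(model))  # ([], [(1, 3)])
```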

    Homology Modeling on the Web

    http://www.expasy.ch/swissmod/SWISS-MODEL.html

    http://www.cmbi.kun.nl:1100/WIWWWI/

    http://cl.sdsc.edu/hm.html

    (Figure: raw sequence -> predicted structure)

    Use templates to build the structure of the homologous sequence.

    Use of SwissPDB Viewer to build the structure of the following sequence:

    MQQPMNYPCP QIFWVDSSAT SSWAPPGSVF PCPSCGPRGP DQRRPPPPPP PVSPLPPPSQ
    PLPLPPLTPL KKKDHNTNLW LPVVFFMVLV ALVGMGLGMY QLFHLQKELA ELREFTNQSL
    KVSSFEKQIA NPSTPSEKKE PRSVAHLTGN PHSRSIPLEW EDTYGTALIS GVKYKKGGLV
    INETGLYFVY SKVYFRGQSC NNQPLNHKVY MRNSKYPEDL VLMEEKRLNY CTTGQIWAHS
    SYLGAVFNLT SADHLYVNIS QLSLINFEES KTFFGLYKL

    1TNRADOGB

    After magic fit

    Activate the raw sequence

    The Preliminary Result

    Protein Threading

    Predicts structure by identifying a good sequence-structure fit.

    Protein threading can predict only the backbone structure of a protein (side chains have to be predicted using other methods).

    (Figure: predicted vs. actual structure)
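    A sequence-structure fit can be scored by asking how well each residue suits its structural environment in a template. The toy below uses two environments, buried (B) and exposed (E), and rewards hydrophobic residues in buried positions and polar residues in exposed ones. The residue classes, scores and templates are hypothetical. Real threading methods use gapped alignments and statistical potentials.

```python
from typing import Dict

# Hypothetical residue classes and environment scores for a toy fitness function.
HYDROPHOBIC = set("AVILMFWC")
SCORES = {("B", True): 1, ("B", False): -1,   # buried position
          ("E", True): -1, ("E", False): 1}   # exposed position


def thread_score(sequence: str, environments: str) -> int:
    """Score a sequence placed residue-by-residue onto a template's environments."""
    if len(sequence) != len(environments):
        raise ValueError("this toy version needs an equal-length, pre-aligned pair")
    return sum(SCORES[(env, aa in HYDROPHOBIC)]
               for aa, env in zip(sequence.upper(), environments))


def best_template(sequence: str, templates: Dict[str, str]) -> str:
    """Name of the template whose environment string fits the sequence best."""
    return max(templates, key=lambda name: thread_score(sequence, templates[name]))


if __name__ == "__main__":
    folds = {"fold_1": "BBEEBB", "fold_2": "EEBBEE"}  # hypothetical templates
    print(best_template("LLKKVV", folds))  # fold_1
```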

    Ab Initio 3D structure prediction

    Aims to predict tertiary structure from basic physicochemical properties.

    It is used when homology modeling and threading have failed (no homologies are evident).

    Does not rely on detecting similarity to any sequence of known structure.

    As yet very unreliable for practical predictions.
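    The physics-only idea can be illustrated with a classic textbook toy that the slides do not mention: the 2D HP lattice model. Residues are either hydrophobic (H) or polar (P), the chain is placed on a square grid without crossing itself, and each non-bonded H-H contact lowers the energy by 1. The brute-force search below is only feasible for very short chains. That cost is one reason practical ab initio methods rely on heuristic search.

```python
from itertools import product
from typing import List, Optional, Tuple

Coord = Tuple[int, int]
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def build_walk(moves) -> Optional[List[Coord]]:
    """Place residues on a 2D square lattice; None if the chain hits itself."""
    coords = [(0, 0)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = coords[-1]
        nxt = (x + dx, y + dy)
        if nxt in coords:
            return None
        coords.append(nxt)
    return coords


def hp_energy(seq: str, coords: List[Coord]) -> int:
    """-1 for each pair of H residues that touch on the lattice but are not chain neighbours."""
    index = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in MOVES.values():
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1 and seq[j] == "H":
                energy -= 1
    return energy


def fold_exhaustively(seq: str) -> Tuple[int, List[Coord]]:
    """Lowest-energy conformation by brute force over all lattice walks."""
    best_e, best_coords = None, None
    for moves in product("UDLR", repeat=len(seq) - 1):
        coords = build_walk(moves)
        if coords is None:
            continue
        e = hp_energy(seq, coords)
        if best_e is None or e < best_e:
            best_e, best_coords = e, coords
    return best_e, best_coords


if __name__ == "__main__":
    energy, coords = fold_exhaustively("HPPHHPPH")
    print(energy, coords)
```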

    Analog-Based Design

    The analog-based approach mainly uses pharmacophoric maps and Quantitative Structure-Activity Relationship (QSAR) models to identify or modify a lead in the absence of a known 3D structure of the receptor.

    Structure-Based Design

    The structure-based approach starts with the structure of the receptor site, such as the active site of a protein.

    Docking falls into this category of design.
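    Docking searches for the ligand placement that scores best against the receptor site. The toy below is rigid and translation-only. It tries every shift on a grid and scores +1 for each receptor-ligand atom pair within a contact distance, with a penalty for each pair that clashes. The cutoffs, penalty and pocket coordinates are hypothetical. Real docking programs also sample rotations and ligand flexibility and use force-field or empirical scoring functions.

```python
import math
from itertools import product
from typing import Iterable, List, Optional, Tuple

Point = Tuple[float, float, float]


def contact_score(receptor: List[Point], ligand: List[Point],
                  contact: float = 4.0, clash: float = 2.5,
                  clash_penalty: int = 10) -> int:
    """+1 per receptor-ligand atom pair within `contact`; a penalty per clash."""
    score = 0
    for rec_atom in receptor:
        for lig_atom in ligand:
            d = math.dist(rec_atom, lig_atom)
            if d < clash:
                score -= clash_penalty
            elif d <= contact:
                score += 1
    return score


def grid_dock(receptor: List[Point], ligand: List[Point], grid: Iterable[float]):
    """Try every translation on a cubic grid; return (best score, best shift)."""
    steps = list(grid)
    best: Optional[Tuple[int, tuple]] = None
    for shift in product(steps, repeat=3):
        placed = [tuple(a + s for a, s in zip(atom, shift)) for atom in ligand]
        score = contact_score(receptor, placed)
        if best is None or score > best[0]:
            best = (score, shift)
    return best


if __name__ == "__main__":
    pocket = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (4.0, 4.0, 0.0)]
    ligand = [(0.0, 0.0, 0.0)]  # single-atom hypothetical ligand
    # The best shift places the ligand over the pocket centre (x = y = 2).
    print(grid_dock(pocket, ligand, range(-1, 6)))
```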

    Quantitative Structure-Activity Relationship (QSAR)

    QSAR is a family of mathematical models built to predict the biological and physicochemical behavior of molecules from their chemical structures.

    It reduces the need to measure the activity of hundreds of similar compounds individually, which would take large amounts of resources.

    The underlying premise of QSAR is that biological activity is correlated with physicochemical parameters.

    BA = f(biological + chemical + physical)

    Biological activity can be any measured quantity, such as IC50 or ED50.

    QSAR Table

    Structure   Bioproperty   Structural properties
    Comp. 1     Bio1          P1   P2   P3   P4
    Comp. 2     Bio2          "    "    "    "
    Comp. 3     Bio3          "    "    "    "
    Comp. 4     Bio4          "    "    "    "

    BA = k1P1 + k2P2 + k3P3 + ...
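    The linear model BA = k1P1 + k2P2 + ... is commonly fitted by ordinary least squares. A sketch with NumPy follows. The descriptor values and activities are made up, generated from BA = 2·P1 - 0.5·P2 so that the fit can be checked. The optional constant term is an assumption the slide's equation does not include, but it is common in practice.

```python
import numpy as np


def _design(descriptors, intercept: bool):
    X = np.asarray(descriptors, dtype=float)
    if intercept:
        X = np.column_stack([np.ones(len(X)), X])  # constant term k0
    return X


def fit_qsar(descriptors, activity, intercept: bool = False):
    """Least-squares coefficients for BA = k1*P1 + k2*P2 + ... (+ k0)."""
    y = np.asarray(activity, dtype=float)
    coef, *_ = np.linalg.lstsq(_design(descriptors, intercept), y, rcond=None)
    return coef


def predict(coef, descriptors, intercept: bool = False):
    """Predicted activity for new compounds from their descriptors."""
    return _design(descriptors, intercept) @ coef


if __name__ == "__main__":
    P = [[1, 2], [2, 1], [3, 5], [4, 2]]   # hypothetical P1, P2 for Comp. 1-4
    BA = [1.0, 3.5, 3.5, 7.0]              # made-up activities
    k = fit_qsar(P, BA)
    print(np.round(k, 3))                  # recovers k1 = 2, k2 = -0.5
    print(predict(k, [[5, 4]]))
```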

    EXTERNAL VALIDATION OF QSAR MODELS

    Entire dataset -> split into a training set and a test set
    Training set -> model development (q²)
    Test set -> prediction of the test set (R²)
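    The two statistics in this scheme can be computed as follows. It is a sketch under stated assumptions. q² is taken as the leave-one-out cross-validated value on the training set, 1 - PRESS/SS_total. The external R² scores the training-set model on the test set, using the training-set mean as the reference (other definitions use the test-set mean). The model is a linear fit with a constant term, and the data in the test are made up.

```python
import numpy as np


def _design(X):
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    return np.column_stack([np.ones(len(X)), X])  # constant term plus descriptors


def _fit(X, y):
    coef, *_ = np.linalg.lstsq(_design(X), np.asarray(y, dtype=float), rcond=None)
    return coef


def q2_loo(X_train, y_train) -> float:
    """Leave-one-out q2 = 1 - PRESS / SS_total on the training set."""
    X = np.asarray(X_train, dtype=float)
    y = np.asarray(y_train, dtype=float)
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i        # drop compound i, refit, predict it
        coef = _fit(X[keep], y[keep])
        y_hat = _design(X[i:i + 1]) @ coef
        press += float((y[i] - y_hat[0]) ** 2)
    return 1.0 - press / float(((y - y.mean()) ** 2).sum())


def r2_external(X_train, y_train, X_test, y_test) -> float:
    """R2 of the training-set model on the held-out test set."""
    coef = _fit(X_train, y_train)
    y_test = np.asarray(y_test, dtype=float)
    residual = ((y_test - _design(X_test) @ coef) ** 2).sum()
    total = ((y_test - np.mean(y_train)) ** 2).sum()
    return float(1.0 - residual / total)
```

    Values close to 1 for both statistics indicate a model that is internally stable and also predicts compounds it has not seen.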

    Thank You