Lecture 5 Protein Modeling June 4, 2008 Protein Structure Prediction

  • Lecture 5: Protein Modeling

    June 4, 2008

    Protein Structure Prediction

  • 3D Protein Structure

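    [Figure: a chain of residues ALA, LEU, PRO, VAL, ARG, each drawn with a carbon atom (C), with the backbone and a side chain labeled; the 3D arrangement is marked "? ? ?"]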

  • The Protein Folding Problem

    we know that the function of a protein is determined in large part by its 3D shape (fold, conformation)

    can we predict the 3D shape of a protein given only its amino-acid sequence?

  • Motivation

    Want to identify the function of genes we find, and what different mutations/alleles do

    One gene = one protein (sort of)

    Function of protein = function of gene

    Function can be determined in many ways (gene expression, knockouts, etc.), but these take time and are prone to mistakes

    Goal: if we can determine the structure of every protein, learning their functions isn't too far away

  • Thornton et al 2000 (Nature)

  • ai.stanford.edu/~serafim/CS262_2006/Slides

  • Protein Architecture

    proteins are polymers consisting of amino acids linked by peptide bonds

    each amino acid consists of

      a central carbon atom

      an amino group

      a carboxyl group

      a side chain

    differences in side chains distinguish different amino acids

    [Figure labels: H2N (amino group), COOH (carboxyl group)]

  • 3D Protein Structure

    [Figure labels: backbone, side chain, C-alpha]

  • Peptide Bonds

    [Figure labels: amino group, carboxyl group, side chain, alpha carbon (common reference point for coordinates of a structure)]

  • Amino Acid Side Chains

    side chains vary in: shape, size, charge, polarity

  • Levels of Description

    protein structure is often described at four different scales

      primary structure

      secondary structure

      tertiary structure

      quaternary structure

  • Levels of Description

  • Secondary Structure

    secondary structure refers to certain common repeating structures

    it is a local description of structure

    two common secondary structures

      alpha helices

      beta strands/sheets

    a third category, called coil or loop, refers to everything else

  • Helices

    [Figure labels: carbon, hydrogen bond, individual amino acid]

  • Sheets

  • Ribbon Diagram Showing Secondary Structures

  • Levels of Description

  • What Determines Conformation?

    in general, the amino-acid sequence of a protein determines the 3D shape of a protein [Anfinsen et al., 1950s]

    but there are some exceptions

      all proteins can be denatured

      some proteins are inherently disordered (i.e. lack a regular structure)

      some proteins get folding help from chaperones

      there are various mechanisms through which the conformation of a protein can be changed in vivo

        post-translational modifications such as phosphorylation

        prions

        etc.

  • What Determines Conformation?

    what physical properties of the protein determine its fold?

      rigidity of the protein backbone

      interactions among amino acids, including

        electrostatic interactions

        van der Waals forces

        volume constraints

        hydrogen and disulfide bonds

      interactions of amino acids with water

  • Determining Protein Structures

    protein structures can be determined experimentally (in many cases) by

      x-ray crystallography

      nuclear magnetic resonance (NMR)

  • Myoglobin

    From www.inst.bnl.gov/GasDetectorLab/x-rays/SRI94.htm

  • Myoglobin

    S.E.V. Phillips. "Structure and refinement of oxymyoglobin at 1.6 Å resolution." J. Mol. Biol. 1980, 142, 531.

  • X-ray Crystallography

    [Figure labels: x-ray beam, protein crystal, collection plate, diffraction pattern, electron density map (3D picture)]

  • Electron Density Map Interpretation

    GIVEN: 3D Electron Density Map

  • Electron Density Map Interpretation

    FIND: All-atom Protein Model

  • NMR

    Nuclear Magnetic Resonance Spectroscopy

    Unlike X-ray crystallography, cannot handle large proteins

    Exploits the chemical environment to return distances between atoms

    Can use knowledge of restraints to identify positions of atoms that produce peaks

  • Wuthrich K. "Protein structure determination in solution by NMR spectroscopy." J Biol Chem. 1990 December 25;265(36):22059-62.

  • Experimental Methods

    Very expensive and time-consuming

    Computational methods can help with time

    Many protein structures still cannot be determined in this manner

  • More motivation

    there is a large sequence-structure gap

      300K protein sequences in SwissProt database

      50K protein structures in PDB database

    key question: can we predict structures by computational means instead?

  • Approaches to Protein Structure Prediction

    prediction in 1D

      secondary structure

      solvent accessibility (which residues are exposed to water, which are buried)

      transmembrane helices (which residues span membranes)

    prediction in 2D

      inter-residue/strand contacts

    prediction in 3D

      homology modeling

      fold recognition (e.g. via threading)

      ab initio prediction (e.g. via molecular dynamics)
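The "prediction in 2D" target above, inter-residue contacts, can be pictured as a square matrix over residue pairs. The following is a minimal sketch of how such a contact map might be derived from a known structure, assuming one C-alpha coordinate per residue and a distance cutoff of 8 angstroms. The cutoff, the function name `contact_map`, and the toy coordinates are illustrative assumptions and do not come from the lecture.

```python
import math

def contact_map(ca_coords, cutoff=8.0):
    """Return an n x n boolean matrix that is True where two different
    residues have C-alpha atoms within `cutoff` angstroms of each other.
    The 8.0 default is an assumed, commonly used value."""
    n = len(ca_coords)
    contacts = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                # The map is symmetric; the diagonal is left False.
                contacts[i][j] = contacts[j][i] = True
    return contacts

# Hypothetical coordinates (angstroms) for four residues on a straight line,
# not taken from a real structure.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
toy_map = contact_map(toy_coords)
```

A 2D predictor would try to produce a matrix like `toy_map` from the sequence alone, without access to the coordinates.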

  • Prediction in 1D, 2D and 3D

    Figure from B. Rost, Protein Structure in 1D, 2D, and 3D, The Encyclopaedia of Computational Chemistry, 1998

    predicted secondary structure and solvent accessibility

    known secondary structure (E = beta strand) and solvent accessibility
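Comparing a predicted secondary-structure string with the known one, as in the figure above, is often summarized by a per-residue accuracy. Below is a minimal sketch of that comparison, assuming a three-state alphabet (H = helix, E = beta strand, C = coil). The three-state score is commonly called Q3, a name that does not appear in the lecture, and the example strings are hypothetical.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted state (H, E or C)
    equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have the same length")
    if not observed:
        raise ValueError("strings must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

# Hypothetical strings, not taken from the figure.
observed = "CCHHHHHCCEEEEC"
predicted = "CCHHHHCCCEEECC"
score = q3_accuracy(predicted, observed)
```

Here the prediction shortens one helix and one strand by a residue each, so 12 of the 14 positions agree.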

  • 2D Prediction Approaches

    use secondary structure predictions to predict short-range contacts (e.g. hydrogen bonds in helices)

    use secondary structure predictions to predict strand alignments

    use correlated mutations to predict contacts
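One way to make "correlated mutations" concrete is to measure how strongly two columns of a multiple sequence alignment co-vary. The sketch below uses mutual information between columns as that measure. This is an assumed choice of statistic, not necessarily the one the lecture had in mind, and the four-sequence alignment is a toy example.

```python
import math
from collections import Counter

def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of a list of
    equal-length aligned sequences. High values suggest the two positions
    mutate together, which may hint at a structural contact."""
    n = len(alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi

# Toy alignment: columns 0 and 1 always change together,
# columns 0 and 2 vary independently.
toy_alignment = ["ADKL", "ADRL", "GEKL", "GERL"]
mi_coupled = column_mutual_information(toy_alignment, 0, 1)
mi_independent = column_mutual_information(toy_alignment, 0, 2)
```

In practice real alignments are far larger, and raw mutual information is usually corrected for phylogenetic and sampling effects before being used to rank candidate contacts.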

  • Prediction in 3D

    homology modeling

      given: a query sequence Q, a database of protein structures

      do: find a protein P that has high sequence similarity to Q; return P's structure as an approximation to Q's structure

    fold recognition (threading)

      given: a query sequence Q, a database of known folds

      do: find a fold F such that Q can be aligned with F in a highly compatible manner; return F as an approximation to Q's structure
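The homology-modeling recipe above (find a similar protein P, reuse its structure) can be sketched as a nearest-neighbor lookup. The example below is only an illustration: it uses `difflib.SequenceMatcher` from the Python standard library as a crude stand-in for a real alignment-based similarity search, and the database entries, names, and sequences are hypothetical.

```python
from difflib import SequenceMatcher

def pick_template(query, database):
    """Return (name, similarity) for the database sequence most similar to
    `query`. Similarity is SequenceMatcher.ratio(), an assumed proxy for
    alignment-based sequence similarity."""
    best_name, best_score = None, -1.0
    for name, sequence in database.items():
        score = SequenceMatcher(None, query, sequence, autojunk=False).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score

# Hypothetical database mapping structure names to their sequences.
toy_database = {
    "template_A": "MKTAYIAKQRQISFVKSHFSRQ",
    "template_B": "GSHMLEDPVDAFKNNWQ",
}
query = "MKTAYIAKQRNISFVKSHFSRQ"  # differs from template_A at one position
template_name, similarity = pick_template(query, toy_database)
```

The structure stored under `template_name` would then serve as the approximation to the query's structure. Fold recognition follows the same outline but scores sequence-to-fold compatibility instead of sequence-to-sequence similarity.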

  • Prediction in 3D

    fragment assembly (Rosetta)

      given: a query sequence Q, a database of structure fragments

      do: find a set of fragments that Q can be aligned with in a highly compatible manner; return the combined fragments as an approximation to Q's structure

    molecular dynamics

      given: a query sequence Q

      do: use laws of physics to simulate folding of Q
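To give a flavor of what "use laws of physics to simulate" means in molecular dynamics, here is a minimal sketch of one standard integration scheme, velocity Verlet, applied to a single particle on a harmonic spring in one dimension. The choice of integrator, the spring constant, the rest position, and the time step are all illustrative assumptions. A real simulation integrates every atom of the protein and solvent under a full force field.

```python
def velocity_verlet(x, v, force, mass, dt, steps):
    """Advance position x and velocity v through `steps` time steps of size
    dt using the velocity Verlet scheme; returns the final (x, v)."""
    f = force(x)
    for _ in range(steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
    return x, v

K = 1.0    # hypothetical spring constant
X0 = 1.0   # hypothetical rest position

def spring_force(x):
    """Harmonic restoring force, a toy stand-in for a bond term."""
    return -K * (x - X0)

def energy(x, v, mass=1.0):
    """Kinetic plus potential energy of the toy system."""
    return 0.5 * mass * v * v + 0.5 * K * (x - X0) ** 2

x_end, v_end = velocity_verlet(x=1.5, v=0.0, force=spring_force,
                               mass=1.0, dt=0.01, steps=1000)
```

The total energy should stay close to its starting value over the run, which is one quick sanity check on an integrator.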

  • Homology Modeling

    [Figure: axis of pairwise sequence identity running from 0% to 100%, with marks at 20% and 30%; regions labeled "probably unrelated", "remote homologs", and "homologs"]

    most pairs of proteins with similar structure are remote homologs (< 25% sequence identity)

    homology modeling usually doesn't work for remote homologs

    most pairs of proteins with < 25% sequence identity are unrelated
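Pairwise sequence identity, the quantity on the axis above, can be computed from an existing alignment. The sketch below assumes two already-aligned strings with "-" for gaps and counts identity over the non-gap columns only. Other denominators are also in use, so this is one convention among several. The three-way labeling with cut points at 20% and 30% is one possible reading of the figure, not a rule stated in the lecture.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned column pairs (ignoring any column that contains
    a gap) in which both sequences have the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0

def identity_zone(identity):
    """Map a percent identity to a label, using assumed cut points
    of 20% and 30%."""
    if identity >= 30.0:
        return "homologs"
    if identity >= 20.0:
        return "remote homologs"
    return "probably unrelated"

# Hypothetical aligned pair with one gap column and one mismatch.
identity = percent_identity("ACDE-FG", "ACDQWFG")
zone = identity_zone(identity)
```

As the slide notes, low identity alone cannot distinguish remote homologs from unrelated pairs, so a label from `identity_zone` is at best a rough first guess.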

  • Homology-based Prediction

    Raw model

    Loop modeling

    Side chain placement

    Refinement

  • The SCOP Database: Structural Classification Of Proteins

    FAMILY: proteins that are >30% similar, or >15% similar and have similar known structure/function

    SUPERFAMILY: proteins whose families have some sequence and function/structure similarity, suggesting a common evolutionary origin

    COMMON FOLD: superfamilies that have the same secondary structures in the same arrangement, probably resulting from physics and chemistry

    ai.stanford.edu/~serafim/CS262_2006/Slides

  • Examples of Fold Classes

    ai.stanford.edu/~serafim/CS262_2006/Slides

  • Threading

  • Ab initio Prediction ROSETTA

    1. PSI-BLAST homology search

    Discard sequences with >25% homology

    2. PHD

    For each 3-long and each 9-long sequence fragment, get 25 structure fragments that match well

    3. Markov Chain Monte Carlo

    ai.stanford.edu/~serafim/CS262_2006/Slides

  • ai.stanford.edu/~serafim/CS262_2006/Slides

  • Ab initio Prediction CASP results

    ai.stanford.edu/~serafim/CS262_2006/Slides

  • Summary of current state of the art

    ai.stanford.edu/~serafim/CS262_2006/Slides

  • Open Ended

    Ab initio prediction is the goal, but we are far from it

    Sidechain prediction

    Contact map prediction

    Search space reduction

    Parallelization (GPUs)

    Surface accessibility

  • Other areas

    Protein-Protein Interaction

    Drug Design

    Protein Engineering

    Ligand Docking/Inhibition

    Function Prediction
