Basic bioinformatics tools for studying proteins

Dong Xu
Computer Science Department
C. S. Bond Life Sciences Center
University of Missouri, Columbia
http://digbio.missouri.edu

Introduction

Broaden knowledge for undergraduate education

Many job opportunities in biomedical and agricultural fields

Practice basic protein tools:
  Useful for biological studies
  Intellectually stimulating

Dong's picks for beginners:
  Not necessarily the most accurate tool
  Easy to use and understand
  Very popular

Proteins – Some Basics

What is a protein?
  A linear sequence of amino acids...

What is an amino acid?

20 Amino Acids

Glycine (G)

Glutamic acid (E)

Aspartic acid (D)

Methionine (M)

Threonine (T)

Serine (S)

Glutamine (Q)

Asparagine (N)

Tryptophan (W)

Phenylalanine (F)

Cysteine (C)

Proline (P)

Leucine (L)

Isoleucine (I)

Valine (V)

Alanine (A)

Histidine (H)

Lysine (K)

Tyrosine (Y)

Arginine (R)

White: Hydrophobic, Green: Hydrophilic, Red: Acidic, Blue: Basic

Amino acids connect via peptide bonds

[Figure: amino acid residues linked into a chain by peptide bonds]

An Overview

o A protein folds into a unique 3D structure under physiological conditions

Lysozyme sequence (129 amino acids):

KVFGRCELAA AMKRHGLDNY RGYSLGNWVC AAKFESNFNT QATNRNTDGS
TDYGILQINS RWWCNDGRTP GSRNLCNIPC SALLSSDITA SVNCAKKIVS
DGNGMNAWVA WRNRCKGTDV QAWIRGCRL
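
A quick way to check a sequence like the one above is to count its residues. The following is a minimal sketch in Python (the function name `composition` is our own, not part of any tool listed in these slides); it confirms the stated length and tallies the one-letter codes.

```python
from collections import Counter

# Lysozyme sequence as printed on the slide; spaces are only for readability.
LYSOZYME = (
    "KVFGRCELAA AMKRHGLDNY RGYSLGNWVC AAKFESNFNT QATNRNTDGS"
    "TDYGILQINS RWWCNDGRTP GSRNLCNIPC SALLSSDITA SVNCAKKIVS"
    "DGNGMNAWVA WRNRCKGTDV QAWIRGCRL"
).replace(" ", "")


def composition(seq):
    """Return a Counter mapping each one-letter amino acid code to its count."""
    return Counter(seq)


counts = composition(LYSOZYME)
print("length:", len(LYSOZYME))
for residue, n in counts.most_common(5):
    print(residue, n)
```

The same function works on any protein sequence written in one-letter code.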

[Figure labels: protein backbone, side chain]

Primary, Secondary and Tertiary Structures of Proteins

Protein Structure Representations

[Figure: lysozyme structure shown in three representations: ball & stick, strand, surface]

Structure Visualization

RasMol: http://www.umass.edu/microbio/rasmol/getras.htm
MDL Chime (plug-in): http://www.mdl.com/products/framework/chime/
Protein Explorer: http://molvis.sdsc.edu/protexpl/frntdoor.htm
Jmol: http://jmol.sourceforge.net/
PyMOL: http://pymol.sourceforge.net/
VMD: http://www.ks.uiuc.edu/Research/vmd/

Sequence Homology Software

NCBI-BLAST: http://www.ncbi.nlm.nih.gov/BLAST/

Comparing 2 (pairwise) or more (multiple) sequences.

Searching for a series of identical or similar characters in the sequences.

VLSPADKTNVKAAWAKVGAHAAGHG
||| |    |   |||| |  ||||
VLSEAEWQLVLHVWAKVEADVAGHG
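
The match line in a pairwise alignment can be recomputed directly from the two aligned rows. The sketch below is illustrative only: it scores identical characters and is not how BLAST itself scores or searches (BLAST uses substitution matrices and heuristics). The function names are our own.

```python
def match_line(a, b):
    """Build the '|' line: a bar where both rows carry the same residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return "".join("|" if x == y and x != "-" else " " for x, y in zip(a, b))


def percent_identity(a, b):
    """Percent of alignment columns with identical residues."""
    return 100.0 * match_line(a, b).count("|") / len(a)


top = "VLSPADKTNVKAAWAKVGAHAAGHG"
bottom = "VLSEAEWQLVLHVWAKVEADVAGHG"

print(top)
print(match_line(top, bottom))
print(bottom)
print("identity: %.1f%%" % percent_identity(top, bottom))
```

For the two fragments on the slide, 14 of 25 columns are identical.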

Typical BLAST Output

InterProScan: http://www.ebi.ac.uk/InterProScan/

InterProScan example, PCNA: http://www.ebi.ac.uk/InterProScan/

MyHits Local Motifs Search: http://myhits.isb-sib.ch/

MyHits Local Motifs Summary: http://myhits.isb-sib.ch/

MyHits Local Motif Hits: http://myhits.isb-sib.ch/

Multiple Alignment

VTISCTGSESNIGAG-NHVKWYQQLPG
VTISCTGTESNIGS--ITVNWYQQLPG
LRLSCSSSDFIFSS--YAMYWVRQAPG
LSLTCTVSETSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG--
ATLVCLISDFYPGA--VTVAWKADS--
AALGCLVKDYFPEP--VTVSWNSG---
VSLTCLVKEFYPSD--IAVEWWSNG--
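
One thing a multiple alignment makes easy is spotting columns that are identical in every sequence. A minimal sketch, assuming the eight rows above are each 27 columns wide (the function name `conserved_columns` is ours):

```python
# The eight aligned rows from the slide, one string per sequence.
ALIGNMENT = [
    "VTISCTGSESNIGAG-NHVKWYQQLPG",
    "VTISCTGTESNIGS--ITVNWYQQLPG",
    "LRLSCSSSDFIFSS--YAMYWVRQAPG",
    "LSLTCTVSETSFDD--YYSTWVRQPPG",
    "PEVTCVVVDVSHEDPQVKFNWYVDG--",
    "ATLVCLISDFYPGA--VTVAWKADS--",
    "AALGCLVKDYFPEP--VTVSWNSG---",
    "VSLTCLVKEFYPSD--IAVEWWSNG--",
]


def conserved_columns(rows):
    """Return (1-based column, residue) for columns identical in all rows."""
    conserved = []
    for index, column in enumerate(zip(*rows)):
        residues = set(column)
        if len(residues) == 1 and "-" not in residues:
            conserved.append((index + 1, column[0]))
    return conserved


print(conserved_columns(ALIGNMENT))
```

In this alignment only a cysteine column and a tryptophan column are fully conserved, which is the kind of site one would examine first for structural or functional importance.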

Phylogenetic Tree

A multiple protein sequence alignment yields:
  Conserved sites, and hence possibly functional sites
  A phylogenetic tree

MSA with ClustalW

1exr_A  -EQLTEEQIAEFKEAFALFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGN 59
1N0Y_A  AEQLTEEQIAEFKEAFALFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGN 60
3cln_   ----TEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGN 56
           :************:*******************************************

1exr_A  GTIDFPEFLSLMARKMKEQDSEEELIEAFKVFDRDGNGLISAAELRHVMTNLGEKLTDDE 119
1N0Y_A  GTIDFPEFLSLMARKMKEQDSEEELIEAFKVFDRDGNGLISAAELRHVMTNLGEKLTDDE 120
3cln_   GTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEE 116
        *********::******: *****: ***:***:**** *******************:*

1exr_A  VDEMIREADIDGDGHINYEEFVRMMVS- 146
1N0Y_A  VDEMIREADIDGDGHINYEEFVRMMVSK 148
3cln_   VDEMIREANIDGDGQVNYEEFVQMMTA- 143
        ********:*****::******:**.:

Input: 2 or more sequences for analysis
Parameters: default or custom (different scoring matrices, gap penalties, etc.)
Program: ClustalW
Output: phylogram, cladogram

ClustalW: http://www.ebi.ac.uk/Tools/clustalw2/index.html
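
Tree-building starts from pairwise distances between the aligned sequences. The sketch below computes a simple p-distance (fraction of differing residues over ungapped columns) on the final block of the alignment above. This is an illustration only; it is not ClustalW's own distance calculation, and the function name `p_distance` is ours.

```python
from itertools import combinations

# Final block of the ClustalW alignment shown on the slide.
BLOCK = {
    "1exr_A": "VDEMIREADIDGDGHINYEEFVRMMVS-",
    "1N0Y_A": "VDEMIREADIDGDGHINYEEFVRMMVSK",
    "3cln_": "VDEMIREANIDGDGQVNYEEFVQMMTA-",
}


def p_distance(a, b):
    """Fraction of mismatches over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


for name_a, name_b in combinations(BLOCK, 2):
    distance = p_distance(BLOCK[name_a], BLOCK[name_b])
    print(name_a, name_b, round(distance, 3))
```

Sequences with a distance of zero would sit on the same branch of a phylogram; the more divergent 3cln_ would be placed further away.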

Cell localization

Typical Sorting Signals

Signal Function             Example
Import into nucleus         -P-P-K-K-K-R-K-V-
Export from nucleus         -L-A-L-K-L-A-G-L-D-I-
Import into mitochondria    <-MLSLRQSIRFFKPATRTLCSSRYLL-
Import into plastid         <-MVAMAMASLQSSMSSLSLSSNSFLGQPLSPITLSPFLQG-
Import into peroxisomes     -S-K-L->
Import into ER              <-MMSFVSLLLVGILFWATEAEQLTKCEVFN-
Return to ER                -K-D-E-L->

(< marks the N-terminal end of the protein, > the C-terminal end.)
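
The short signals in the table can be searched for with plain string matching. The sketch below is a toy under stated assumptions: it looks only for the exact example motifs from the table, whereas real predictors such as PSORT or TargetP use statistical models. The function name and the test sequences are hypothetical.

```python
# Example motifs taken from the table; real signals vary around these.
NUCLEAR_IMPORT = "PPKKKRKV"
C_TERMINAL_SIGNALS = {
    "KDEL": "return to ER",
    "SKL": "import into peroxisomes",
}


def find_sorting_signals(seq):
    """Return (motif, function) pairs found by exact matching."""
    seq = seq.upper()
    hits = []
    if NUCLEAR_IMPORT in seq:
        hits.append((NUCLEAR_IMPORT, "import into nucleus"))
    # These two signals only count when they sit at the C-terminus.
    for motif, function in C_TERMINAL_SIGNALS.items():
        if seq.endswith(motif):
            hits.append((motif, function))
    return hits


# Hypothetical sequences, made up for illustration.
print(find_sorting_signals("MAAAPPKKKRKVGG"))
print(find_sorting_signals("MGGSTKDEL"))
```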

Localizations

Cell localization
  PSORT: http://psort.nibb.ac.jp/
  TargetP: http://www.cbs.dtu.dk/services/TargetP/

Signal peptide
  SignalP: http://www.cbs.dtu.dk/services/SignalP/

SignalP result

Membrane Bilayer with Proteins

Helix Bundle TM Proteins

[Figure: example structures, PDB = 1QHJ and PDB = 1RRC]

Single helix or helical bundles (> 90% of TM proteins)

Examples:
  Human growth hormone receptor, insulin receptor
  ATP binding cassette family - CFTR
  Multidrug resistance proteins
  7TM receptors - G protein-linked receptors

Beta Barrel TM Proteins

Transmembrane Prediction

Alpha-helical TM proteins: http://bp.nuap.nagoya-u.ac.jp/sosui/

Beta-barrel TM proteins: http://psfs.cbrc.jp/tmbeta-net/

Secondary Structure Prediction

SSpro 4.1: http://sysbio.rnet.missouri.edu/multicom_toolbox/

PSI-PRED: http://bioinf.cs.ucl.ac.uk/psipred/psiform.html

SAM: http://compbio.soe.ucsc.edu/SAM_T08/T08-query.html

PHD: http://www.predictprotein.org/

Coiled coil prediction

http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_lupas.html

Special motif prediction

Helix-turn-helix motif prediction: http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_hth.html

Kinase-related motifs: http://scansite.mit.edu/motifscan_seq.phtml

Leucine zippers: http://2zip.molgen.mpg.de/index.html

Protein disorder prediction

PreDisorder: http://sysbio.rnet.missouri.edu/multicom_toolbox/

A collection of disorder predictors: http://www.disprot.org/predictors.php

2D: Contact Map Prediction

[Figure: a 3D structure and its 2D contact map, an n x n matrix with residues 1..n on both axes (indices i and j)]

Distance threshold = 8 Å
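
Given 3D coordinates, a contact map follows from a simple distance test: residues i and j are in contact if they lie closer than the threshold. The sketch below assumes one representative point per residue (for example the C-alpha atom; the slide does not say which atom is used) and uses made-up coordinates.

```python
import math


def contact_map(coords, threshold=8.0):
    """Return an n x n 0/1 matrix; 1 where two residues are within threshold.

    coords holds one (x, y, z) point per residue, in angstroms.
    The diagonal is left at 0 (a residue is not counted as its own contact).
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


# Hypothetical coordinates: four residues spaced 3.8 angstroms apart on a line.
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
for row in contact_map(toy):
    print(row)
```

The matrix is symmetric, so a predictor only needs to fill in one triangle.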

Contact Prediction

SVMcon: http://casp.rnet.missouri.edu/svmcon.html
NNcon: http://casp.rnet.missouri.edu/nncon.html
SCRATCH: http://scratch.proteomics.ics.uci.edu/
SAM: http://compbio.soe.ucsc.edu/HMM-apps/HMM-applications.html

Structure Comparison

Visualize structure alignment using VAST: http://www.ncbi.nlm.nih.gov/Structure/

Two ferredoxins, 1DOI and 1AWD, are aligned structurally, showing an insertion in 1DOI that contains potassium-ion binding sites. This may be the result of adaptations to the high-salt environment of the Dead Sea.

Structure Alignment Tools

CE: http://cl.sdsc.edu/
DALI: http://www.ebi.ac.uk/dali/
TM-Align: http://zhang.bioinformatics.ku.edu/TM-align/

Structure-Based Search

Comparing a query protein structure against all the structures in the PDB

The DALI server: http://www2.ebi.ac.uk/dali/

When new structures are solved, researchers often submit them to the DALI server to find structural neighbors and their alignments.

SWISS-MODEL: Comparative Modeling Server
http://swissmodel.expasy.org/

Protein Structure Homology Modeling: Modeller

Analysis software

PROCHECK WHATCHECK Suite Biotech PROSA

Entrez Databases: http://www.ncbi.nlm.nih.gov/Entrez/

Design Programs

DEZYMER (Hellinga)
  Given a ligand and a protein with known structure, suggest residues to be mutated so that the resulting protein binds the ligand.

ORBIT (Mayo)
  Given a backbone structure, design a sequence that folds to that backbone.

Rosetta (Baker)
  One program to treat diverse problems: prediction and design

DEZYMER

1. Define the expected binding geometry

2. Find backbone positions where adding appropriate side chains satisfies the predefined geometry

3. Place the side chains and ligand, and optimize their positions

4. Repack residues in positions other than the binding residues. If necessary, change residue type

Hellinga and Richards, JMB, 1991. Construction of new ligand binding sites in proteins of known structure

ORBIT

Comparison between the designed backbone (averaged NMR structure, blue) and the target backbone (red)

Solution structure of the designed protein. Stereoview showing the best-fit superposition of the 41

1. Divide the target structure into three parts: core, surface and boundary

2. Allowed residues:
   Core: Ala, Val, Leu, Ile, Phe, Tyr, Trp
   Surface: Ala, Ser, Thr, His, Asp, Asn, Glu, Gln, Lys, and Arg
   Boundary: union of the above two

3. This gives 1.9 × 10^27 possible sequences

4. Select the best sequence efficiently, using dead-end elimination (DEE)
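
The size of the search space follows from multiplying the number of allowed residues at each position. The sketch below takes the three residue sets from the slide; the numbers of core, boundary and surface positions are NOT given on the slide, so the values used here (1, 7 and 18) are an assumption chosen because they reproduce the quoted figure.

```python
CORE = {"Ala", "Val", "Leu", "Ile", "Phe", "Tyr", "Trp"}
SURFACE = {"Ala", "Ser", "Thr", "His", "Asp", "Asn", "Glu", "Gln", "Lys", "Arg"}
BOUNDARY = CORE | SURFACE  # union of the two sets; Ala is shared

# ASSUMPTION: position counts are not stated on the slide.
N_CORE, N_BOUNDARY, N_SURFACE = 1, 7, 18


def sequence_space(n_core, n_boundary, n_surface):
    """Number of sequences when each position picks freely from its set."""
    return (
        len(CORE) ** n_core
        * len(BOUNDARY) ** n_boundary
        * len(SURFACE) ** n_surface
    )


total = sequence_space(N_CORE, N_BOUNDARY, N_SURFACE)
print(len(BOUNDARY), "residue types allowed at boundary positions")
print("possible sequences: %.1e" % total)
```

A space of this size cannot be enumerated one sequence at a time, which is why a pruning method such as dead-end elimination is needed.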

Calciomics

Calciomics is a specialized area of biochemistry focusing on the study of calcium-binding biological macromolecules and proteins to understand the factors that contribute to calcium-binding affinity and the selectivity of proteins and calcium-dependent conformational change.

http://lithium.gsu.edu/faculty/Yang/Calciomics.htm

[Figure: protein analysis pipeline in three stages: sequence analysis and processing; structure prediction and evaluation; function inference toolkit. Box labels:]

  Original sequence
  SOSUI: remove transmembrane regions
  SignalP: remove signal region
  Coiled coils; remove disorder regions
  Modified sequences
  ProDom: set of domain sequences
  SSP: secondary structure prediction
  PSI-BLAST iterations: analysis of E-value, set of profile sequences; STOP if homolog found in PDB
  PROSPECT
  Evaluate & adjust alignments
  MODELLER / Jackle
  WHATIF / PROCHECK
  3D model
  Function annotation:
    SWISS-PROT: annotation
    PFAM: family classification
    Motif: active sites
    PSORT: subcellular location
    Enzyme structure DB
    Medline: literature search

Summary

Practice 10 selected tools
Help answer the question: what does this protein do?
Collaborate with experimentalists
Find more tools at:
  http://us.expasy.org/tools/
  http://infosuite.welch.jhmi.edu/BS/pt

Acknowledgments

This file is for educational purposes only. Some materials (including pictures and text) were taken from public-domain sources on the Internet.