Allele Mining: with respect to Comparative Protein Structure Modelling and Docking study
Sunil Kumar
Institute of Life Sciences
Bhubaneswar
E-mail: [email protected]
• Enormous sequence information is available in public databases as a result of sequencing of diverse crop genomes.
• It is important to use this genomic information for the identification and isolation of novel and superior alleles of agronomically important genes from crop gene pools to suitably deploy for the development of improved cultivars.
• Allele mining is a promising approach to dissect naturally occurring allelic variation at candidate genes controlling key agronomic traits, which has potential applications in crop improvement programs.
• It helps in tracing the evolution of alleles, identification of new haplotypes and development of allele-specific markers for use in marker-assisted selection.
Allele Mining: an Introduction
Allele Mining (cont.)
• Initial studies of allele mining focused only on the identification of SNPs/InDels in the coding sequences (exons) of the gene, since these variations were expected to affect the encoded protein structure and/or function.
• However, recent reports indicate that nucleotide changes in non-coding regions (5' UTR including the promoter, introns and 3' UTR) also have a significant effect on transcript synthesis and accumulation, which in turn alters trait expression.
Information Transfer pathway within the cell
[Figure: decoding mechanism. DNA (…ATGCATGCATGCATGCATGC…) → RNA (…CGUACGUACGUACGU…) → protein sequence → protein structure → biological function]
Proteins
Proteins are the building blocks of life.
In a cell, 70% is water and 15%-20% are proteins.
Examples:
• hormones: regulate metabolism
• structural: hair, wool, muscle, …
• antibodies: immune response
• enzymes: chemical reactions
A protein is composed of a central backbone and a collection of (typically) 50-2000 amino acids
There are 20 different kinds of amino acids, for example:
Name            3-letter code   1-letter code
Leucine         Leu             L
Alanine         Ala             A
Serine          Ser             S
Glycine         Gly             G
Valine          Val             V
Glutamic acid   Glu             E
Threonine       Thr             T
Amino Acids
Side chain
Each amino acid is identified by its side chain, which determines the properties of this amino acid.
Side Chain Properties
•Hydrophobic residues tend to stay inside the protein, while hydrophilic residues stay close to water
•Oppositely charged amino acids can form a salt bridge.
•Polar amino acids can participate in hydrogen bonding
Protein Folding
•Proteins must fold to function
•Some diseases are caused by misfolding, e.g., mad cow disease
Three Structure Levels
Beta Sheet
Helix
Loop
Primary structure: sequence of amino acids, e.g., DRVYIHPF
Secondary structure: local folding patterns, e.g., alpha-helix, beta-sheet, loop
Tertiary structure: complete 3D fold
Beta Sheet Examples
Anti-parallel beta sheet; parallel beta sheet
Helix Examples
Domain, Fold, Motif
•A protein chain can have several domains
▫A domain is a discrete portion of a protein that can fold independently and possesses its own function
•The overall shape of a domain is called a fold. There are only a few thousand possible folds.
•Sequence motif: highly conserved protein subsequence
•Structure motif: highly conserved substructure
Protein Data Bank
About 50,000 protein structures, solved using experimental techniques; ~800 are unique structural folds
Different structural folds
Same structural folds
The Problem
• Protein function is determined by 3D structure
• ~ 50,000 protein structures in PDB (Protein Data Bank)
• Experimental determination of protein structures is time-consuming and expensive
• Many protein sequences available
[Figure: sequence → protein structure → function → medicine]
“Three-dimensional protein structures are important in understanding the mechanisms of human genetic diseases, predicting the effect of non-synonymous single nucleotide polymorphisms and developing new personalized medicines”
Xie and Bourne (2005) PLoS Comput. Biol. 1:e31
Why Protein 3D Structures?
3D Structures of Proteins
Better Understanding of Protein Functions
What is Homology Modeling?
An approach to predict a model of the three-dimensional structure of a given protein sequence (TARGET) based on an alignment to one or more known protein structures (TEMPLATES)
The homology modeling method is based on the assumption that the structure of an unknown protein is similar to known structures of reference proteins
A model is desirable when neither X-ray crystallography nor NMR spectroscopy can determine the structure of a protein in time, or at all.
The 3-D structure of proteins can be determined by X-ray crystallography and NMR spectroscopy, but these experimental techniques are time-consuming and are not possible if a sufficient quantity and quality of the protein is not available.
The built model provides a wealth of information on how the protein functions, down to the level of individual residue properties. This information can then be used for mutational studies or for drug design.
Why a Model?
Protein Structure Determination
• High-resolution structure determination
▫ X-ray crystallography (~1 Å)
▫ Nuclear magnetic resonance (NMR) (~1-2.5 Å)
• Low-resolution structure determination
▫ Cryo-EM (electron microscopy) (~10-15 Å)
X-ray crystallography
• Most accurate
• An extremely pure protein sample is needed.
• The protein sample must form crystals that are relatively large without flaws. Generally the biggest problem.
• Many proteins aren't amenable to crystallization at all (e.g., proteins that do their work inside a cell membrane).
• ~$100K per structure
Nuclear Magnetic Resonance
• Fairly accurate
• No need for crystals
• Limited to small, soluble proteins only.
1. Identification of structures that will form the template for modelling
2. Sequence Alignment of the target with template
3. Transfer of the coordinates of the structurally conserved regions (SCRs) from the template(s) to the target
4. Modelling the missing regions
5. Refinement and validation of the model
Steps in homology modelling (from the target's sequence to the target's structure)
Template search
• Homology modeling is based on using similar structures, i.e. no similar structures = no model
• 40% amino acid identity or higher is best; below that is not advisable, but examples of success do exist
• Sequence similarity is needed across the whole sequence, not just in one part
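The two template criteria above (identity and coverage of the whole sequence) can be checked from a pairwise alignment. The following is a minimal sketch, not a standard tool: the function names, the example sequences and the 0.9 coverage cut-off are hypothetical, and only the 40% identity figure comes from the slide.

```python
def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent of gap-free alignment columns that hold identical residues."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned = identical = 0
    for t, s in zip(aligned_target, aligned_template):
        if t == "-" or s == "-":
            continue  # skip columns with a gap in either sequence
        aligned += 1
        if t == s:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def target_coverage(aligned_target: str, aligned_template: str) -> float:
    """Fraction of target residues aligned to a template residue."""
    target_len = sum(1 for t in aligned_target if t != "-")
    covered = sum(
        1 for t, s in zip(aligned_target, aligned_template)
        if t != "-" and s != "-"
    )
    return covered / target_len if target_len else 0.0


def template_is_usable(aligned_target, aligned_template,
                       min_identity=40.0, min_coverage=0.9):
    """Apply the 40% identity rule plus a hypothetical coverage cut-off."""
    return (percent_identity(aligned_target, aligned_template) >= min_identity
            and target_coverage(aligned_target, aligned_template) >= min_coverage)


if __name__ == "__main__":
    # Hypothetical aligned sequences, '-' marks a gap.
    target = "MKTAYIAKQR"
    template = "MKTAHIAK-R"
    print(round(percent_identity(target, template), 1))
    print(target_coverage(target, template))
    print(template_is_usable(target, template))
```

Identity is computed over aligned columns only, which is one of several conventions in use; a different denominator (for example the shorter sequence length) would give a different percentage.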
Searching Databases
[Figure: a query sequence is searched against a database with BLAST or FASTA]
Key Step:
Sequence alignment of the target with the basis structures
Good alignment → good model
• Sequence alignment is a basic technique in homology modeling.
• It is used to establish a one-to-one correspondence between the amino acids of the reference protein (template) and those of the unknown protein (target) in the structurally conserved regions.
• The correspondence is the basis for transferring coordinates from the reference to the model protein
A sample alignment of two DNA sequences
(a) Un-gapped alignment
Sequence A: GGTGGAC
Sequence B: AAAGGTGAC
(b) Gapped alignment
Sequence A: GGTGGAC
Sequence B: AAAGGTG-AC
In the original figure, the "|" symbol indicates matching nucleotides.
Local Alignment
Global Alignment
Sequence Alignment
Applications:
Global alignment: essential for comparative modeling.
Local alignment: sufficient for functional domains.
N.B.: Global alignment is computationally more time-consuming than local alignment.
Sequence Homology Vs Sequence Similarity
Dotplot:
[Figure: dotplot of Sequence 1 against Sequence 2; the axis labels in the figure read T A C A T T A C G T A C and A T T C A C A T A]
A dotplot gives an overview of all possible alignments
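A dotplot can be produced with a few lines of code. The sketch below marks every position where the two sequences share a residue; the function name is hypothetical, and the assignment of the two example sequences to the axes is an assumption based on the figure labels.

```python
def dotplot(seq1: str, seq2: str) -> list[str]:
    """Return one text row per residue of seq2.

    A '*' in column j of row i means seq1[j] == seq2[i]; '.' means no match.
    Diagonal runs of '*' correspond to ungapped aligned segments.
    """
    return ["".join("*" if a == b else "." for a in seq1) for b in seq2]


if __name__ == "__main__":
    # Sequences read from the figure labels; which one lies on which axis
    # is an assumption.
    seq1 = "TACATTACGTAC"
    seq2 = "ATTCACATA"
    print("  " + seq1)
    for base, row in zip(seq2, dotplot(seq1, seq2)):
        print(base, row)
```

In practice a window and a match threshold are usually applied to suppress random single-residue matches; this sketch omits that filtering.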
Dynamic Programming
• Needleman and Wunsch algorithm: global alignment
• Smith and Waterman algorithm: local alignment
Dynamic programming is a computational method used for aligning two protein or nucleotide sequences. The method compares every pair of residues/nucleotides in the two sequences and generates an alignment.
In the alignment, matches, mismatches and gaps in the two sequences are positioned in such a way that the number of matches between identical or similar residues is the maximum possible.
$$F(i,j) = \max \begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) - d \\ F(i,j-1) - d \end{cases}$$
[Figure: cell F(i, j) is filled from F(i-1, j-1) with the score s(x_i, y_j), and from F(i-1, j) and F(i, j-1) with the gap penalty -d]
Steps
1. Initialization: the first row and first column are filled with multiples of the gap penalty
2. Rest of the cells: filled with the maximum (Vmax) of the three candidate values
3. Generation of the optimal path: through backtracking
4. Generation of the optimal alignment: for the optimal path (number of optimal paths = number of optimal alignments)
Scoring scheme: given an alignment between two sequences, we can compute its similarity by:
1) Rewarding a match: Match => +1
2) Penalizing a mismatch: Mismatch => -1
3) Penalizing a gap: Gap or Indel => -2
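The four steps and the scoring scheme above can be sketched as follows. This is a minimal illustration of the Needleman and Wunsch procedure with a linear gap penalty; the function name is hypothetical, and when several optimal paths exist the traceback returns only one of them (preferring the diagonal move).

```python
def needleman_wunsch(x: str, y: str, match=1, mismatch=-1, gap=-2):
    """Global alignment; returns (score, aligned_x, aligned_y)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]

    # Step 1: first row and column hold multiples of the gap penalty.
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap

    # Step 2: every other cell takes the best of the three candidates.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,   # align x[i-1] with y[j-1]
                          F[i - 1][j] + gap,     # gap in y
                          F[i][j - 1] + gap)     # gap in x

    # Steps 3 and 4: backtrack from the bottom-right corner.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if x[i - 1] == y[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                ax.append(x[i - 1]); ay.append(y[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-")
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


if __name__ == "__main__":
    score, a, b = needleman_wunsch("GGTGGAC", "AAAGGTGAC")
    print(score)
    print(a)
    print(b)
```

Here the gap score is passed as a negative number (-2) and added, which is equivalent to subtracting a positive penalty d in the recurrence.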
Smith and Waterman (local alignment)
Two differences:
1. The recurrence gains a fourth option, 0, so a cell score can never become negative:
$$F(i,j) = \max \begin{cases} 0 \\ F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) - d \\ F(i,j-1) - d \end{cases}$$
2. An alignment can now end anywhere in the matrix
Example:
Sequence 1: H E A G A W G H E E
Sequence 2: P A W H E A E
Scoring parameters: BLOSUM; gap penalty: linear gap penalty of 8
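A minimal sketch of the local alignment recurrence follows. To stay self-contained it does not use a BLOSUM matrix: the default `score` function (+5 for identical residues, -4 otherwise) is a stand-in chosen for illustration, and any substitution matrix lookup can be passed in its place. Only the linear gap penalty of 8 is taken from the slide.

```python
def smith_waterman(x: str, y: str, score=None, d=8):
    """Local alignment; returns (best_score, aligned_x, aligned_y)."""
    if score is None:
        # Stand-in scoring, NOT a BLOSUM matrix.
        score = lambda a, b: 5 if a == b else -4

    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0,                                   # start afresh
                          F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                          F[i - 1][j] - d,
                          F[i][j - 1] - d)
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)

    # The alignment ends at the highest-scoring cell and is traced back
    # until a cell with score 0 is reached.
    ax, ay = [], []
    i, j = best_pos
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1])
            i -= 1; j -= 1
        elif F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1]); ay.append("-")
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


if __name__ == "__main__":
    print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))
```

Because the scoring here is not BLOSUM, the alignment found for the example sequences may differ from the one a substitution matrix would give.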
Comparative Modelling Methods
Restraint-based methods: MODELLER (Sali and Blundell, 1993)
MODELLER
MODELLER is a computer program that models three-dimensional structures of proteins and their assemblies by satisfaction of spatial restraints.
MODELLER is most frequently used for homology or comparative protein structure modeling.
The user provides an alignment of a sequence to be modeled with known related structures and MODELLER will automatically calculate a model with all non-hydrogen atoms.
A 3D model is obtained by optimization of a molecular probability density function (pdf).
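Format for Modeller:
INCLUDE                                  # include the predefined TOP routines
SET ATOM_FILES_DIRECTORY = './:../'      # directories searched for input atom files
SET PDB_EXT = '.atm'
SET STARTING_MODEL = 1                   # index of the first model
SET ENDING_MODEL = 20                    # index of the last model
SET MD_LEVEL = 'refine1'
SET DEVIATION = 4.0
SET KNOWNS = '1JKE'                      # code of the template structure
SET HETATM_IO = off
SET WATER_IO = off
SET ALIGNMENT_FORMAT = 'PIR'
SET SEQUENCE = 'target1'                 # code of the target sequence
SET ALNFILE = 'multiple1.ali'            # alignment file
CALL ROUTINE = 'model'                   # build the models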
Loop Modelling
Loop region
Calculate distances between the anchor residues.
Loop Generation Process:
1. Select a loop for each region
2. Fixing of the loop
Fragment database
Loop Library
• Loops extracted from PDB using high resolution (<2 Å) X-ray structures
• Typically thousands of loops in DB
• Includes loop coordinates, sequence, number of residues in the loop, Cα-Cα distance, and the preceding and following secondary (2°) structure (or their Cα coordinates)
Structure Validation
(a)Stereochemical Quality Check
(b) Residue Environment Check
Stereochemical Quality Check
PROCHECK(Thornton and Co-workers)
The following properties are calculated and analysed in comparison with those of highly refined structures solved at varying resolutions.
Torsional angles:
- (φ, ψ) combination
- χ1-χ2 combination
- χ1 torsion for those residues without χ2
- combined χ3 and χ4 angles
- ω angles
Covalent geometry:
- main-chain bond lengths
- main-chain bond angles
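Each torsional angle in the list above is a dihedral angle defined by four consecutive atoms. The sketch below shows how such an angle can be computed from coordinates; it is an illustration only, not PROCHECK's code, and the helper names are hypothetical. For φ the four atoms are C(i-1), N(i), Cα(i), C(i); for ψ they are N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))


def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)

    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)

    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))

    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Hypothetical coordinates: a planar cis arrangement gives 0 degrees.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
```

Plotting the (φ, ψ) pair of every residue gives the Ramachandran plot that stereochemical checks rely on.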
Profiles-3D
•Amino acid residues in proteins can be classified according to their local environments:
▫solvent accessibility
▫secondary structure
▫polarity of other protein chemical groups in contact with them
Refining the Model
- Energy minimize N- and C-termini.
- Repair spliced peptide bonds.
- Minimize loop regions.
- Energy minimize mutated side chains in SCRs.
- Minimize segments together.
Energy Minimization
• Energy minimization adjusts the structure of the molecule in order to lower the energy of the system.
• For small molecules, a global minimum energy configuration can often be found.
• For large macromolecular systems, energy minimization allows one to examine the local minimum around a particular conformation.
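The difference between a global and a local minimum can be illustrated with steepest descent on a one-dimensional toy energy. The double-well function below is purely hypothetical (it is not a molecular force field), as are the step size and tolerance; the point is only that the minimum reached depends on the starting conformation.

```python
def energy(x: float) -> float:
    """Toy double-well energy with one local and one global minimum."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x: float) -> float:
    """Analytical derivative dE/dx of the toy energy."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def minimize(x: float, step=0.01, tol=1e-8, max_steps=100_000) -> float:
    """Steepest descent: move downhill until the gradient vanishes."""
    for _ in range(max_steps):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x


if __name__ == "__main__":
    for start in (1.5, -1.5):
        x_min = minimize(start)
        print(f"start={start:+.1f}  minimum at x={x_min:+.4f}  E={energy(x_min):+.4f}")
```

Starting on the right-hand side of the barrier leads to the higher (local) minimum, while starting on the left leads to the lower one, which is why minimization alone cannot be expected to find the global minimum of a large system.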
Modelling on the Web
• Prior to 1998 homology modelling could only be done with commercial software or command-line freeware
• The process was time-consuming and labor-intensive
• The past few years have seen an explosion in automated web-based homology modelling servers
• Now anyone can homology model!
Applications of Comparative Modeling
Comparative modeling is an efficient way to obtain useful information about the proteins of interest. For example, comparative modeling can be helpful in:
- Designing mutants to test hypotheses about the protein's function.
- Identifying active and binding sites.
- Searching for, designing and improving ligands.
- Modeling substrate specificity.
- Predicting antigenic epitopes.
- Simulating protein-protein docking.
- Confirming a remote structural relationship.
Prediction of the optimal physical configuration and energy between two molecules
The docking problem optimizes:
• Binding between two molecules, such that their orientation maximizes the interaction
• The total energy of interaction, such that for the best binding configuration the binding energy is the minimum
• The resultant structural changes brought about by the interaction
What is docking?
Molecular Docking
• The process of “docking” a ligand to a binding site mimics the natural course of interaction of the ligand and its receptor via a lowest energy pathway.
• Put a compound in the approximate area where binding occurs and evaluate the following:
– Do the molecules bind to each other?
– If yes, how strong is the binding?
– What does the molecule (or the protein-ligand complex) look like? (Understand the intermolecular interactions.)
– Quantify the extent of binding.
A few terms related to docking
• Receptor: The receiving molecule, most commonly a protein or other biopolymer.
• Ligand: The complementary partner molecule which binds to the receptor. Ligands are most often small molecules but could also be another biopolymer.
• Docking: Computational simulation of a candidate ligand binding to a receptor.
• Binding mode: The orientation of the ligand relative to the receptor as well as the conformation of the ligand and receptor when bound to each other.
• Pose: A candidate binding mode.
• Scoring: The process of evaluating a particular pose by counting the number of favorable intermolecular interactions such as hydrogen bonds and hydrophobic contacts.
• Ranking: The process of classifying which ligands are most likely to interact favorably with a particular receptor based on the predicted free energy of binding.
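Scoring by counting favorable interactions, followed by ranking, can be sketched as below. This is a deliberately crude illustration and not the scoring function of any docking program: atoms are reduced to an element symbol plus coordinates, a hydrogen bond is approximated as any N/O pair within a distance cut-off, a hydrophobic contact as any C-C pair within a cut-off, and both cut-off values and all coordinates are hypothetical.

```python
import math

POLAR = {"N", "O"}  # elements treated as hydrogen-bond capable (assumption)


def score_pose(ligand, receptor, hbond_cutoff=3.5, hydrophobic_cutoff=4.5):
    """Count favorable ligand-receptor contacts for one pose.

    Each atom is a tuple (element, x, y, z) with coordinates in angstroms.
    """
    hbonds = hydrophobic = 0
    for l_elem, *l_xyz in ligand:
        for r_elem, *r_xyz in receptor:
            dist = math.dist(l_xyz, r_xyz)
            if l_elem in POLAR and r_elem in POLAR and dist <= hbond_cutoff:
                hbonds += 1
            elif l_elem == "C" and r_elem == "C" and dist <= hydrophobic_cutoff:
                hydrophobic += 1
    return hbonds + hydrophobic


def rank_poses(poses, receptor):
    """Return (name, score) pairs sorted from best to worst score."""
    scored = [(name, score_pose(atoms, receptor)) for name, atoms in poses.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    receptor = [("O", 0.0, 0.0, 0.0), ("C", 5.0, 0.0, 0.0)]
    poses = {
        "pose_a": [("N", 2.9, 0.0, 0.0), ("C", 5.0, 3.8, 0.0)],
        "pose_b": [("N", 8.0, 8.0, 8.0), ("C", 9.0, 9.0, 9.0)],
    }
    print(rank_poses(poses, receptor))
```

Real scoring functions weight each interaction type and account for geometry, desolvation and entropy; a simple count only conveys the idea.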
Classes of Docking
1. Protein-Protein Docking
• Both molecules usually considered rigid.
• 6 degrees of freedom: 3 for rotation, 3 for translation
• First apply only steric constraints to limit the search space
• Then examine the energetics of possible binding conformations
2. Protein-Ligand Docking
• Flexible ligand, rigid receptor.
• Search space much larger
• Either reduce the flexible ligand to rigid fragments connected by one or several hinges (reduces the conformational space)
• Or search the conformational space using Monte Carlo methods or molecular dynamics.
It involves:
Finding useful ways of representing the molecules and molecular properties.
Exploration of the configuration spaces available for interaction between ligand and receptor.
Evaluate and rank configurations using a scoring system, in this case the binding energy
However, since it is difficult to evaluate the binding energy because the binding sites may not be easily accessible, the binding energy is modeled as follows:
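$$\Delta G_{\mathrm{bind}} = \Delta G_{\mathrm{vdw}} + \Delta G_{\mathrm{hbond}} + \Delta G_{\mathrm{elect}} + \Delta G_{\mathrm{conform}} + \Delta G_{\mathrm{tor}} + \Delta G_{\mathrm{sol}}$$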
Docking uses a “search and score” method
3D Structure of the Complex
Experimental Information:
The active site can be identified based on the position of the ligand in the crystal structures of the protein-ligand complexes
What if the active site is not known?
Some Available Programs to Perform Docking
• Affinity
• AutoDock
• BioMedCAChe
• CAChe for Medicinal Chemists
• DOCK
• DockVision
• FlexX
• Glide
• GOLD
• Hammerhead
• PRO_LEADS
• SLIDE
• VRDD
Ligand in Active Site Region
Ligand
Active site residues: Histidine 6; Phenylalanine 5; Tyrosine 21; Aspartic acid 91; Aspartic acid 48; Tyrosine 51; Histidine 47; Glycine 29; Leucine 2; Glycine 31; Glycine 22; Alanine 18; Cysteine 28; Valine 20; Lysine 62
Examples of docked structures: HIV protease inhibitors; COX2 inhibitors
• Shape-complementarity method: find binding mode(s) without any steric clashes
• Only 6-degrees of freedom (translations and rotations)
• Move ligand to binding site and monitor the decrease in the energy
• Only non-bonded terms remain in the energy term
• Try to find a good steric match between ligand and receptor
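The six degrees of freedom of a rigid ligand (three rotations, three translations) can be written as a single rigid-body transform. The sketch below is an illustration under stated assumptions: the rotation is parameterised by angles about the z, y and x axes (one of several possible conventions), and the function names and coordinates are hypothetical.

```python
import math


def rotation_matrix(alpha: float, beta: float, gamma: float):
    """Rotation Rz(alpha) * Ry(beta) * Rx(gamma), angles in radians."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [ca * cb, ca * sb * sg - sa * cg, ca * sb * cg + sa * sg],
        [sa * cb, sa * sb * sg + ca * cg, sa * sb * cg - ca * sg],
        [-sb, cb * sg, cb * cg],
    ]


def transform(coords, angles=(0.0, 0.0, 0.0), shift=(0.0, 0.0, 0.0)):
    """Rotate then translate every atom; internal geometry is unchanged."""
    R = rotation_matrix(*angles)
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            R[row][0] * x + R[row][1] * y + R[row][2] * z + shift[row]
            for row in range(3)
        ))
    return moved


if __name__ == "__main__":
    ligand = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]  # hypothetical coordinates
    pose = transform(ligand, angles=(math.pi / 2, 0.0, 0.0), shift=(1.0, 2.0, 3.0))
    print(pose)
    # Interatomic distances are preserved by a rigid transform.
    print(math.dist(*ligand), math.dist(*pose))
```

A rigid docking search then amounts to sampling these six parameters and scoring each resulting pose.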
Rigid Docking
The DOCK algorithm in rigid-ligand mode
1. Define the target binding site points.
2. Match the distances.
3. Calculate the transformation matrix for the orientation.
4. Dock the molecule.
5. Score the fit.
Flexible Docking
• Dock flexible ligands into binding pocket of rigid protein
• Binding site broken down into regions of possible interactions
[Figure: binding site from X-ray; H-bonds; parameterised binding site]
Detailed calculations on all possibilities would be very expensive
The major challenge in structure-based drug design is to identify the best position and orientation of the ligand in the binding site of the target.
This is done by scoring or ranking the various possibilities, using approaches that are based on empirical parameters, are knowledge-based, or use rigorous calculations.
Need for Scoring
Caspase Dependent Programmed Cell Death in Developing Embryos: A Potential Target for Therapeutic Intervention against Pathogenic Nematodes
For the first time, we developed and evaluated flow cytometry based assays to assess several conserved features of apoptosis in developing embryos of a pathogenic filarial nematode, Setaria digitata, in vitro.
We validated programmed cell death in developing embryos by using immuno-fluorescence microscopy and scoring expression profile of nematode specific proteins related to apoptosis [e.g. CED-3, CED-4 and CED-9].
Mechanistically, apoptotic death of embryonic stages was found to be a caspase dependent phenomenon mediated primarily through induction of intracellular ROS. The apoptogenicity of some pharmacological compounds viz. DEC, Chloroquine, Primaquine and Curcumin were also evaluated. Curcumin was found to be the most effective pharmacological agent followed by Primaquine while Chloroquine displayed minimal effect and DEC had no demonstrable effect.
Further, demonstration of induction of apoptosis in embryonic stages by lipid peroxidation products [molecules commonly associated with inflammatory responses in filarial disease] and demonstration of in-situ apoptosis of developing embryos in adult parasites in a natural bovine model of filariasis have offered a framework to understand anti-fecundity host immunity operational against parasitic helminths.
PLoS NTD, 2011
Induction of apoptosis in developing embryos of a pathogenic nematode
[Figure: CED-4 model with labelled CARD domain, α/β (P-loop) domain, helical domain and winged-helix domain, together with cytochrome c]
J Mol Model, 2011
Binding efficiencies of carbohydrate ligands with different genotypes of cholera toxin B: molecular modeling, dynamics and docking simulation studies
Molecular interaction plots between carbohydrate ligand and genotype 1. a) Galactose b) Sialic acid c) N-acetyl galactosamine
Molecular interaction plots between carbohydrate ligand and genotype 3. a) Galactose b) Sialic acid c) N-acetyl galactosamine
Molecular interaction plots between carbohydrate ligand and genotype 5. a) Galactose b) Sialic acid c) N-acetyl galactosamine
Molecular interaction plots between carbohydrate ligand and genotype 6. a) Galactose b) Sialic acid c) N-acetyl galactosamine
• The promyelocytic leukemia zinc finger (Plzf) gene containing evolutionary conserved BTB domain plays a key role in self-renewal of mammalian spermatogonial stem cells.
• Little is known about the function of plzf in vertebrates, especially in fish species.
• plzf was cloned from the testis of Labeo rohita (rohu), a commercially important freshwater carp. It contains a conserved N-terminal BTB domain and C-terminal C2H2 zinc finger motifs.
Molecular cloning of cDNA and peptide structure prediction of Plzf expressed in the spermatogonial cells of Labeo rohita
Marine Genomics, 2010
•A 3D model of BTB domain of plzf protein was constructed by homology modeling approach.
•Molecular docking on this 3D structure established a homo-dimer between two BTB domains, creating a charged pocket containing conserved amino acid residues: L33, C34, D35 and R49.
Thus, Plzf of SSC is structurally and possibly functionally conserved.
The identified plzf could be the first step towards exploring its role in rohu SSC behavior.
Thank you
• Alok Das Mohapatra, Sunil Kumar, Ashok Kumar Satapathy and Balachandran Ravindran (2011). Apoptosis in a pathogenic nematode involves mitochondrial pathway. PLoS Neglected Tropical Diseases (In Press).
• MHU Turabe Fazil, Sunil Kumar, Rohit Farmer, HP Pandey and DV Singh(2011). Binding efficiencies of carbohydrate ligands with different genotypes of cholera toxin B: Molecular Modeling, dynamics and Docking Simulation studies. J Mol Model, DOI 10.1007/s00894-010-0947-6 (Springer publication).
• Biswaranjan Paital, Sunil Kumar*, Rohit Farmer, Niraj Kanti Tripathy, Gagan Bihari Nityananda Chainy (2011) In silico prediction and characterization of 3D structure and binding properties of catalase from the commercially important crab, Scylla serrata. Interdiscip Sci Comput Life Sci 3: 110–120(Springer publication).*corresponding author.
• Chinmayee Mohapatra, Hirak Kumar Barman, Rudra Prasanna Panda, Sunil Kumar, Varsha Das, Ramya Mohanta, Shibani Mohapatra, Pallipuram Jayasankar (2010) Cloning of cDNA and prediction of peptide structure of plzf expressed in the spermatogonial cells of Labeo rohita, Mar. Genomics, doi: 10.1016/j.margen.2010.09.002. (Elsevier publication).
• MHU Turabe Fazil*, Sunil Kumar*, N Subbarao, H P Pandey and Durg V. Singh (2010). Homology modeling of a sensor histidine kinase from Aeromonas hydrophila. J Mol Model, 16: 1003-1009 * Equal contribution. (Springer publication).
• Babu A Manjasetty, Sunil Kumar, Andrew P Turnbull and Niraj Kanti Tripathy (2009). Homology Modeling and Analysis of Human Disease Proteins: Structural Investigations of Shwachman-Bodian-Diamond Syndrome (SBDS) model through Bioinformatics Approach. InterJRI Science and Technology, Vol. 1, Issue 2, 97-104.