Shodhganga: shodhganga.inflibnet.ac.in/bitstream/10603/31677/13/12...

Chapter 3 Novel drug targets



3.1. Introduction

In the past decade, the complete genome sequences of several microbes have been determined (De Groot AS et al., 2002; http://www.genomesonline.org, 2013). Comparative and subtractive genomics approaches have since been used to retrieve valuable information for treating various infections caused by pathogens (Galperin MY et al., 1999). The subtractive genomics approach also identifies the critical genes that are crucial for the survival of the pathogen but absent in the host (Koonin EV et al., 1998). Selecting such non-homologous proteins, which are not present in humans, minimizes the chances of cross-reactivity and side-effects (Barh D et al., 2011). Analyzing these genes against the KEGG pathway database (Moriya Y et al., 2007) further identifies the genes and gene products that can be used as potential drug targets.

The search for novel drug targets relies on genomics data. The comparative genomics approach can be used to select non-homologous genes coding for proteins that are present in the pathogen but not in the host. To identify such genes, the pathogen proteins are searched against the human proteome with the BLASTP programme, and genes with human homologs are eliminated. Thereafter, the genes critical for the survival of the pathogen can be identified using the Database of Essential Genes (DEG) (Zhang R and Lin Y, 2009). This approach ensures that the drug target is present only in the pathogen and not in humans. Using it, novel targets have been identified successfully for various pathogens (Amineni U et al., 2010; Koteswara Reddy G et al., 2010; Gupta SK et al., 2010; Barh D and Kumar A, 2009).
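The filtering logic described above reduces to set operations on BLAST results. The Python sketch below is a minimal illustration, not the pipeline actually used in this study. It assumes that the human and DEG searches were exported in BLAST+ tabular format (`-outfmt 6`, where the eleventh column is the E-value), and it uses the 10⁻⁴ E-value cut-off given in Section 3.2.1. The function names `hits_below` and `subtractive_targets` are hypothetical, and so are the protein IDs in the example.

```python
from typing import Iterable, Set

EVALUE_COLUMN = 10  # 0-based index of "evalue" in BLAST+ -outfmt 6 output


def hits_below(blast_tab_lines: Iterable[str], max_evalue: float = 1e-4) -> Set[str]:
    """Return query IDs that have at least one hit with E-value <= max_evalue."""
    queries: Set[str] = set()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[EVALUE_COLUMN]) <= max_evalue:
            queries.add(fields[0])  # qseqid
    return queries


def subtractive_targets(pathogen_ids: Iterable[str],
                        human_hits: Set[str],
                        essential_hits: Set[str]) -> Set[str]:
    """Pathogen proteins with no significant human hit that also match DEG."""
    non_homologous = set(pathogen_ids) - human_hits
    return non_homologous & essential_hits


# Hypothetical tabular hits: p1 has a human homolog; p1 and p2 match DEG entries.
human = hits_below(["p1\thsa1\t45.0\t120\t60\t2\t1\t120\t5\t124\t1e-30\t210"])
deg = hits_below(["p1\tdeg1\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-40\t250",
                  "p2\tdeg2\t55.0\t100\t45\t0\t1\t100\t1\t100\t1e-25\t180"])
print(subtractive_targets(["p1", "p2", "p3"], human, deg))  # {'p2'}
```

Representing each filter as a set of IDs keeps the steps independent. The human-homolog subtraction and the essentiality intersection can be applied in either order and give the same result.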

Modern drug discovery is moving towards cheminformatics approaches, which make drug development more economical. These include combinatorial chemistry, high-throughput virtual screening, in silico ADMET screening, and de novo and structure-based drug design. Structure-based computational drug design involves identifying and modelling the target proteins, discovering specific inhibitors by virtual screening or docking studies, and obtaining drug-like molecules via ADMET prediction with specialized software (Bajorath J, 2012; Chen L et al., 2012; Cheng T et al., 2012).



Drug repositioning, which discovers new therapeutic uses for existing approved drugs, is a newer approach to drug discovery. Using available approved drugs for new treatments is more economical than discovering new drugs from uncharacterized lead molecules (Dakshanamurthy S et al., 2012; Verma U et al., 2005).

As antibiotic resistance becomes more common, the need for alternative treatments grows. However, despite the push for new antibiotic therapies, the number of newly approved drugs has continued to decline (Donadio S, 2010). Antibiotic resistance therefore poses a significant problem, and there is an urgent need to identify novel drug targets and design drugs against human pathogens. This is now possible because the proteomes of many pathogens are available. In the present study, the proteomes of selected human pathogens were analyzed to identify potential drug targets and putative drug molecules against them, and the predictions were confirmed with biological experiments.



3.2. Materials and Methods

Figure 3.1: Flow chart: Identification of Essential Proteins

Staphylococcus spp., Streptococcus spp., Klebsiella spp. and Shigella spp. → Non-orthologous proteins in all species of the respective genus removed → Hypothetical proteins removed → CD-HIT: duplicate proteins removed → BLAST against human: orthologous proteins removed → DEG: essential genes selected → Essential proteins



Figure 3.2: Flow chart: Identification of novel drug targets

Common essential proteins in all 4 genera → KAAS server: metabolic pathway analysis → Comparison with 'anti-targets' → BLAST against gut flora → Search: DrugBank, TTD, PDTD, HIT → PSORTb v3.0: cellular localisation & TMHMM: membrane proteins → Novel drug targets



3.2.1. Pathogens and identification of essential genes in reference pathogens

In the present investigation, pathogenic species of Staphylococcus, Streptococcus, Klebsiella and Shigella were used, and each genus was investigated separately.

The selected pathogens are:

Staphylococcus spp.

Staphylococcus aureus subsp. aureus MRSA252

Staphylococcus aureus subsp. aureus MSSA476

Staphylococcus aureus subsp. aureus Mu3

Staphylococcus aureus subsp. aureus Mu50

Staphylococcus aureus subsp. aureus str. Newman

Staphylococcus aureus subsp. aureus strain COL

Staphylococcus aureus subsp. aureus strain JH1

Staphylococcus aureus subsp. aureus strain JH9

Staphylococcus aureus subsp. aureus strain MW2

Staphylococcus aureus subsp. aureus strain N315

Staphylococcus aureus subsp. aureus strain NCTC 8325

Staphylococcus aureus USA300_FPR3757

Staphylococcus epidermidis strain RP62A

Staphylococcus haemolyticus JCSC1435

Staphylococcus saprophyticus subsp. saprophyticus ATCC 15305

Streptococcus spp.

Streptococcus agalactiae 2603V/R

Streptococcus agalactiae NEM316

Streptococcus pneumoniae 70585

Streptococcus pneumoniae G54

Streptococcus pneumoniae Hungary19A-6

Streptococcus pneumoniae JJA

Streptococcus pneumoniae P1031

Streptococcus pneumoniae strain TIGR4

Streptococcus pneumoniae Taiwan19F-14

Streptococcus pyogenes MGAS5005





Streptococcus pyogenes SSI-1

Klebsiella spp.

Klebsiella pneumoniae subsp. pneumoniae strain MGH 78578

Shigella spp.

Shigella boydii Sb227

Shigella dysenteriae Sd197

Shigella flexneri 2a str. 2457T

Shigella flexneri 2a str. 301

Shigella flexneri 5 str. 8401

Shigella sonnei Ss046

Staphylococcus epidermidis strain RP62A (Tax.ID: 176279), Streptococcus pyogenes SSI-1 (Tax.ID: 193567), Klebsiella pneumoniae subsp. pneumoniae strain MGH 78578 (Tax.ID: 272620) and Shigella flexneri 2a str. 2457T (Tax.ID: 198215) were used as the reference organisms for Staphylococcus, Streptococcus, Klebsiella and Shigella, respectively, owing to their smaller proteome sizes. The proteomes of the reference organisms were downloaded from NCBI (www.ncbi.nlm.nih.gov) and subjected to BLASTP against the respective strains listed above, with an E-value cut-off of 10⁻⁴.
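Conceptually, a reference protein is retained as "shared" only if it has a significant BLASTP hit in every strain of its genus. The Python sketch below illustrates that intersection step under stated assumptions. The hits are assumed to be already parsed into `(query_id, evalue)` pairs per strain. The function name `shared_in_all_strains` and all IDs are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def shared_in_all_strains(
    hits_per_strain: Dict[str, Iterable[Tuple[str, float]]],
    max_evalue: float = 1e-4,
) -> Set[str]:
    """Return reference protein IDs with a hit at or below max_evalue in every strain."""
    per_strain = []
    for hits in hits_per_strain.values():
        # Reference proteins that match this strain significantly
        per_strain.append({query for query, evalue in hits if evalue <= max_evalue})
    if not per_strain:
        return set()
    # Keep only proteins found in all strains (the "shared" set)
    return set.intersection(*per_strain)


hits = {
    "strain_A": [("ref_001", 1e-50), ("ref_002", 1e-3)],
    "strain_B": [("ref_001", 1e-40), ("ref_002", 1e-30), ("ref_003", 1e-20)],
}
print(shared_in_all_strains(hits))  # {'ref_001'}
```

In this example `ref_002` is dropped because its hit in `strain_A` (E = 10⁻³) misses the cut-off. `ref_003` is dropped because it has no hit in `strain_A` at all.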

The proteins shared by all strains were used for further analysis. Dutta A et al. (2006) noted that proteins shorter than 100 amino acids are less likely to represent essential genes. Such proteins were nevertheless retained here, because our own survey of DrugBank showed that many approved drugs target proteins shorter than 100 amino acids (Knox C et al., 2011).

The proteins shared by all selected strains were analyzed with CD-HIT to identify paralogous or duplicate proteins (Huang Y et al., 2010). The sequence identity cut-off was set to 0.7 (70% identity), and the global sequence identity algorithm was selected, with a bandwidth of 20 amino acids and default parameters for alignment coverage. The resulting proteins were then searched with BLASTP against the human RefSeq protein database (Homo sapiens Protein BLAST, 2012) at an expectation value (E-value) of 10⁻⁴, to find proteins homologous between the selected species of each genus and the host. These homologous proteins were eliminated, leaving a dataset with no human homologs. The remaining proteins were searched against DEG (http://tubic.tju.edu.cn/deg/) to identify the genes essential for the survival of the selected organisms. The DEG search used an E-value cut-off of 10⁻⁴, a minimum bit-score cut-off of 100, the BLOSUM62 matrix and gapped alignment mode (Ren Zhang and Yan Lin, 2009).
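The CD-HIT and DEG settings above can be written down explicitly. The Python sketch below makes several assumptions. It builds a `cd-hit` argument list from the stated settings, using `-c` for identity, `-G 1` for global identity and `-b` for bandwidth. The word size `-n` is not given in the text, so it is chosen from the CD-HIT user guide's recommendation for the identity cut-off. The sketch also applies the DEG E-value and bit-score cut-offs to hits already parsed as `(query_id, evalue, bitscore)`. The helper names and file names are hypothetical.

```python
from typing import Iterable, List, Tuple


def cdhit_word_size(identity: float) -> int:
    """Word size recommended in the CD-HIT user guide for an identity cut-off."""
    if identity >= 0.7:
        return 5
    if identity >= 0.6:
        return 4
    if identity >= 0.5:
        return 3
    return 2


def cdhit_command(in_fasta: str, out_fasta: str, identity: float = 0.7) -> List[str]:
    """Argument list for cd-hit with the settings described in the text."""
    return [
        "cd-hit",
        "-i", in_fasta,
        "-o", out_fasta,
        "-c", str(identity),                   # sequence identity threshold (0.7 = 70%)
        "-n", str(cdhit_word_size(identity)),  # word size (assumed, not stated in text)
        "-G", "1",                             # global sequence identity
        "-b", "20",                            # alignment bandwidth
    ]


def deg_essential(hits: Iterable[Tuple[str, float, float]],
                  max_evalue: float = 1e-4, min_bitscore: float = 100.0) -> List[str]:
    """Query IDs whose DEG hit passes both the E-value and the bit-score cut-off."""
    keep = {q for q, evalue, bits in hits if evalue <= max_evalue and bits >= min_bitscore}
    return sorted(keep)


print(cdhit_command("shared.faa", "nr70.faa"))
# To run it (requires cd-hit on PATH):
#   import subprocess; subprocess.run(cdhit_command("shared.faa", "nr70.faa"), check=True)
print(deg_essential([("p1", 1e-20, 150.0), ("p2", 1e-20, 80.0), ("p3", 1e-3, 300.0)]))
```

The bit-score cut-off works alongside the E-value cut-off. A hit such as `p2` can have a very small E-value and still fall below 100 bits, and this filter would discard it.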

3.2.2. Identification of Novel drug targets

3.2.2.1. Common essential proteins in the four selected genera

The essential proteins obtained for each genus were compared using BLAST-2.2.28+ (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/), downloaded from NCBI, to obtain the essential proteins common to all selected genera.

3.2.2.2. Metabolic Pathway Analysis

Metabolic pathway analysis of the essential proteins was performed with the KAAS server (www.genome.jp/tools/kaas/). KAAS (KEGG Automatic Annotation Server) provides functional annotation of genes by BLAST comparisons against the manually curated KEGG GENES database. Its output contains KO (KEGG Orthology) assignments and automatically generated KEGG pathways (Moriya Y et al., 2007). KEGG pathway studies were also conducted to check for alternate pathways, after which the proteins were selected as potential drug targets.
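The pathogen-versus-host pathway comparison can be sketched as follows. The sketch makes two assumptions. First, KAAS gene-to-KO assignments are taken as a two-column, tab-separated list (gene ID, K number), with an empty second column for unassigned genes. Second, the KO-to-pathway mapping and the set of host pathways are supplied separately. The function names are hypothetical, and the small KO-to-pathway dictionary in the example is an illustrative subset, not a complete KEGG mapping.

```python
from typing import Dict, Iterable, Set


def parse_kaas_ko(lines: Iterable[str]) -> Dict[str, str]:
    """Map gene ID -> KO number from two-column (gene<TAB>KO) KAAS-style output.

    Genes without a KO assignment (empty second column) are skipped.
    """
    assignments: Dict[str, str] = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2 and parts[1].strip():
            assignments[parts[0]] = parts[1].strip()
    return assignments


def pathogen_specific_pathways(gene_ko: Dict[str, str],
                               ko_to_pathways: Dict[str, Set[str]],
                               host_pathways: Set[str]) -> Dict[str, Set[str]]:
    """For each gene, the pathways it maps to that are absent from the host."""
    result: Dict[str, Set[str]] = {}
    for gene, ko in gene_ko.items():
        unique = set(ko_to_pathways.get(ko, set())) - host_pathways
        if unique:
            result[gene] = unique
    return result


# Illustrative inputs only: gene IDs and the KO -> pathway subset are hypothetical.
gene_ko = parse_kaas_ko(["g1\tK00790", "g2\t", "g3\tK02338"])
ko_to_pathways = {"K00790": {"map00520", "map00550"}, "K02338": {"map03030"}}
host = {"map00520", "map03030"}
print(pathogen_specific_pathways(gene_ko, ko_to_pathways, host))  # {'g1': {'map00550'}}
```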

3.2.2.3. Comparison with 'Anti-targets'

Eight proteins have been reported to form a set of 'anti-targets' (Recanatini M et al., 2004). Four of them are the human ether-à-go-go-related gene (hERG) product, the pregnane X receptor (PXR), the constitutive androstane receptor (CAR) and P-glycoprotein (P-gp). The other four are membrane receptors: the adrenergic α1a, the dopaminergic D2, the serotonergic 5-HT2c and the muscarinic M1 receptors. Unintentional binding of drugs to these proteins causes adverse effects, which is why they are labelled anti-targets. The sequences of 306 proteins in the human proteome corresponding to these anti-targets were fetched from the NCBI sequence database, and their accession numbers are provided as Supplementary data-1. The short-listed targets were compared with these anti-targets by standard sequence analysis.

3.2.2.4. Sequence homology with the proteomes of oral and gut flora

The human gut and oral flora comprise microbes that are considered to influence the physiology, nutrition, immunity and development of the host. A sequence similarity search of the proteins common to all four genera was performed with BLAST (E-value 10⁻⁴) against the proteomes of 93 gut and oral flora (Supplementary data-2).
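The text does not state how hits against the 93 flora proteomes, or against the anti-target set of Section 3.2.2.3, were used to rank or discard targets. One plausible scheme, assumed here, is to count how many distinct off-target proteomes each candidate hits at E ≤ 10⁻⁴ and to prefer candidates with no hits. The Python sketch below illustrates this. The function names and all IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def off_target_hit_counts(hits: Iterable[Tuple[str, str, float]],
                          max_evalue: float = 1e-4) -> Dict[str, int]:
    """Number of distinct off-target proteomes with a significant hit, per target.

    Each hit is a (target_id, proteome_id, evalue) tuple.
    """
    proteomes: Dict[str, Set[str]] = defaultdict(set)
    for target, proteome, evalue in hits:
        if evalue <= max_evalue:
            proteomes[target].add(proteome)
    return {target: len(found) for target, found in proteomes.items()}


def free_of_off_target_hits(targets: Iterable[str],
                            hits: Iterable[Tuple[str, str, float]],
                            max_evalue: float = 1e-4) -> List[str]:
    """Targets with no significant hit in any off-target proteome, in input order."""
    counts = off_target_hit_counts(hits, max_evalue)
    return [t for t in targets if counts.get(t, 0) == 0]


# Hypothetical hits of candidate targets against flora proteomes.
hits = [("t1", "flora_A", 1e-12), ("t1", "flora_B", 1e-9), ("t2", "flora_A", 0.3)]
print(off_target_hit_counts(hits))
print(free_of_off_target_hits(["t1", "t2", "t3"], hits))
```

Counting distinct proteomes, rather than raw hits, keeps a single flora species with many paralogs from dominating the score.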

3.2.2.5. Search against available drug targets

The potential drug targets were screened by a similarity search of all their protein sequences against DrugBank (Knox C et al., 2011), TTD (Chen X et al., 2002), PDTD (Gao Z et al., 2008) and HIT (Ye H et al., 2011), to identify the novel drug targets.

3.2.2.6. Subcellular localization prediction

The subcellular localization of the proteins was predicted computationally with PSORTb v3.0 (Yu NY et al., 2010), and membrane proteins were predicted with TMHMM (Krogh A et al., 2001). The aim was to identify surface membrane proteins that could be used as probable vaccine candidates. PSORTb generates predictions for four major localizations in Gram-positive bacteria (cytoplasmic, cytoplasmic membrane, cell wall and extracellular) and five major localizations in Gram-negative bacteria (cytoplasmic, inner membrane, periplasmic, outer membrane and extracellular). TMHMM (TransMembrane prediction using Hidden Markov Models) predicts transmembrane helices with a hidden Markov model. It reads a FASTA-formatted protein sequence and predicts the locations of transmembrane, intracellular and extracellular regions.
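TMHMM results are easiest to filter in its one-line-per-protein "short" output, where each line carries fields such as `PredHel=` (number of predicted helices) and `Topology=`. The Python sketch below assumes that format and keeps proteins with at least one predicted transmembrane helix. The function names and the example lines are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_tmhmm_short(line: str) -> Dict[str, str]:
    """Split one TMHMM short-format line into its sequence ID and key=value fields."""
    fields = line.split()
    record = {"id": fields[0]}
    for field in fields[1:]:
        key, _, value = field.partition("=")
        record[key] = value
    return record


def membrane_proteins(lines: Iterable[str], min_helices: int = 1) -> List[str]:
    """IDs of proteins with at least `min_helices` predicted transmembrane helices."""
    selected = []
    for line in lines:
        if not line.strip():
            continue
        record = parse_tmhmm_short(line)
        if int(record.get("PredHel", "0")) >= min_helices:
            selected.append(record["id"])
    return selected


# Hypothetical example lines in the short format.
tmhmm_lines = [
    "prot_1\tlen=300\tExpAA=45.20\tFirst60=20.10\tPredHel=2\tTopology=i20-42o60-82i",
    "prot_2\tlen=150\tExpAA=0.50\tFirst60=0.40\tPredHel=0\tTopology=o",
]
print(membrane_proteins(tmhmm_lines))
```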



3.2.3. Three dimensional structure modeling and validation

The three-dimensional structures of four target proteins were modeled with SWISS-MODEL (http://swissmodel.expasy.org), an integrated web-based modeling expert system. The four proteins are:
DNA polymerase III subunit beta (DPO3B) [EC:2.7.7.7] (GI: 28894914 & Accession: NP_801264.1),
UDP-N-acetylmuramoylalanine-D-glutamate ligase (murD) [EC:6.3.2.9] (GI: 28895598 & Accession: NP_801948.1),
3-phosphoshikimate 1-carboxyvinyltransferase (aroA) [EC:2.5.1.19] (GI: 28895745 & Accession: NP_802095.1), and
large subunit ribosomal protein L6 (RP-L6) (GI: 28894969 & Accession: NP_801319.1).
For a given target protein, SWISS-MODEL searches a library of experimental protein structures to identify suitable templates. It then builds a three-dimensional model of the target from the sequence alignment between the target protein and the template structure. Homology modeling is currently the most accurate computational method for generating reliable structural models and is routinely used in many biological applications (Bordoli L et al., 2009).

Small subunit ribosomal protein S17 (RP-S17) was modelled with I-TASSER (http://zhanglab.ccmb.med.umich.edu/I-TASSER/), which combines threading with ab initio structural assembly, because no template was available in the PDB. The Iterative Threading ASSEmbly Refinement (I-TASSER) server is an integrated platform for automated protein structure and function prediction based on the sequence-to-structure-to-function paradigm. Starting from an amino acid sequence, I-TASSER first generates three-dimensional (3D) atomic models from multiple threading alignments and iterative structural assembly simulations. It then infers the function of the protein by structurally matching the 3D models with other known proteins. The output of a typical server run contains full-length secondary and tertiary structure predictions, together with functional annotations of ligand-binding sites, Enzyme Commission numbers and Gene Ontology terms. An estimate of the accuracy of the predictions is provided based on the confidence score of the modeling (Roy A et al., 2010).

The stereochemical quality of the models was verified with PROCHECK. PROCHECK assesses how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, compared with stereochemical parameters derived from well-refined, high-resolution structures (Laskowski RA et al., 1993).

The energy of the models was calculated at the atomic level using the ANOLEA server. ANOLEA, an atomic empirical mean force potential, is used to assess the packing quality of the models. The program performs energy calculations on a protein chain, evaluating the non-local environment (NLE) of each heavy atom in the molecule. In the resulting plot, the y-axis represents the energy of each amino acid of the protein chain. Negative energy values (in green) represent a favorable energy environment for a given amino acid, whereas positive values (in red) represent an unfavorable one (Melo F and Feytmans E, 1998).

3.2.4. Active Site Identification

The active sites of the modeled proteins were determined before the docking studies, because the location of the active site defines where the docking grid is placed. Ligand binding sites were predicted with the online meta-server MetaPocket 2.0 (http://projects.biotec.tu-dresden.de/metapocket/index.php) (Zengming Zhang et al., 2011), which is designed to identify ligand binding sites on the protein surface. MetaPocket is a consensus method that combines the binding sites predicted by eight methods to improve the prediction success rate. The eight methods are LIGSITE, PASS, Q-SiteFinder, SURFNET, Fpocket, GHECOM, ConCavity and POCASA.
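Once a pocket has been predicted, a docking grid has to be placed around it. A simple, tool-independent way to do this, assumed here for illustration, is an axis-aligned box centered on the pocket's coordinate range and padded on every side. The Python sketch below follows that approach. The `grid_box` helper and the 5 Å padding are hypothetical choices, not settings of Glide, MVD or MetaPocket.

```python
from typing import Iterable, Tuple

Vec3 = Tuple[float, float, float]


def grid_box(pocket_coords: Iterable[Vec3], padding: float = 5.0) -> Tuple[Vec3, Vec3]:
    """Center and edge lengths (Å) of an axis-aligned box enclosing a pocket.

    The box spans the pocket's coordinate range plus `padding` on every side.
    """
    coords = list(pocket_coords)
    if not coords:
        raise ValueError("pocket has no coordinates")
    xs, ys, zs = zip(*coords)
    center = tuple((min(a) + max(a)) / 2 for a in (xs, ys, zs))
    size = tuple(max(a) - min(a) + 2 * padding for a in (xs, ys, zs))
    return center, size


center, size = grid_box([(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)])
print(center, size)  # (1.0, 2.0, 3.0) (12.0, 14.0, 16.0)
```

The padding gives ligands room to extend beyond the predicted pocket atoms. A larger value widens the search but also increases docking time.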

3.2.5. Ligand library construction

A targeted ligand library of 261,055 molecules in total was constructed from the following sources:
19,127 natural products and antibacterial molecules from PubChem and PubChem BioAssay (http://pubchem.ncbi.nlm.nih.gov/),
162 herbal compounds from Dr. Duke's Phytochemical library (www.ars-grin.gov/duke),
31,897 molecules from the AnalytiCon Discovery database (http://ac-discovery.emolecules.com/), and
209,869 molecules from the ZINC natural products database (http://zinc.docking.org/browse/catalogs/natural-products) and the ChemSpider database (http://www.chemspider.com/).



3.2.6. Molecular Docking

Molecular docking studies were performed using Maestro version 9.0 (Maestro, 2009) and Glide High Throughput Virtual Screening (HTVS) to screen potential inhibitors from the ligand library. All ligands were docked flexibly into their respective targets. To prepare the system for docking, the proteins were processed with the Protein Preparation Wizard supplied with Glide, which added all hydrogen atoms and minimized the entire protein. A grid for docking into each target was then generated with Glide's Receptor Grid Generation tool. The molecules obtained from Glide HTVS were prepared with the LigPrep application using the OPLS_2005 force field. A grid was then prepared for re-docking, and docking was performed in Glide XP (extra precision) mode.

The docking results were further verified with Molegro Virtual Docker (MVD), which has been reported to yield higher docking accuracy than other state-of-the-art docking programs (MVD: 87%, Glide: 82%, Surflex: 75%, FlexX: 58%). MVD offers two docking search algorithms, MolDock Optimizer and MolDock SE (Simplex Evolution), and MolDock Optimizer is the default. The receptor was prepared with the Prepare Molecule option, and cavities for grid searching were generated with the Detect Cavities option. The ligands were then supplied in SDF format to the Docking Wizard. The following docking parameters were fixed: number of runs 10, population size 50, crossover rate 0.9, scaling factor 0.5, maximum iterations 2,000 and grid resolution 0.30 Å (Thomsen R and Christensen MH, 2006).
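The MVD settings above (population size 50, crossover rate 0.9, scaling factor 0.5, 2,000 iterations) are the control parameters of a differential evolution (DE) search. Thomsen and Christensen (2006) describe MolDock's search as a guided differential evolution algorithm. The Python sketch below shows what these parameters do by implementing plain DE/rand/1/bin, with in-place replacement, to minimize a generic scoring function. It is an illustration under stated assumptions, not MVD's implementation. It omits MolDock's guidance step, pose representation and scoring function. The `differential_evolution` helper is hypothetical and unrelated to SciPy's function of the same name.

```python
import random
from typing import Callable, List, Sequence, Tuple


def differential_evolution(
    score: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 50,           # "population size 50"
    crossover_rate: float = 0.9,  # "crossover rate 0.9"
    scaling_factor: float = 0.5,  # "scaling factor 0.5"
    max_iterations: int = 2000,   # "maximum iteration 2,000"
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `score` inside a box with DE/rand/1/bin (in-place replacement)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [score(ind) for ind in pop]
    for _ in range(max_iterations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate is mutated
            trial = list(pop[i])
            for k in range(dim):
                if k == forced or rng.random() < crossover_rate:
                    value = pop[a][k] + scaling_factor * (pop[b][k] - pop[c][k])
                    lo, hi = bounds[k]
                    trial[k] = min(max(value, lo), hi)  # keep inside the box
            trial_fitness = score(trial)
            if trial_fitness <= fitness[i]:  # greedy one-to-one selection
                pop[i], fitness[i] = trial, trial_fitness
    best = min(range(pop_size), key=lambda j: fitness[j])
    return pop[best], fitness[best]


# Toy "scoring function": squared distance from a hypothetical optimum (1, -2, 0.5).
TARGET = (1.0, -2.0, 0.5)


def toy_score(x: Sequence[float]) -> float:
    return sum((xi - ti) ** 2 for xi, ti in zip(x, TARGET))


best, best_score = differential_evolution(toy_score, [(-5.0, 5.0)] * 3, max_iterations=300)
print([round(v, 3) for v in best], best_score)
```

A higher crossover rate copies more mutated coordinates into each trial. The scaling factor sets the step size taken along the difference between two population members.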

3.2.7. Drug-likeness and toxicity analysis

The QikProp application was used to predict ADME properties (QikProp, 2009), and thirty-one QikProp parameters were considered for each molecule. QikProp evaluates pharmaceutically relevant properties for over half a million compounds per hour, which makes it a useful tool for lead generation and lead optimization. Toxicity was predicted with ToxPredict (http://apps.ideaconsult.net:8080/ToxPredict) for endpoints including genotoxicity, rat model endpoints, skin sensitization, skin irritation, eye irritation and rat dosage tolerance. ToxPredict is a web-based interface for predicting the toxicity of individual chemicals. It computes and validates assessments of their toxic and environmental effects, and of their likely performance in experimental assays and animal models, solely from their molecular structure. Users can either search for a compound in the OpenTox prototype database, which at the time included quality-labelled data for 163,122 chemicals grouped in 2,409 datasets, or upload their own chemical structure in SDF format. The selected calculations are then run automatically using a collection of distributed computational services (Hardy B et al., 2010).
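The text names QikProp and ToxPredict but does not list the cut-offs used to call a molecule drug-like. As one common example, Lipinski's rule of five flags four conditions: molecular weight above 500 Da, logP above 5, more than 5 hydrogen-bond donors, and more than 10 hydrogen-bond acceptors. Tolerating at most one violation is a widespread reading of the rule. The Python sketch below applies the rule to descriptors already computed by some ADME tool. The `Descriptors` class, the helper names and the two example molecules are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Molecular descriptors, e.g. exported from an ADME prediction tool."""
    mol_weight: float     # Da
    log_p: float          # octanol/water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count violations of Lipinski's rule of five."""
    rules = (
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    )
    return sum(rules)


def is_drug_like(d: Descriptors, max_violations: int = 1) -> bool:
    """Common reading of the rule: at most one violation is tolerated."""
    return lipinski_violations(d) <= max_violations


small = Descriptors(mol_weight=180.2, log_p=1.2, h_bond_donors=1, h_bond_acceptors=4)
large = Descriptors(mol_weight=812.0, log_p=6.3, h_bond_donors=3, h_bond_acceptors=12)
print(lipinski_violations(small), is_drug_like(small))  # 0 True
print(lipinski_violations(large), is_drug_like(large))  # 3 False
```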

3.2.8. Visualization of results

PyMOL, Molegro Virtual Docker (MVD) and LigPlot+ were used to visualize the docking results. PyMOL is a comprehensive molecular visualization product for rendering and animating 3D molecular structures (http://www.pymol.org/pymol). MVD is an integrated platform for predicting protein-ligand interactions. It handles all aspects of the docking process, from preparing the molecules to determining potential binding sites on the target protein and predicting the binding modes of the ligands (Thomsen R and Christensen MH, 2006). Schematic 2-D representations of the protein-ligand complexes were generated with LigPlot+ (http://www.ebi.ac.uk/thornton-srv/software/LigPlus/), a graphical system that automatically generates multiple 2D diagrams of ligand-protein interactions from 3D coordinates. The diagrams portray the hydrogen-bond interaction patterns and hydrophobic contacts between the ligand(s) and the main-chain or side-chain elements of the protein. LigPlot+ can plot related sets of ligand-protein interactions in the same orientation. This facilitates common research tasks such as analyzing a series of small molecules binding to the same protein target, a single ligand binding to homologous proteins, or the completely general case in which both protein and ligand change (Laskowski RA et al., 2011).
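LigPlot+ derives hydrogen bonds and hydrophobic contacts from interatomic geometry. The Python sketch below shows only the distance part of such an analysis, under stated assumptions. The cut-offs used, 3.35 Å for polar donor-acceptor pairs and 3.9 Å for hydrophobic carbon-carbon pairs, are values commonly associated with HBPLUS/LigPlot+ defaults and should be checked against the program's own settings. The sketch ignores angles and atom typing. The atom labels and coordinates are made up, and `close_contacts` is a hypothetical helper.

```python
import math
from typing import List, Sequence, Tuple

Atom = Tuple[str, Tuple[float, float, float]]  # (label, (x, y, z)) in Å


def close_contacts(ligand: Sequence[Atom], protein: Sequence[Atom],
                   max_dist: float) -> List[Tuple[str, str, float]]:
    """All ligand-protein atom pairs within max_dist Å, nearest first."""
    pairs = []
    for lig_label, lig_xyz in ligand:
        for prot_label, prot_xyz in protein:
            d = math.dist(lig_xyz, prot_xyz)  # Euclidean distance (Python 3.8+)
            if d <= max_dist:
                pairs.append((lig_label, prot_label, round(d, 2)))
    return sorted(pairs, key=lambda p: p[2])


ligand = [("O1", (0.0, 0.0, 0.0)), ("C5", (0.0, 4.0, 0.0))]
protein = [("ASP27:OD1", (3.0, 0.0, 0.0)), ("LEU40:CD1", (0.0, 7.7, 0.0))]
print(close_contacts(ligand, protein, max_dist=3.35))  # polar-type cut-off
print(close_contacts(ligand, protein, max_dist=3.9))   # hydrophobic-type cut-off
```

A real interaction analysis would also check donor/acceptor types and hydrogen-bond angles. This sketch only shows how the distance cut-off selects candidate pairs.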



3.3. Results and Discussion

Infectious diseases are identified as the second leading cause of death worldwide (WHO, 2001). Despite the increasing demand for new antimicrobial drugs, few new drugs are being identified. Reasons include the huge investment required, the limited market and competition with newly developed agents (Spellberg B et al., 2004). Advances in bioinformatics have produced many new algorithms, tools and databases. These have automated microbial genome sequencing, genome comparison and the identification of gene product function, and have simplified the development of antimicrobial agents, vaccines and rational drug design (Bansal AK, 2005). The in silico subtractive genomics approach is a powerful way to identify specific genes that are present in the pathogen but absent in the host. It therefore helps to identify novel genus-specific genes that can be used as drug targets. In silico drug target identification mainly relies on the principle that "a good drug target is a gene essential for bacterial survival yet cannot be found in host" (Gupta SK et al., 2010).

3.3.1. Identification of essential genes

In the current study, a subtractive genomics approach was used to identify essential genes with no human homologs, together with their protein products, in the genera Staphylococcus, Streptococcus, Klebsiella and Shigella. These genes are likely to lead to the development of drugs that bind strongly to pathogen targets.

3.3.1.1. Identification of essential genes: Staphylococcus spp.

The reference proteome of Staphylococcus epidermidis strain RP62A (Tax.ID: 176279), comprising 2525 proteins, was downloaded from NCBI (www.ncbi.nlm.nih.gov). It was subjected to BLASTP against the selected strains of Staphylococcus species, with an E-value of 10⁻⁴, and 1704 proteins were shared by all selected strains. Of these, 984 hypothetical proteins were eliminated. A hypothetical protein is one whose existence has been predicted but for which there is no experimental evidence of in vivo expression. The screened 720 proteins were analyzed with CD-HIT at a 70% identity threshold, which identified 61 duplicate proteins. The remaining 659 proteins were searched against human proteins with BLASTP, which revealed 240 proteins with no significant similarity to human proteins. Using the Database of Essential Genes (DEG), 181 essential proteins were identified.

Figure 3.3: Summary of essential gene identification: Staphylococcus spp.

3.3.1.2. Identification of essential genes: Streptococcus spp.

The reference proteome of Streptococcus pyogenes SSI-1 (Tax.ID: 193567), comprising 1859 proteins, was downloaded from NCBI (www.ncbi.nlm.nih.gov). It was subjected to BLASTP against the selected strains of Streptococcus species, with an E-value of 10⁻⁴, and 1050 proteins were shared by all selected strains. Of these, 223 hypothetical proteins were eliminated. A hypothetical protein is one whose existence has been predicted but for which there is no experimental evidence of in vivo expression. The screened 813 proteins were analyzed with CD-HIT at a 70% identity threshold, which identified 11 duplicate proteins. The remaining 802 proteins were searched against human proteins with BLASTP, which revealed 406 proteins with no significant similarity to human proteins. Using the Database of Essential Genes (DEG), 283 essential proteins were identified.

Figure 3.4: Summary of essential gene identification: Streptococcus spp.

3.3.1.3. Identification of essential genes: Klebsiella spp.

The reference proteome of Klebsiella pneumoniae subsp. pneumoniae strain MGH 78578 (Tax.ID: 272620), comprising 5185 proteins, was downloaded from NCBI (www.ncbi.nlm.nih.gov). It was not subjected to BLASTP against other Klebsiella strains, because only one organism of this genus was considered in the investigation. A total of 402 hypothetical proteins were eliminated. A hypothetical protein is one whose existence has been predicted but for which there is no experimental evidence of in vivo expression. The screened 4783 proteins were analyzed with CD-HIT at a 70% identity threshold, which identified 321 duplicate proteins. The remaining 4462 proteins were searched against human proteins with BLASTP, which revealed 2321 proteins with no significant similarity to human proteins. Using the Database of Essential Genes (DEG), 453 essential proteins were identified.

Figure 3.5: Summary of essential gene identification: Klebsiella spp.

3.3.1.4. Identification of essential genes: Shigella spp.

The reference proteome of Shigella flexneri 2a str. 2457T (Tax.ID: 198215), comprising 4060 proteins, was downloaded from NCBI (www.ncbi.nlm.nih.gov). It was subjected to BLASTP against the selected strains of Shigella species, with an E-value of 10⁻⁴, and 2451 proteins were shared by all selected strains. Of these, 624 hypothetical proteins were eliminated. A hypothetical protein is one whose existence has been predicted but for which there is no experimental evidence of in vivo expression. The screened 1827 proteins were analyzed with CD-HIT at a 70% identity threshold, which identified 341 duplicate proteins. The remaining 1486 proteins were searched against human proteins with BLASTP, which revealed 972 proteins with no significant similarity to human proteins. Using the Database of Essential Genes (DEG), 465 essential proteins were identified.

Figure 3.6: Summary of essential gene identification: Shigella spp.

3.3.2. Identification of Novel drug targets

3.3.2.1. Common essential proteins in all four genera

Thirty essential proteins were found to be common to the four selected genera, as determined with BLAST-2.2.28+ (Camacho C, 2009).

3.3.2.2. Metabolic Pathway analysis

The 30 essential proteins obtained were analyzed with the KAAS server to determine their involvement in metabolic pathways. The metabolic pathways of the host and the pathogen were then compared to trace drug targets involved in pathogen-specific pathways. Detailed pathway analysis revealed that the organism would not survive if any of these 30 proteins were targeted. In other words, each of them can act as a drug target, and all 30 can be considered crucial for the survival of the organism.

The 30 identified proteins are involved in 4 pathways/biological processes that are unique to the pathogens and in 14 that are common to both host and pathogen. All 18 pathways/biological processes were classified into 8 classes (Table 3.1):
Amino Acid Metabolism (phenylalanine, tyrosine and tryptophan biosynthesis; cysteine and methionine metabolism),
Carbohydrate Metabolism (amino sugar and nucleotide sugar metabolism),
Glycan Biosynthesis and Metabolism (peptidoglycan biosynthesis),
Lipid Metabolism (fatty acid biosynthesis and glycerophospholipid metabolism),
Metabolism of Other Amino Acids (D-glutamine and D-glutamate metabolism),
Nucleotide Metabolism (purine and pyrimidine metabolism),
Genetic Information Processing (translation; folding, sorting and degradation; replication and repair), and
Environmental Information Processing (membrane transport and signal transduction).
Figure 3.7 shows the percentage distribution of these essential proteins across the different metabolic pathways/biological processes.



Table 3.1: Essential proteins involved in different metabolic pathways and other

cellular activities

S.No KEGG Orthology number (ko), Gene and Protein Name

1. Amino Acid Metabolism (Phenylalanine, tyrosine and tryptophan biosynthesis,

Cysteine and methionine metabolism and Lysine biosynthesis)

1. ko:K00014 aroE; shikimate dehydrogenase (EC:1.1.1.25)

2. ko:K00800 aroA; 3-phosphoshikimate 1-carboxyvinyltransferase (EC:2.5.1.19)

3. ko:K01243 mtnN; S-adenosylhomocysteine/5'-methylthioadenosine

nucleosidase

(EC:3.2.2.9)

2. Carbohydrate Metabolism (Amino sugar and nucleotide sugar metabolism)

1 ko:K00790 murA; UDP-N-acetylglucosamine 1-carboxyvinyltransferase

(EC:2.5.1.7)

3. Glycan Biosynthesis and Metabolism (Peptidoglycan biosynthesis)

1 ko:K00790 murA; UDP-N-acetylglucosamine 1-carboxyvinyltransferase

(EC:2.5.1.7)

2 ko:K01924 murC; UDP-N-acetylmuramate--alanine ligase (EC:6.3.2.8)

3 ko:K01925 murD; UDP-N-acetylmuramoylalanine--D-glutamate ligase

(EC:6.3.2.9)

4 ko:K01929 murF; UDP-N-acetylmuramoylalanyl-D-glutamyl-2,6-

diaminopimelate—

D-alanyl-D-alanine ligase (EC: 6.3.2.10)

4. Lipid Metabolism (Fatty acid biosynthesis and Glycerophospholipid metabolism)

1 ko:K00648 fabH; 3-oxoacyl-(acyl-carrier-protein) synthase III (EC:2.3.1.180)

2 ko:K06131 cls; cardiolipin synthase (EC:2.7.8.-)

5. Metabolism of Other Amino Acids (D-Glutamine and D-glutamate metabolism)

1 ko:K01776 E5.1.1.3; glutamate racemase (EC:5.1.1.3)


72

2 ko:K01924 murC; UDP-N-acetylmuramate--alanine ligase (EC:6.3.2.8)

3 ko:K01925 murD; UDP-N-acetylmuramoylalanine--D-glutamate ligase

(EC:6.3.2.9)

6. Nucleotide Metabolism (Purine and Pyrimidine metabolism)

1 ko:K01589 purK; 5-(carboxyamino)imidazole ribonucleotide synthase

(EC:6.3.4.18)

2 ko:K01591 pyrF; orotidine-5'-phosphate decarboxylase (EC:4.1.1.23)

3 ko:K02337 DPO3A1; DNA polymerase III subunit alpha (EC:2.7.7.7)

4 ko:K02338 DPO3B; DNA polymerase III subunit beta (EC:2.7.7.7)

5 ko:K03763 DPO3A2; DNA polymerase III subunit alpha, Gram-positive type

(EC:2.7.7.7)

7. Genetic Information Processing (Translation, Folding, Sorting and

Degradation, and Replication and Repair)

1. ko:K02314 dnaB; replicative DNA helicase (EC:3.6.4.12)

2. ko:K02316 dnaG; DNA primase (EC:2.7.7.-)

3. ko:K02337 DPO3A1; DNA polymerase III subunit alpha (EC:2.7.7.7)

4. ko:K02338 DPO3B; DNA polymerase III subunit beta (EC:2.7.7.7)

5. ko:K02933 RP-L6; large subunit ribosomal protein L6

6. ko:K02961 RP-S17; small subunit ribosomal protein S17



9. ko:K03070 secA; preprotein translocase subunit SecA

10. ko:K03076 secY; preprotein translocase subunit SecY

11. ko:K03470 rnhB; ribonuclease HII (EC:3.1.26.4)

12. ko:K03629 recF; DNA replication and repair protein RecF

13. ko:K03657 uvrD; DNA helicase II / ATP-dependent DNA helicase PcrA

(EC:3.6.4.12)

14. ko:K03763 DPO3A2; DNA polymerase III subunit alpha, Gram-positive type

(EC:2.7.7.7)

15. ko:K04066 priA; primosomal protein N' (replication factor Y) (superfamily II

helicase) (EC:3.6.4.-)


73

8. Environmental Information Processing (Membrane Transport and Signal

Transduction)

1. ko:K02313 dnaA; chromosomal replication initiator protein

2. ko:K03070 secA; preprotein translocase subunit SecA

3. ko:K03076 secY; preprotein translocase subunit SecY

4. ko:K07652 vicK; two-component system, OmpR family, sensor histidine kinase VicK (EC:2.7.13.3)

5. ko:K09815 znuA; zinc transport system substrate-binding protein

Figure 3.7: Percentage distribution of essential proteins involved in different metabolic pathways / biological processes: (1) Amino Acid Metabolism, 8%; (2) Carbohydrate Metabolism, 3%; (3) Glycan Biosynthesis and Metabolism, 11%; (4) Lipid Metabolism, 5%; (5) Metabolism of Other Amino Acids, 8%; (6) Nucleotide Metabolism, 13%; (7) Genetic Information Processing, 39%; (8) Environmental Information Processing, 13%

Essential proteins in pathogens’ unique pathways

The current study shows that nine essential proteins are uniquely involved in four pathogen-specific pathways: Peptidoglycan biosynthesis, Bacterial secretion system, ABC transporters and Two-component system.

Among the cytoplasmic steps of peptidoglycan biosynthesis (KEGG Pathway: map00550), four enzymes are uniquely present in this pathway: murA, UDP-N-acetylglucosamine 1-carboxyvinyltransferase (EC:2.5.1.7); murC, UDP-N-acetylmuramate--alanine ligase (EC:6.3.2.8); murD, UDP-N-acetylmuramoylalanine--D-glutamate ligase (EC:6.3.2.9); and murF, UDP-N-acetylmuramoylalanyl-D-glutamyl-2,6-diaminopimelate--D-alanyl-D-alanine ligase (EC:6.3.2.10). Peptidoglycan is a macromolecule made of long aminosugar strands cross-linked by short peptides; it forms the bacterial cell wall surrounding the cytoplasmic membrane. The glycan strands typically comprise repeating disaccharides of N-acetylglucosamine (GlcNAc) and N-acetylmuramic acid (MurNAc), and each MurNAc is linked to a peptide of three to five amino acid residues. The disaccharide subunits are first assembled on the cytoplasmic side of the bacterial membrane on a polyisoprenoid anchor (lipid I and lipid II). Polymerization of the disaccharide subunits by transglycosylases and cross-linking of the glycan strands by transpeptidases occur on the other side of the membrane. The enzymes involved in peptidoglycan biosynthesis are among the best-known targets in the search for new antibiotics (Barreteau H et al., 2008; Bouhss A et al., 2008).

Two proteins of the Bacterial secretion system (KEGG Pathway: map03070), preprotein translocase subunits SecA and SecY, were identified as essential. In Gram-positive bacteria, secreted proteins are commonly translocated across the single membrane by the Sec pathway or the twin-arginine translocation (Tat) pathway (Driessen AJ and Nouwen N, 2008; Nakatogawa H et al., 2004).

One transport protein of the ABC transporters pathway (KEGG Pathway: map02010), the zinc transport system substrate-binding protein (znuA), is essential. ATP-binding cassette (ABC) transporters form one of the largest known protein families and are widespread in bacteria, archaea and eukaryotes. Although this pathway is also present in eukaryotes, it is treated as pathogen-specific in this study because the transporter architecture differs: in a typical eukaryotic ABC transporter, the membrane-spanning protein and the ATP-binding protein are fused into a multi-domain protein with a membrane-spanning domain (MSD) and a nucleotide-binding domain (NBD). ABC transporters couple ATP hydrolysis to the active transport of a wide variety of substrates such as ions, sugars, lipids, sterols, peptides, proteins and drugs. A prokaryotic ABC transporter usually consists of three components: typically two integral membrane proteins, each having six transmembrane segments; two peripheral proteins that bind and hydrolyze ATP; and a periplasmic (or lipoprotein) substrate-binding protein (Tomii K and Kanehisa M, 1998).

Two essential proteins, dnaA (chromosomal replication initiator protein) and vicK (two-component system, OmpR family, sensor histidine kinase VicK; EC:2.7.13.3), are present in the Two-component system (KEGG Pathway: map02020). Two-component signal transduction systems enable bacteria to sense, respond and adapt to changes in their environment or in their intracellular state. Each two-component system consists of a sensor histidine kinase (HK) and a response regulator (RR), and it often mediates the cellular response to a stimulus by inducing changes in transcription (Gotoh Y et al., 2010).

Essential proteins in host-pathogen common pathways

The current study shows that 26 essential proteins are involved in 14 metabolic pathways that are common to the host and the pathogen: Phenylalanine, tyrosine and tryptophan biosynthesis (KEGG Pathway: map00400), Cysteine and methionine metabolism (KEGG Pathway: map00270), Amino sugar and nucleotide sugar metabolism (KEGG Pathway: map00520), Fatty acid biosynthesis (KEGG Pathway: map00061), Glycerophospholipid metabolism (KEGG Pathway: map00564), D-Glutamine and D-glutamate metabolism (KEGG Pathway: map00471), Pyrimidine metabolism (KEGG Pathway: map00240), Purine metabolism (KEGG Pathway: map00230), Ribosome (KEGG Pathway: map03010), Protein export (KEGG Pathway: map03060), DNA replication (KEGG Pathway: map03030), Mismatch repair (KEGG Pathway: map03430), Homologous recombination (KEGG Pathway: map03440) and Nucleotide excision repair (KEGG Pathway: map03420) (Table 3.1). Of these 26 proteins, five are also involved in the pathogen-unique pathways.
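The split between pathogen-unique and host-pathogen common pathways can be pictured as a set difference over KEGG map identifiers. The sketch below is a minimal illustration, assuming the pathogen and host pathway sets are already available as sets of map IDs; here they are filled with the map IDs named in this section rather than retrieved from KEGG, and the ABC transporter exception described above is modelled as an explicit override set (MANUAL_OVERRIDES and classify_pathways are hypothetical names).

```python
# Minimal sketch: classify pathogen pathways as "unique" or "common" by set difference.
# The map IDs are those named in the text; building the host set this way
# (the 14 common maps plus ABC transporters) is an illustrative assumption.
COMMON_MAPS = {
    "map00400", "map00270", "map00520", "map00061", "map00564",
    "map00471", "map00240", "map00230", "map03010", "map03060",
    "map03030", "map03430", "map03440", "map03420",
}
PATHOGEN_ONLY_MAPS = {"map00550", "map03070", "map02020"}  # peptidoglycan, secretion, two-component
ABC_TRANSPORTERS = "map02010"

pathogen_pathways = COMMON_MAPS | PATHOGEN_ONLY_MAPS | {ABC_TRANSPORTERS}
host_pathways = COMMON_MAPS | {ABC_TRANSPORTERS}  # ABC transporters also occur in eukaryotes

# Pathways kept as pathogen-specific despite a host counterpart, because the
# prokaryotic transporter architecture differs (the ABC transporter case).
MANUAL_OVERRIDES = {ABC_TRANSPORTERS}


def classify_pathways(pathogen, host, overrides):
    """Return (pathogen_unique, common) sets of pathway IDs."""
    unique = (pathogen - host) | (pathogen & overrides)
    common = (pathogen & host) - overrides
    return unique, common


unique, common = classify_pathways(pathogen_pathways, host_pathways, MANUAL_OVERRIDES)
print(sorted(unique))
print(len(common))
```

Without the override set, a plain set difference would place ABC transporters among the common pathways; the explicit override keeps that judgement visible rather than hidden in the data.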

3.3.2.3. Comparison with Anti-targets

An ideal target should not only be specifically recognized by the drug directed against it but should also be sufficiently different from host proteins that have been termed anti-targets. Considering this aspect early in the drug discovery pipeline may help to minimise the risk of drug candidates failing in the later stages of drug discovery. Anti-targets include transporters and pumps that modify the bioavailability of a drug through their efflux action, and proteins that trigger hazardous side effects, such as the hERG channel, whose blockade causes the 'sudden death syndrome' (Recanatini M et al., 2004). This list is by no means complete; it is included here mainly from a conceptual perspective, to highlight the need for screening against anti-targets. Sequence comparison against 306 sequences belonging to eight categories of anti-targets revealed that none of the 30 screened proteins had a homologue with 30% similarity over more than 30% of the query length. Such a loose similarity measure was used because the aim is to rule out even remote similarity with any anti-target. Moreover, close homologues had already been eliminated earlier by BLASTP searches against the human proteome.
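The anti-target cut-off (30% similarity over more than 30% of the query length) can be applied to tabular BLAST output in a few lines of Python. This is a sketch under stated assumptions: the search is assumed to have been run with BLAST+ tabular output containing the columns qseqid, sseqid, ppos, length and qlen; ppos (percentage of positive-scoring positions) is used as the similarity measure; and query coverage is approximated from the alignment length. The function name and the toy rows are hypothetical, not results from this study.

```python
import csv
import io

# Assumed BLAST+ invocation (tabular output with custom columns):
#   blastp -query targets.faa -subject antitargets.faa \
#          -outfmt "6 qseqid sseqid ppos length qlen"


def flag_antitarget_hits(blast_tsv, min_similarity=30.0, min_query_cov=30.0):
    """Return query IDs with at least one anti-target hit having
    similarity >= min_similarity over more than min_query_cov % of the query."""
    flagged = set()
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        qseqid, _sseqid, ppos, aln_len, qlen = row[:5]
        # Approximate query coverage from the alignment length (which may include gaps).
        query_cov = 100.0 * int(aln_len) / int(qlen)
        if float(ppos) >= min_similarity and query_cov > min_query_cov:
            flagged.add(qseqid)
    return flagged


# Toy output with hypothetical IDs, not results from this study.
toy = (
    "target_A\tantitarget_1\t28.5\t200\t400\n"  # similarity below 30%
    "target_B\tantitarget_2\t41.0\t150\t300\n"  # 41% similarity over 50% of the query
    "target_C\tantitarget_3\t35.0\t60\t400\n"   # only 15% query coverage
)
screened = {"target_A", "target_B", "target_C"}
hits = flag_antitarget_hits(toy)
print(sorted(screened - hits))
```

Proteins that are flagged would be dropped; with real data the surviving set would be the screened proteins minus every flagged query.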

3.3.2.4. Similarity to Gut and Oral Flora Proteins

The targets from the metabolic pathway analysis were further compared with the protein sequences of hundreds of organisms that inhabit the gut of a healthy human. This was done to prune the list of identified drug targets so that the administered drugs do not bind unintentionally to proteins of the gut flora. Unintentional inhibition of gut flora proteins is known to lead to adverse effects and can promote pathogenic colonisation of the gut (Levy J, 2000). Drug interactions with the gut flora are also believed to cause idiosyncratic drug toxicity and reduced bioavailability of the drug (Nicholson JK and Wilson ID, 2003; Nicholson JK et al., 2005). Similarity of the identified targets to such proteins therefore reduces their suitability. The sequence analysis carried out here showed that 20 of the proteins screened in the previous step had close homologues in the gut flora; these were removed from the list of most viable targets.

3.3.2.5. Search against available drug targets

Drug-target screening was carried out with DrugBank, TTD, PDTD and HIT for the 10 proteins identified in the previous analysis. None of the 10 proteins is a target of an approved drug.

3.3.2.6. Structural Assessment of Targetability

Similarity between proteins is better captured through structural comparison when structural data for both proteins are available. What ultimately determines the pharmacological profile of a drug molecule is its recognition by protein molecules at their binding sites. It is therefore important to compare the binding sites of proteins in both the pathogen and the host. The aim at this step is to weed out targets whose binding sites are highly similar to those in the human 'pocketome', since targeting them may lead to adverse drug reactions through inadvertent binding to human proteins.

This type of analysis is most meaningful when carried out at the proteome scale. Advances in crystallography and in structural genomics projects (Edwards A et al., 2005; Gileadi O et al., 2007) have led to the determination of the 3D structures of proteins from selected microbes and from human. In the absence of experimentally determined structures, high-confidence homology models were obtained from the ModBase database. The availability of such a large number of protein structures makes a proteome-scale structural assessment of targetability feasible. Of the ten selected proteins, the 3D structures of five were retrieved directly from the PDB and the other five were modelled.

The top 10 binding sites of each protein, identified using PocketDepth, were compared with the top three binding pockets from LigsiteCSC. LigsiteCSC considers the conservation of amino acids at the putative sites across the protein family, which helps to identify residues, and hence sites, that are likely to be functionally important. Finding a consensus among the top predictions of the two methods significantly increases confidence in the site prediction.

A consensus between PocketDepth and LigsiteCSC was therefore obtained to identify the most probable pockets that also contain conserved residues at the binding sites. An all-versus-all comparison of the pockets of the selected proteins against those of human proteins was then performed using PocketMatch.
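One simple way to express the PocketDepth/LigsiteCSC consensus is to treat each predicted pocket as a set of residue numbers and accept pairs that overlap strongly. The sketch assumes pockets have already been parsed into residue sets and uses a Jaccard index of at least 0.5 as the agreement criterion; that threshold, the function name and the toy residue sets are illustrative assumptions, not the criterion used in this study.

```python
def consensus_pockets(depth_pockets, ligsite_pockets, min_jaccard=0.5):
    """Pair PocketDepth and LigsiteCSC pockets that share most of their residues.

    depth_pockets:   ranked list of residue-number sets (PocketDepth, top 10 used)
    ligsite_pockets: ranked list of residue-number sets (LigsiteCSC, top 3 used)
    Returns (depth_rank, ligsite_rank, jaccard) tuples for agreeing pairs.
    """
    pairs = []
    for i, dp in enumerate(depth_pockets[:10], start=1):
        for j, lp in enumerate(ligsite_pockets[:3], start=1):
            union = dp | lp
            if not union:
                continue  # both pockets empty; nothing to compare
            jaccard = len(dp & lp) / len(union)
            if jaccard >= min_jaccard:
                pairs.append((i, j, round(jaccard, 2)))
    return pairs


# Toy residue sets (hypothetical).
depth = [{10, 11, 12, 13}, {50, 51, 52}, {80, 81}]
ligsite = [{50, 51, 52, 53}, {11, 12, 13, 14}]
print(consensus_pockets(depth, ligsite))
```

Only the agreeing pockets would then be carried forward into the all-versus-all PocketMatch comparison.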

A PocketMatch score of 0.8 or more indicates high similarity between two binding pockets. This threshold was used as a filter to eliminate any protein of the selected organism whose pockets closely matched any pocket of any protein in the human proteome. None of the 10 proteins had closely matching pockets in the human proteome, so none was eliminated at this step. This suggests that these proteins contain pockets that are sufficiently different from those of human proteins. The proteins passing this step form the set of targets that can be explored further for drug discovery.
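The PocketMatch filter amounts to discarding any target whose best match against the human pocket set reaches 0.8. A minimal sketch, assuming the all-versus-all PocketMatch scores have already been collected per target into a dictionary (the data layout, function name and toy scores are hypothetical):

```python
PM_THRESHOLD = 0.8  # PocketMatch score regarded as "high similarity" in the text


def filter_by_pocketmatch(pm_scores, threshold=PM_THRESHOLD):
    """Split targets into kept/removed lists using their PocketMatch scores.

    pm_scores maps a target ID to the scores of its pockets against all
    human pockets (one list per target).
    """
    kept, removed = [], []
    for target, scores in pm_scores.items():
        if max(scores, default=0.0) >= threshold:
            removed.append(target)  # some pocket closely matches a human pocket
        else:
            kept.append(target)
    return kept, removed


toy_scores = {"target_1": [0.41, 0.55], "target_2": [0.62], "target_3": [0.35, 0.83]}
kept, removed = filter_by_pocketmatch(toy_scores)
print(kept, removed)
```

Using the maximum score per target reflects the rule in the text: a single closely matching human pocket is enough to remove a protein.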

Figure 3.8: Summary of novel drug target identification: all four genera

Table 3.2: List of novel drug targets: all four genera

S.No  Protein Name  NCBI GI & Accession  Protein Localization  Involved Metabolism

1 ko:K02338 DPO3B; DNA polymerase III subunit beta (EC:2.7.7.7)

gi|28894914|ref|NP_801264.1|

Cytoplasmic Nucleotide Metabolism & Replication and Repair

2 ko:K01925 murD; UDP-N-acetylmuramoylalanine--D-glutamate ligase (EC:6.3.2.9)

gi|28895598|ref|NP_801948.1|

Cytoplasmic Metabolism of Other Amino Acids & Glycan Biosynthesis and Metabolism (D-Glutamine and D-glutamate metabolism and Peptidoglycan biosynthesis)

3 ko:K00800 aroA; 3-phosphoshikimate 1-carboxyvinyltransferase (EC:2.5.1.19)

gi|28895745|ref|NP_802095.1|

Cytoplasmic Phenylalanine, tyrosine and tryptophan biosynthesis

4 ko:K02933 RP-L6; large subunit ribosomal protein L6

gi|28894969|ref|NP_801319.1|

Cytoplasmic Translation (Ribosome)

5 ko:K02961 RP-S17; small subunit ribosomal protein S17

gi|28894963|ref|NP_801313.1|

Cytoplasmic Translation (Ribosome)

6 ko:K07652 vicK; two-component system, OmpR family, sensor histidine kinase VicK (EC:2.7.13.3)

gi|28896392|ref|NP_802742.1|

Cytoplasmic Membrane Signal Transduction (Two-component system)

7 ko:K03470 rnhB; ribonuclease HII (EC:3.1.26.4)

gi|28895931|ref|NP_802281.1|

Cytoplasmic Replication and Repair (DNA replication)

8 ko:K03470 rnhB; ribonuclease HII (EC:3.1.26.4)

gi|28895931|ref|NP_802281.1|

Cytoplasmic Replication and Repair (DNA replication)

9 ko:K01776 E5.1.1.3; glutamate racemase (EC:5.1.1.3)

gi|28896509|ref|NP_802859.1|

Cytoplasmic Metabolism of Other Amino Acids (D-Glutamine and D-glutamate metabolism)

10 ko:K01929 murF; UDP-N-acetylmuramoylalanyl-D-glutamyl-2,6-diaminopimelate--D-alanyl-D-alanine ligase (EC:6.3.2.10)

gi|28895693|ref|NP_802043.1|

Cytoplasmic Amino Acid (Lysine) Metabolism & Glycan Biosynthesis and Metabolism (Peptidoglycan biosynthesis)



3.3.2.7. Subcellular localization prediction

Computational prediction of bacterial protein subcellular localization (SCL) provides a quick and inexpensive means of gaining insight into protein function, verifying experimental results, annotating newly sequenced bacterial genomes and detecting potential cell-surface or secreted drug targets (Gardy JL and Brinkman FS, 2006). The localization study revealed that, of the ten predicted novel drug targets, nine proteins are located in the cytoplasm and one in the cytoplasmic membrane. Most of the available drug targets are found in these two cellular compartments (Bakheet TM and Doig AJ, 2010).

Reverse vaccinology is an emerging vaccine development approach that starts with the prediction of vaccine targets by bioinformatic analysis of microbial genome sequences (Delany I et al., 2013). Subcellular location is one of the main criteria for vaccine target prediction, but further criteria have been added. For example, because outer membrane proteins containing more than one transmembrane helix were found to be generally difficult to clone and purify (Pizza M et al., 2000), the number of transmembrane helices of a vaccine target is often considered during bioinformatic filtering. In this study, only one membrane protein with at most one transmembrane helix was identified (Table 3.2): vicK, the two-component system (OmpR family) sensor histidine kinase VicK (EC:2.7.13.3). This protein could be cloned and overexpressed to study its potential as a vaccine candidate.
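The vaccine-candidate filter combines two criteria: a non-cytoplasmic localization and at most one transmembrane helix. The sketch below assumes localizations use labels like those in Table 3.2 and that transmembrane helix counts come from a separate predictor; the class, the label set and the toy proteins are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    localization: str  # e.g. "Cytoplasmic", "Cytoplasmic Membrane" (labels as in Table 3.2)
    tm_helices: int    # predicted transmembrane helices (assumed from a TM predictor)


# Assumed set of compartments treated as non-cytoplasmic (hypothetical labels).
NON_CYTOPLASMIC = {"Cytoplasmic Membrane", "Cell Wall", "Extracellular"}


def vaccine_candidates(candidates, max_tm_helices=1):
    """Keep non-cytoplasmic proteins with at most max_tm_helices TM helices."""
    return [
        c.name
        for c in candidates
        if c.localization in NON_CYTOPLASMIC and c.tm_helices <= max_tm_helices
    ]


toy = [
    Candidate("protein_1", "Cytoplasmic", 0),
    Candidate("protein_2", "Cytoplasmic Membrane", 1),
    Candidate("protein_3", "Cytoplasmic Membrane", 4),
]
print(vaccine_candidates(toy))
```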

3.3.3. Three dimensional structure Prediction and Analysis

As a case study, five proteins were randomly selected for further study. The three-dimensional structures of four target proteins were built by homology modelling (SWISS-MODEL, First Approach mode): DNA polymerase III subunit beta (DPO3B) (EC:2.7.7.7), UDP-N-acetylmuramoylalanine--D-glutamate ligase (murD) (EC:6.3.2.9), 3-phosphoshikimate 1-carboxyvinyltransferase (aroA) (EC:2.5.1.19) and large subunit ribosomal protein L6 (RP-L6). The structure of small subunit ribosomal protein S17 was predicted by an ab initio method (I-TASSER) because no template was available.

The 3D structure of DNA polymerase III subunit beta (GI: 28894914 & Accession: NP_801264.1) was modelled on the template DNA polymerase III beta subunit of Streptococcus pyogenes (PDB ID: 2avt, Chain: A), with a sequence identity of 97.08% and an E-value of 0.00e-1. The predicted structure is shown in Figure 3.9. DNA replication requires a complex network of interacting proteins and enzymes and generally follows a multistep enzymatic pathway. At the replication fork, a DNA helicase (DnaB) precedes the DNA synthetic machinery and unwinds the duplex parental DNA in cooperation with the single-stranded DNA-binding protein (SSB). On the leading strand, replication occurs continuously in the 5' to 3' direction, whereas on the lagging strand, DNA is synthesized discontinuously as short Okazaki fragments that are subsequently joined. In prokaryotes, the leading-strand replication apparatus consists of a DNA polymerase (the pol III core), a sliding clamp (beta) and a clamp loader (gamma-delta complex) (Stillman B, 1994; Wijffels G et al., 2011; Berdis AJ, 2008). Debmalya Barh and Anil Kumar suggested that DNA polymerase III subunit beta could be a potential novel drug target for which no drugs are yet available. The beta subunit of DNA polymerase III is involved in purine metabolism, pyrimidine metabolism, DNA replication, mismatch repair and homologous recombination (Figure 3.10) (Kanehisa M et al., 2010).

Figure 3.9: Predicted 3D structure of DNA polymerase III beta subunit

The quality of the predicted structure was analysed with PROCHECK and the ANOLEA server. PROCHECK assesses stereochemical quality using the Ramachandran plot, which shows that the predicted model has 99.7%, 0.3% and 0.0% of its residues in the most favorable and additionally allowed regions, the generously allowed regions and the disallowed regions, respectively. This distribution of residues on the Ramachandran plot indicates that the predicted model is of good quality (Figure 3.11). The ANOLEA result gives a graphical view of the energy value of each amino acid and shows that most amino acids have negative energy values (Figure 3.11). Negative energy values (in green) represent a favorable energy environment for a given amino acid, whereas positive values (in red) represent an unfavorable one (Melo F and Feytmans E, 1998).
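The PROCHECK and ANOLEA summaries above reduce to simple percentages once per-residue results are available. The sketch assumes the Ramachandran region and the ANOLEA energy of each residue have already been extracted into Python lists (the region labels and helper names are hypothetical). It groups the most favored and additionally allowed regions as the text does, and also applies the rule of thumb quoted later in this section (over 90% of residues in the most favored region). The toy counts are chosen only to show the arithmetic.

```python
from collections import Counter

# Hypothetical region labels for per-residue Ramachandran classification.
REGIONS = ("most_favored", "additional", "generous", "disallowed")


def ramachandran_summary(labels):
    """Summarise per-residue Ramachandran regions as percentages."""
    counts = Counter(labels)
    n = len(labels)
    pct = {region: 100.0 * counts[region] / n for region in REGIONS}
    return {
        # Grouping used in the text: most favored + additionally allowed.
        "favored_plus_additional": pct["most_favored"] + pct["additional"],
        "generously_allowed": pct["generous"],
        "disallowed": pct["disallowed"],
        # Rule of thumb: a good model has >90% of residues in the most favored region.
        "most_favored_over_90": pct["most_favored"] > 90.0,
    }


def fraction_negative(energies):
    """Fraction of residues with a negative (favorable) ANOLEA energy."""
    return sum(1 for e in energies if e < 0) / len(energies)


# Toy input: 200 residues, counts chosen only to illustrate the arithmetic.
labels = ["most_favored"] * 185 + ["additional"] * 12 + ["generous"] * 3
summary = ramachandran_summary(labels)
print(summary)
print(fraction_negative([-1.2, -0.4, 0.3, -2.0]))
```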


Figure 3.10: Role of DNA polymerase III beta subunit in different metabolic pathways

Figure 3.11: PROCHECK and ANOLEA results of DNA polymerase III beta subunit

UDP-N-acetylmuramoylalanine--D-glutamate ligase (murD) (EC:6.3.2.9) (GI: 28895598 & Accession: NP_801948.1) was modelled on the template UDP-N-acetylmuramoylalanine-D-glutamate (MurD) ligase from Streptococcus agalactiae (PDB ID: 3lk7, Chain: A), with a sequence identity of 72.99% and an E-value of 0.00e-1. The predicted structure is shown in Figure 3.12.


Figure 3.12: Predicted 3D structure of UDP-N-acetylmuramoylalanine--D-glutamate ligase (murD)

murD is involved in D-glutamine and D-glutamate metabolism and in peptidoglycan biosynthesis (Figure 3.13). MurD catalyzes the formation of the peptide bond between UDP-MurNAc-L-Ala (UMA) and D-Glu. The reaction starts with the phosphorylation of UMA to form an acylphosphate, followed by nucleophilic attack by the amino group of the incoming D-Glu. The resulting high-energy tetrahedral intermediate eventually collapses to yield UDP-MurNAc-L-Ala-D-Glu, ADP and inorganic phosphate (Bouhss A et al., 2002). High specificity, ubiquity among bacteria and absence in mammals make MurD a promising target for antibacterial therapy (El Zoeiby A et al., 2003).

PROCHECK shows that the predicted model has 99.5%, 0.5% and 0.0% of its residues in the most favorable and additionally allowed regions, the generously allowed regions and the disallowed regions, respectively. This distribution of residues on the Ramachandran plot indicates that the predicted model is of good quality (Figure 3.14). The ANOLEA result gives a graphical view of the energy value of each amino acid and shows that most amino acids have negative energy values (Figure 3.14). Negative energy values (in green) represent a favorable energy environment for a given amino acid, whereas positive values (in red) represent an unfavorable one.



Figure 3.13: Role of UDP-N-acetylmuramoylalanine--D-glutamate ligase (murD) in different metabolic pathways

Figure 3.14: PROCHECK and ANOLEA results of UDP-N-acetylmuramoylalanine--D-glutamate ligase (murD)

3-phosphoshikimate 1-carboxyvinyltransferase (aroA) (EC:2.5.1.19) (GI: 28895745 & Accession: NP_802095.1) was modelled on the template EPSP synthase from Streptococcus pneumoniae (PDB ID: 1rf4, Chain: C), with a sequence identity of 65.49% and an E-value of 1.22e-154. The predicted structure is shown in Figure 3.15.

The enzyme 5-enolpyruvylshikimate 3-phosphate (EPSP) synthase (EC 2.5.1.19) is the sixth enzyme of the shikimate pathway, which is essential for the synthesis of aromatic amino acids and of almost all other aromatic compounds in algae, higher plants, bacteria and fungi, but is absent from mammals. The enzyme generates 5-enolpyruvylshikimate 3-phosphate and orthophosphate from phosphoenolpyruvate and shikimate 3-phosphate (Figure 3.16). Because the shikimate pathway is absent in mammals, EPSP synthase is an attractive target for the development of new antibacterial agents (Schönbrunn E et al., 2001; Pollegioni et al., 2011).

Figure 3.15: Predicted 3D structure of 3-phosphoshikimate 1-carboxyvinyltransferase (aroA)


Figure 3.16: Role of 3-phosphoshikimate 1-carboxyvinyltransferase (aroA) in phenylalanine, tyrosine and tryptophan biosynthesis pathways

Figure 3.17: PROCHECK and ANOLEA results of 3-phosphoshikimate 1-carboxyvinyltransferase (aroA)

Large subunit ribosomal protein L6 (RP-L6) (GI: 28894969 & Accession: NP_801319.1) was modelled on the template ribosomal protein L6 from Geobacillus stearothermophilus (PDB ID: 1rl6, Chain: A), with a sequence identity of 59.15% and an E-value of 0.00e-1. The predicted structure is shown in Figure 3.18.

L6 is a protein of the large (50S) ribosomal subunit. It is located in the aminoacyl-tRNA binding site of the peptidyltransferase centre and is known to bind directly to 23S rRNA. L6 contains two domains with almost identical folds, suggesting that it arose by duplication of an ancient RNA-binding protein gene. Structural analysis reveals several sites on the protein surface where interactions with other ribosome components may occur, with the N terminus involved in protein-protein interactions and the C terminus containing possible RNA-binding sites (Golden BL et al., 1993; Lindahl M et al., 1994) (Figure 3.19).

Figure 3.18: Predicted 3D structure of Large subunit ribosomal protein L6 (RP-L6)


Figure 3.19: Location of Large subunit ribosomal protein L6 (RP-L6) and Small subunit ribosomal protein S17 (RP-S17) in the ribosome

Figure 3.20: PROCHECK and ANOLEA results of Large subunit ribosomal protein L6

The 3D structure of Small subunit ribosomal protein S17 (RP-S17) (GI: 28894963 & Accession: NP_801313.1) was modelled with the ab initio protein modelling tool I-TASSER because no suitable template for homology modelling was available in the PDB. The top ten templates used for the threading alignments and iterative structural assembly simulations were 3bbnQ, 3u5cA, 1fjgQ, 3bbnQ, 3bbnQ, 3u5cA, 1fjgQ, 1vs7Q, 1vs7Q and 3bbnA. The top model predicted by I-TASSER was selected for further work on the basis of its C-score (1.05) (Figure 3.21). The C-score is a confidence score for estimating the quality of models predicted by I-TASSER. It is calculated from the significance of the threading template alignments and the convergence parameters of the structure assembly simulations. The C-score typically lies in the range [-5, 2], where a higher value signifies a model with higher confidence, and vice versa.
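Selecting the top I-TASSER model by C-score is a maximum over the returned models. A minimal sketch, assuming the models and their C-scores have been read into (name, score) pairs; the check against the typical [-5, 2] range follows the text, while the C-score > -1.5 cut-off for a likely correct fold is a rule of thumb attributed to the I-TASSER authors that should be verified against their documentation. Model names and scores are hypothetical.

```python
def select_top_model(models):
    """Return (name, c_score, in_typical_range) for the highest-C-score model."""
    name, c_score = max(models, key=lambda m: m[1])
    in_typical_range = -5.0 <= c_score <= 2.0  # typical range quoted in the text
    return name, c_score, in_typical_range


# Hypothetical C-scores for three models.
toy_models = [("model1", 0.42), ("model2", -1.10), ("model3", -2.30)]
best, c_score, in_range = select_top_model(toy_models)
likely_correct_fold = c_score > -1.5  # assumed rule of thumb; verify against I-TASSER docs
print(best, c_score, in_range, likely_correct_fold)
```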

Figure 3.21: I-TASSER predicted 3D structure of Small subunit ribosomal protein S17 (RP-S17)

Ribosomal protein S17 (RPS17) is one of the 22 proteins of the small subunit of the bacterial ribosome. It binds to the 5' end of 16S rRNA and may participate in the recognition of termination codons (Figure 3.19).

The quality of the predicted structure was analysed with PROCHECK and the ANOLEA server. PROCHECK assesses stereochemical quality using the Ramachandran plot, which shows that the predicted model has 98.8%, 1.2% and 0.0% of its residues in the most favorable and additionally allowed regions, the generously allowed regions and the disallowed regions, respectively. This distribution of residues on the Ramachandran plot indicates that the predicted model is of good quality (a good-quality model would be expected to have over 90% of its residues in the most favored region) (Figure 3.22). The ANOLEA result gives a graphical view of the energy value of each amino acid and shows that most amino acids have negative energy values (Figure 3.22). Negative energy values (in green) represent a favorable energy environment for a given amino acid, whereas positive values (in red) represent an unfavorable one.


Figure 3.22: Predicted 3D structure quality analysis: PROCHECK and ANOLEA results of Small subunit ribosomal protein S17 (RP-S17)

3.3.4. Virtual screening and docking

Molecular docking plays a key role in identifying efficient binding between a receptor and a ligand. Compounds from virtual screening with the most favorable binding energies were considered hits. The docking studies showed that, of 261,055 molecules, only 3,370 were complementary to the binding sites of all five selected targets, and of these only 2,820 showed efficient binding. ADME filtering with QikProp reduced this set to about 100 molecules, and finally only a few molecules were predicted to be non-toxic. The top twenty-five hits based on the MolDock (MVD) docking score are shown in Tables 3.3 to 3.8, and the top two hits for each target are shown in Figures 3.23 to 3.32.
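The screening funnel reported above can be checked with a little arithmetic. The sketch below uses the counts from the text (treating "about 100" as exactly 100) and computes, for each stage, the percentage retained from the previous stage and from the full library; the final non-toxicity step is omitted because no count is given for it.

```python
# Stage counts taken from the text; "about 100" is treated as exactly 100 (assumption).
STAGES = [
    ("screened library", 261_055),
    ("complementary to all five binding sites", 3_370),
    ("efficient binding", 2_820),
    ("after QikProp ADME filter (approx.)", 100),
]


def funnel_retention(stages):
    """Return (stage, count, % of previous stage, % of full library) per stage."""
    total = stages[0][1]
    rows = []
    for (_, prev_n), (name, n) in zip(stages, stages[1:]):
        rows.append((name, n, 100.0 * n / prev_n, 100.0 * n / total))
    return rows


for name, n, step_pct, overall_pct in funnel_retention(STAGES):
    print(f"{name:40s} {n:>7,d}  step {step_pct:6.2f}%  overall {overall_pct:7.4f}%")
```

The output makes clear that the binding-site complementarity step is by far the strongest filter, removing roughly 99% of the library.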

Table 3.3: Top twenty-five drug-like molecules and their IUPAC names

S.No  PubChem CID  DrugBank/ZINC ID  IUPAC Name

1 695270 5-[2-(3,5-dimethyl-1H-pyrazol-1-yl)-5-nitrophenyl]-1H-tetrazole

2 1245041 3-[5-[(1,3-dioxoinden-2-ylidene)methyl]furan-2-yl]-4-methylbenzoic acid

3 1492413 3-[5-[(1-oxo-[1,3]thiazolo[3,2-a]benzimidazol-2-ylidene)methyl]furan-2-yl]benzoic acid

4 1630563 5-hydroxy-N-[3-[(5-hydroxypyridine-3-carbonyl)amino]propyl]pyridine-3-carboxamide

5 2058961 2-[1-hydroxy-4-[(4-methoxyphenyl)sulfonylamino]naphthalen-2-yl]sulfanylacetic acid

6 2501665 [2-(cyclohexylcarbamoylamino)-2-oxoethyl] 5-(4-nitrophenyl)furan-2-carboxylate

7 3227807 2-[3-[2-(3,4-dihydro-2H-quinolin-1-yl)-2-oxoethyl]sulfanyl-8-methyl-[1,2,4]triazino[5,6-b]indol-5-yl]acetic acid

8 3241078 2-[[5-[(2,4-dioxo-1H-pyrimidin-6-yl)methyl]-4-phenyl-1,2,4-triazol-3-yl]sulfanyl]-N-(oxolan-2-ylmethyl)acetamide

9 4744359 4-[4-(dimethylamino)anilino]-4-oxo-2-(pyridin-2-ylmethylamino)butanoic acid

10 4965092 2-[(5-amino-1H-1,2,4-triazol-3-yl)sulfanyl]-N-[3-(azepan-1-ylsulfonyl)phenyl]acetamide

11 5287411 DB03118 (Z)-3-(5-chloro-1H-indol-3-yl)-3-hydroxy-1-(2H-tetrazol-5-yl)prop-2-en-1-one

12 5479529 DB01112 (6R,7R)-3-(carbamoyloxymethyl)-7-[[(2Z)-2-(furan-2-yl)-2-methoxyiminoacetyl]amino]-8-oxo-5-thia-1-azabicyclo[4.2.0]oct-2-ene-2-carboxylic acid

13 5841017 [2-oxo-2-(oxolan-2-ylmethylamino)ethyl] (E)-3-(3-nitrophenyl)prop-2-enoate

14 11913306 ZINC05438633 2-[2-[[(3S,3aR,6S,6aR)-3-[[4-(furan-2-yl)pyrimidin-2-yl]amino]-2,3,3a,5,6,6a-hexahydrofuro[3,2-b]furan-6-yl]amino]-2-oxoethyl]sulfanylacetate

15 17758816 DB08657 2-[4-[2-[[(5-pyridin-2-ylsulfanyl-1,3-thiazol-2-yl)carbamoylamino]methyl]-1H-imidazol-5-yl]phenoxy]acetic acid

16 24984918 4-N-(1H-benzimidazol-2-yl)-5-N-cyclohexyl-1H-imidazole-4,5-dicarboxamide

17 37887935 ZINC01812785 (2S)-3-acetyl-2-(4-hydroxyphenyl)-5-oxo-1-(pyridin-3-ylmethyl)-2H-pyrrol-4-olate

18 42506977 ZINC12296678 N-[2-[2-(1H-indol-3-yl)ethylamino]-2-oxoethyl]-2-(4-oxo-3H-phthalazin-1-yl)acetamide

19 46936419 DB02540 (2R)-2-[[4-[(1R)-2-(2-amino-4-oxo-1H-quinazolin-6-yl)-1-carboxyethyl]benzoyl]amino]pentanedioic acid

20 46936515 DB02876 3-(4-Carbamoyl-1-Carboxy-2-Methylsulfonyl-Buta-1,3-Dienylamino)-Indolizine-2-Carboxylic Acid

21 46936525 DB02905 Phosphoric Acid Mono-[3,4-Dihydroxy-5-(5-Hydroxy-Benzoimidazol-1-Yl)Tetrahydro-Furan-2-Ylmethyl] Ester

22 46937008 DB04554 [(2S,3R,4R,5R)-5-(6-amino-8-bromopurin-9-yl)-3,4-dihydroxyoxolan-2-yl]methyl phosphono hydrogen phosphate

23 51974545 ZINC40309560 4-[3-(6-phenylmethoxyindol-1-yl)propanoylamino]butanoate

24 51974879 ZINC40312853 2-(3-(5-(benzyloxy)-1H-indol-1-yl)propanamido)acetic acid

25 51975027 ZINC40313321 4-[3-(4-methoxyindol-1-yl)propanoylamino]butanoate

Table 3.4: Docking energies of the top twenty-five drug-like molecules (DNA polymerase III subunit beta (EC:2.7.7.7))

S.No  Ligand (PubChem CID)  MolDock Score  Rerank Score  HBond

1 17758816 -158.683 -127.595 -10.0164

2 11913306 -150.063 -96.3334 -11.1326

3 1492413 -146.001 -121.389 -8.87335

4 46936419 -143.832 -107.928 -11.2557

5 3241078 -143.115 -96.5642 -1.96737

6 24984918 -141.809 -111.563 -3.61914

7 46936515 -139.248 -112.061 -15.396

8 2501665 -138.892 -120.8 -2.40747

9 51974545 -137.959 -113.85 -2.80401

10 46937008 -137.222 -76.3973 -25.0391

11 5479529 -133.949 -102.396 -12.2936

12 51974879 -133.889 -102.876 1.46921

13 4965092 -132.944 -111.112 -5.17048

14 5841017 -130.579 -106.183 -8.52669

15 42506977 -129.596 -104.258 0.998037

16 5287411 -128.788 -101.288 -20.4086

17 1630563 -128.08 -110.391 -12.7826

18 1245041 -126.382 -101.228 -8.84103

19 3227807 -121.381 -88.1272 -2.71382

20 46936525 -119.39 -92.3105 -25.2326

21 2058961 -117.838 -86.8457 -6.47904

22 4744359 -117.229 -101.443 -5.62813

23 51975027 -114.735 -89.3102 -7.53045

24 695270 -110.161 -81.6698 -13.5092

25 37887935 -101.419 -51.7998 -3.87922


Table 3.5: Docking energies of the top twenty-five drug-like molecules (UDP-N-acetylmuramoylalanine--D-glutamate ligase (EC:6.3.2.9) (murD))

S.No  Ligand (PubChem CID)  MolDock Score  Rerank Score  HBond

1 17758816 -176.522 -144.34 0.529989

2 11913306 -170.682 -137.641 -4.10009

3 46936419 -162.716 -130.311 -15.7504

4 1492413 -161.719 -135.365 -7.90273

5 3241078 -156.26 -120.608 -8.29214

6 1245041 -155.218 -122.758 -5.45307

7 51974879 -150.103 -105.362 -7.16935

8 42506977 -145.874 -107.007 -6.32493

9 4965092 -145.335 -121.599 -5.52899

10 5841017 -142.544 -119.625 -4.14757

11 46937008 -141.431 -97.148 -20.7909

12 5287411 -139.563 -106.991 -0.1522

13 46936515 -139.459 -99.6212 -10.56

14 2501665 -138.194 -100.568 -6.59717

15 24984918 -136.378 -108.042 -1.36679

16 4744359 -133.396 -81.8205 -4.83618

17 695270 -130.22 -91.3451 0.454714

18 51974545 -129.915 -100.132 -11.4881

19 3227807 -129.827 -77.1968 -14.7884

20 51975027 -128.601 -101.844 -2.12598

21 5479529 -127.007 -102.407 -6.94762

22 46936525 -125.974 -105.133 -4.04828

23 2058961 -125.708 -99.9969 -17.7238

24 1630563 -117.008 -100.22 -3.52732

25 37887935 -114.54 -84.1455 -5.70394


Table 3.6: Docking energies of the top twenty-five drug-like molecules (3-phosphoshikimate 1-carboxyvinyltransferase (EC:2.5.1.19) (aroA))

S.No  Ligand (PubChem CID)  MolDock Score  Rerank Score  HBond

1 46937008 -181.626 -125.807 -5.87785

2 1492413 -180.656 -110.654 -9.92856

3 17758816 -177.462 -141.777 -2.95109

4 3241078 -176.417 -65.4082 -22.4557

5 5479529 -173.822 -118.922 -17.6992

6 3227807 -170.414 -104.495 -10.3813

7 11913306 -168.694 -104.718 -7.42395

8 46936515 -168.671 -126.43 -15.9812

9 4965092 -167.42 -123.119 -22.2637

10 2058961 -165.147 -120.036 -24.4587

11 1245041 -162.221 -107.555 -8.66383

12 46936419 -158.103 -25.2634 -7.18564

13 46936525 -157.503 -115.218 -9.48733

14 51974879 -155.899 -89.9232 -4.224

15 5287411 -155.473 -119.83 -24.6861

16 51974545 -149.885 -107.348 -3.71714

17 51975027 -149.651 -118.366 -16.8457

18 5841017 -148.943 -96.1218 -19.4964

19 42506977 -148.874 -59.8901 -8.96399

20 2501665 -147.568 -90.6061 -8.07132

21 24984918 -146.432 -113.399 -6.0046

22 4744359 -142.046 -90.1123 -9.20127

23 695270 -138.072 -102.935 -27.8707

24 37887935 -129.081 -106.908 -10.0311

25 1630563 -122.468 -100.654 -9.82997


Table 3.7: Docking energies of the top twenty-five drug-like molecules (Large subunit ribosomal protein L6 (RP-L6))

S.No  Ligand (PubChem CID)  MolDock Score  Rerank Score  HBond

1 11913306 -169.098 -90.2586 -5.31978

2 17758816 -164.424 -107.233 -8.17225

3 3241078 -155.579 -118.092 -9.78988

4 2501665 -152.885 -124.814 -6.85472

5 1492413 -151.344 -120.361 -7.93212

6 4965092 -147.715 -107.492 -7.13939

7 42506977 -145.939 -106.882 -5.80688

8 51974879 -142.684 -107.43 -5.05266

9 46937008 -138.476 -86.8045 -13.5967

10 46936419 -138.079 -71.8806 -5.68534

11 1245041 -134.355 -97.0735 -7.71628

12 695270 -131.843 -94.4925 -6.12316

13 5841017 -131.682 -99.0505 -7.86366

14 51974545 -130.568 -75.1982 -3.21314

15 51975027 -127.148 -98.2112 -8.65861

16 2058961 -127.088 -97.539 -13.6713

17 3227807 -126.295 -69.8315 -4.09809

18 5287411 -125.945 -100.723 -6.19616

19 5479529 -125.348 -21.1421 -16.222

20 1630563 -123.983 -95.4908 -7.35914

21 24984918 -123.485 -42.7174 -5.04052

22 46936525 -119.474 -89.0616 -10.6002

23 46936515 -113.626 -50.2908 3.21541

24 4744359 -111.572 -25.6146 -3.61051

25 37887935 -110.835 -80.5465 -4.08915




(Small subunit ribosomal protein S17 (RP-S17))

S.No  Ligand     MolDock Score  Rerank Score  HBond
1     17758816   -170.026       -134.349      -9.48101
2     3241078    -149.52        -90.7763      -4.29214
3     42506977   -146.485       -120.441      1.13062
4     11913306   -145.95        -104.29       -2.54592
5     1492413    -144.594       -94.8406      -8.39369
6     46937008   -144.245       -105.239      -10.6164
7     5479529    -141.763       -104.18       -9.46191
8     46936419   -137.124       -85.7152      -10.7449
9     51974879   -130.839       -106.157      -2.58368
10    46936515   -128.622       -50.0735      -7.8294
11    4744359    -126.796       -69.6496      -12.7454
12    24984918   -125.47        -49.121       -7.82456
13    51974545   -125.444       -89.1051      -6.33294
14    2501665    -125.14        -96.5669      -2.47636
15    1245041    -124.118       -97.2086      -7.11609
16    3227807    -123.613       -83.0499      -8.50482
17    2058961    -120.282       -99.7223      -11.1148
18    51975027   -119.164       -94.4125      0.444607
19    4965092    -118.707       -87.0712      -5.21883
20    5841017    -117.824       -82.2337      -2.03173
21    46936525   -117.104       -89.8245      -10.6673
22    5287411    -115.757       -85.0929      -6.74343
23    37887935   -109.431       -62.3435      -1.8665
24    1630563    -107.052       -88.598       -8.07329
25    695270     -105.642       -73.8032      0.56557
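Figures 3.27 to 3.32 appear to show, for each of aroA, RP-L6 and RP-S17, the two ligands with the most favourable (lowest) MolDock Score in the corresponding table. The Python sketch below reproduces that reading from the top five rows of each table. It also lists the ligands that fall within the top five for all three targets. The cut-offs of two and five are assumptions chosen for illustration and are not stated in the study.

```python
# Top five rows of the aroA, RP-L6 and RP-S17 tables: (ligand CID, MolDock score).
# The sort below does not rely on the rows being pre-ordered.
TABLES = {
    "aroA": [(46937008, -181.626), (1492413, -180.656), (17758816, -177.462),
             (3241078, -176.417), (5479529, -173.822)],
    "RP-L6": [(11913306, -169.098), (17758816, -164.424), (3241078, -155.579),
              (2501665, -152.885), (1492413, -151.344)],
    "RP-S17": [(17758816, -170.026), (3241078, -149.52), (42506977, -146.485),
               (11913306, -145.95), (1492413, -144.594)],
}


def top_ligands(rows, n=2):
    """Return the n CIDs with the lowest (most favourable) MolDock score."""
    return [cid for cid, _ in sorted(rows, key=lambda row: row[1])[:n]]


best_two = {target: top_ligands(rows) for target, rows in TABLES.items()}

# Ligands present in the top five of every target listed above.
shared = set.intersection(*({cid for cid, _ in rows} for rows in TABLES.values()))

for target, cids in best_two.items():
    print(target, cids)
print("In the top five for all three targets:", sorted(shared))
# aroA   -> [46937008, 1492413]
# RP-L6  -> [11913306, 17758816]
# RP-S17 -> [17758816, 3241078]
# shared -> [1492413, 3241078, 17758816]
```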



DNA polymerase III subunit beta (EC:2.7.7.7)

2-[4-[2-[[(5-pyridin-2-ylsulfanyl-1,3-thiazol-2-yl)carbamoylamino]methyl]-1H-imidazol-5-yl]phenoxy]acetic acid PubChem CID: 17758816. Number of hydrogen bonds (MVD): 6

Figure 3.23: Docking result of CID: 17758816 with DPO3B



2-[2-[[(3S,3aR,6S,6aR)-3-[[4-(furan-2-yl)pyrimidin-2-yl]amino]-2,3,3a,5,6,6a-hexahydrofuro[3,2-b]furan-6-yl]amino]-2-oxoethyl]sulfanylacetate PubChem CID: 11913306. Number of hydrogen bonds (MVD): 6

Figure 3.24: Docking result of CID: 11913306 with DPO3B



UDP-N-acetylmuramoylalanine--D-glutamate ligase (EC:6.3.2.9) (murD)

2-[4-[2-[[(5-pyridin-2-ylsulfanyl-1,3-thiazol-2-yl)carbamoylamino]methyl]-1H-imidazol-5-yl]phenoxy]acetic acid PubChem CID: 17758816. Number of hydrogen bonds (MVD): 8

Figure 3.25: Docking result of CID: 17758816 with murD




Figure 3.26: Docking result of CID: 11913306 with murD



3-phosphoshikimate 1-carboxyvinyltransferase (EC:2.5.1.19) (aroA)

[(2S,3R,4R,5R)-5-(6-amino-8-bromopurin-9-yl)-3,4-dihydroxyoxolan-2-yl]methyl phosphono hydrogen phosphate PubChem CID: 46937008. Number of hydrogen bonds (MVD): 25

Figure 3.27: Docking result of CID: 46937008 with aroA



3-[5-[(1-oxo-[1,3]thiazolo[3,2-a]benzimidazol-2-ylidene)methyl]furan-2-yl]benzoic acid PubChem CID: 1492413. Number of hydrogen bonds (MVD): 7

Figure 3.28: Docking result of CID: 1492413 with aroA



Large subunit ribosomal protein L6 (RP-L6)


Figure 3.29: Docking result of CID: 11913306 with RP-L6



2-[4-[2-[[(5-pyridin-2-ylsulfanyl-1,3-thiazol-2-yl)carbamoylamino]methyl]-1H-imidazol-5-yl]phenoxy]acetic acid PubChem CID: 17758816. Number of hydrogen bonds (MVD): 7

Figure 3.30: Docking result of CID: 17758816 with RP-L6



Small subunit ribosomal protein S17 (RP-S17)

2-[4-[2-[[(5-pyridin-2-ylsulfanyl-1,3-thiazol-2-yl)carbamoylamino]methyl]-1H-imidazol-5-yl]phenoxy]acetic acid PubChem CID: 17758816. Number of hydrogen bonds (MVD): 7

Figure 3.31: Docking result of CID: 17758816 with RP-S17



2-[[5-[(2,4-dioxo-1H-pyrimidin-6-yl)methyl]-4-phenyl-1,2,4-triazol-3-yl]sulfanyl]-N-(oxolan-2-ylmethyl)acetamide PubChem CID: 3241078. Number of hydrogen bonds (MVD): 8

Figure 3.32: Docking result of CID: 3241078 with RP-S17

The current study identifies twenty-five drug-like molecules for each of the protein targets. Of these twenty-five molecules, six are experimental drugs, one is an approved drug, six are from the ZINC natural product database and the remaining twelve are from PubChem. The details of the final selected molecules are elucidated in Chapter 4.



3.4. Conclusion

The in silico approach involves a series of screening steps to identify proteins that can serve as potential drug targets and vaccine candidates. The targets identified are indispensable for the growth of the organism, and these proteins have neither a substitute protein nor an alternative pathway to accomplish the same process. The current study was carried out to design drug-like molecules that can block DNA polymerase III subunit beta (EC:2.7.7.7), UDP-N-acetylmuramoylalanine--D-glutamate ligase (EC:6.3.2.9) (murD), 3-phosphoshikimate 1-carboxyvinyltransferase (EC:2.5.1.19) (aroA), large subunit ribosomal protein L6 (RP-L6) and small subunit ribosomal protein S17 (RP-S17). It explores the possibility of developing new drugs from available chemical molecules. Because microorganisms are rapidly gaining resistance to existing drugs, better and more effective drugs need to be designed faster. Thus, the molecules identified in the current study could serve as replacements for the therapies currently available.
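The screening criteria summarised in this conclusion can be pictured as a chain of filters: no human homologue, essentiality for the pathogen, and no alternative protein or pathway. The Python sketch below is a hypothetical illustration. The E-value cut-off, the field names and the protein identifiers are all assumptions. It also assumes that the BLASTP results against the human proteome and the Database of Essential Genes (DEG) results have already been computed elsewhere.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical cut-off: a pathogen protein whose best BLASTP hit against human
# proteins has an E-value above this value is treated as non-homologous.
HUMAN_EVALUE_CUTOFF = 1e-4


@dataclass
class Candidate:
    protein_id: str                # placeholder identifier (hypothetical)
    best_human_evalue: float       # best BLASTP E-value vs. human; inf if no hit
    essential_in_deg: bool         # significant hit in the Database of Essential Genes
    has_alternative_pathway: bool  # another protein or pathway can do the same job


def is_putative_target(c: Candidate) -> bool:
    """Keep proteins that are non-homologous, essential and irreplaceable."""
    non_homologous = c.best_human_evalue > HUMAN_EVALUE_CUTOFF
    return non_homologous and c.essential_in_deg and not c.has_alternative_pathway


def screen(candidates: Iterable[Candidate]) -> List[str]:
    """Return the identifiers of candidates that pass every filter."""
    return [c.protein_id for c in candidates if is_putative_target(c)]


# Made-up records, one per filter outcome.
DEMO = [
    Candidate("protA", float("inf"), True, False),  # passes every filter
    Candidate("protB", 1e-30, True, False),         # homologous to a human protein
    Candidate("protC", 0.5, False, False),          # not essential
    Candidate("protD", 2.0, True, True),            # alternative pathway exists
]

print(screen(DEMO))  # ['protA']
```

Keeping each criterion as a separate boolean makes it easy to report how many proteins are removed at each stage of the screen.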
