Chapter-1 INTRODUCTIONshodhganga.inflibnet.ac.in/bitstream/10603/8277/11/11_chapter 1.pdf · protein sequence to structure and function relationship is well established and reveals

Chapter-1

INTRODUCTION

1.1 INTRODUCTION

In the post-genomic era the field of Bioinformatics is growing at a phenomenal rate, and the task in the early years of the millennium is to demonstrate how in-silico simulations [1] facilitate experiments in the laboratory and how this knowledge can be applied to curing human diseases. Achieving this goal requires translating genomic data into biological knowledge by studying the function of all known proteins. Knowledge of the structure alone is not sufficient for deciphering the functional mechanisms, which often depend on the flexibility of protein structures; the native state of a protein deviates to some extent from the average coordinates reported as the experimental structure. The flow of genetic information from sequence to structure to function lies at the core of all medical and biological sciences. This is referred to as the Central Dogma of Molecular Biology [2]. Transcription and translation convert a DNA sequence into a sequence of amino acids. These amino acids fold to form protein structures, which perform specific functions that together define the phenotype of the organism. Both computational and experimental approaches play a critical role in identifying the function of proteins. Experimental observation of the conformational motion of biomolecules is becoming possible; NMR spectroscopy can be used to determine both the dynamics and the structure of proteins. Computational approaches expand structural knowledge when applied to a potentially large number of families of related proteins and thus help fill the gap between the number of known protein sequences and the number of known structures.

Bioinformatics offers numerous approaches for the prediction of structure and

function of proteins on the basis of sequence and structural similarities [3]. The

protein sequence to structure and function relationship is well established and reveals

that the structural details at atomic level help to understand the molecular function of

proteins. The protein structure is conserved and can accommodate up to 80% of

sequence variation. X-ray crystallography and NMR are the experimental methods to

determine protein structure. They provide 3D structure data for a relatively small

number of proteins due to time-consuming preprocessing requirements such as

purification, crystallization of proteins etc. NMR is limited by the size of the protein

molecule. Predicting a protein's structure from its amino acid sequence is a problem

of great scientific importance. As the major genome projects progress, and as protein

sequencing methods continue to out-pace structure determination methods, it has

become a problem of great practical importance. While this problem is not yet solved,

we have a partial solution. If two protein sequences are at least 30% identical, they

tend to have similar structures. Sensitive methods such as Hidden Markov Models

(HMMs) [4] can successfully predict structural similarity when the sequence similarity

is less pronounced. The power of this partial solution should not be underestimated.
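
The 30% identity rule of thumb mentioned above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned (gaps written as "-") and counts identity only over columns where neither sequence has a gap; other conventions for the denominator exist. The function names and the default threshold parameter are hypothetical, chosen for illustration only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip gapped columns (an assumption of this sketch)
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def likely_similar_structure(aligned_a: str, aligned_b: str,
                             threshold: float = 30.0) -> bool:
    """Apply the rule of thumb: at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

A pair that falls below the threshold is not necessarily structurally different; as the text notes, more sensitive methods such as HMMs can still detect similarity in that range.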

Genome sequencing projects generated vast amounts of data, which led to the development of computational techniques for the analysis of DNA and protein sequence data. These focused primarily on the flow of information from sequence to structure and had much success in predicting function directly from sequence. The advantages of applying computational solutions to problems in this area have led to the evolution of the field of Bioinformatics as a cornerstone of virtually all biological and medical sciences. Biomolecular interactions underlie all the biochemical pathways that together constitute the process of life. This dissertation contributes to this emerging discipline by addressing specific problems associated with the study of protein structures and their functional mechanisms. We present computational models for the comparison of protein structures, the classification of proteins into structural families and the analysis of ligand interactions with proteins.

There is a need to design drugs for diseases caused by proteins whose structures are unknown. Here we predict the structures of those proteins by classifying them using known templates [5][6]. Every protein contains sequence patterns; these are matched using a data mining algorithm, and the proteins are classified on the basis of their sequences.

1.2 CANCER

Cancer [7][8] is a generic term for a large group of diseases that can affect any part of the body. It is one of the most studied yet unsolved non-communicable human diseases. It is an idiopathic disease [9], and doctors and scientists are constantly trying to develop new, effective drugs for its treatment. No other disease parallels cancer in the diversity of its origin, nature and treatments. It involves the deregulated multiplication of cells, with the consequence of an abnormal increase in the cell number in particular organs or tissues. Other terms used are malignant tumours and neoplasms [10]. One defining feature of cancer is the rapid creation of abnormal cells that grow beyond their usual boundaries and can then invade adjoining parts of the body and spread to other organs. This process is referred to as metastasis [11][12]. Metastases are the major cause of death from cancer [13]. Our bodies are made up of billions of cells that grow, divide, and then die in a predictable manner. Cancer occurs when something goes wrong with the cell cycle, causing uncontrolled cell division and growth. Cancer cells lump together and form a mass of extra tissue, also known as a tumor, which continues to grow. As it grows, it may damage and invade nearby tissue. If a cancerous tumor outgrows its birthplace, called the primary cancer site, and moves on to another place, called the secondary cancer site, it is said to have metastasized.

Initial stages of the developing cancer are generally confined to the organ of

origin whereas advanced cancers grow beyond the tissue of origin. Advanced cancers

invade the surrounding tissues that are initially connected to the primary cancer. Then

these are distributed via the hematopoietic and lymphatic systems throughout the

body where they can colonize in distant tissues and cause metastasis. The

development of cancers is thought to result from the mutation or damage of the

cellular genome, either due to environmental influences or caused by random

endogenous mechanisms. In the attempt to establish how, why and when cancer

occurs, a plethora of genetic pathways and regulatory circuits have been discovered

that are necessary to maintain general cellular functions such as proliferation,

migration and differentiation. Undesirable changes or alterations in this fine-tuned

network of cascades and interactions, due to endogenous failure or to exogenous

challenges by environmental factors, may disable any member of such regulatory

pathways. This could induce the death of the affected cell, may immediately provide

it with a growth advantage within a particular tissue or may mark it for cancerous

development. Natural mechanisms can often rectify such environmentally

induced abnormalities; apoptosis, or programmed cell death, is the best example of a

mechanism that can control proliferation in cancer.

1.2.1 Global Burden of Cancer

Cancer is a leading cause of death worldwide. The disease accounted for 7.4

million deaths or around 13% of all deaths worldwide in 2004. The main types of

cancer leading to overall cancer mortality each year are:

Lung (1.3 million deaths)

Stomach (803 000 deaths)

Colorectal (639 000 deaths)

Liver (610 000 deaths)

Breast (519 000 deaths)

More than 70% of all cancer deaths occurred in low- and middle-income

countries. Deaths from cancer worldwide are projected to continue rising, with an

estimated 12 million deaths in 2030.

The most frequent types of cancer worldwide (in order of the number of global

deaths) are:

Among men - lung, stomach, liver, colorectal, esophagus and prostate

Among women - breast, lung, stomach, colorectal and cervical.

There are over 100 different types of cancer, and each is classified by the type

of cell that is initially affected. Tumors can grow and interfere with the digestive,

nervous, and circulatory systems and they can release hormones that alter body

function. Tumors that stay in one spot and demonstrate limited growth are generally

considered to be benign.

More dangerous, or malignant, tumors form when two things occur:

1. A cancerous cell manages to move throughout the body using the blood or lymph systems, destroying healthy tissue in a process called invasion.

2. The cell manages to divide and grow, making new blood vessels to feed itself in a process called angiogenesis.

When a tumour successfully spreads to other parts of the body and grows, invading and destroying other healthy tissues, it is said to have metastasized.

1.2.2 Causes of Cancer

Cancer is ultimately the result of cells that uncontrollably grow and do not die.

Normal cells in the body follow an orderly path of growth, division, and death.

Cancer results from anomalies in programmed cell death, or apoptosis [14]. Unlike regular cells, cancer cells do not undergo programmed death and instead continue to grow and divide. This leads to a mass of abnormal cells that grows out of control. Most cancers develop in and spread from an organ, but blood cancers such as leukemia do not. They affect the blood and the organs that form blood and then invade nearby tissues.

1.2.3 Symptoms of Cancer

1.2.3.1 A broad spectrum of non-specific cancer symptoms may include:

Persistent Fatigue: Fatigue is one of the most commonly experienced cancer symptoms. It is usually more common when the cancer is advanced, but it still occurs in the early stages of some cancers. Fatigue is a symptom of both malignant and non-malignant conditions. Anemia is a condition that is associated with many types of cancer, especially types affecting the bowel.

Unintentional Weight Loss: Losing 10 pounds or more unintentionally definitely

warrants a visit to the doctor. This type of weight loss can occur with or without loss

of appetite. Weight loss can be a symptom of cancer, but it is also a symptom of many

other illnesses.

Pain: Pain is not an early symptom of cancer, except in some cancer types like those

that spread to the bone. Pain generally occurs when cancer spreads and begins to

affect other organs and nerves. Lower back pain is a cancer symptom that is

associated with ovarian cancer and colon cancer. Shoulder pain can also be a

symptom of lung cancer. Pain in the form of headaches can be associated with brain

tumors. Stomach pains can be related to types of cancer, like stomach cancer,

pancreatic cancer, and many others.

Fever: A fever is a very non-specific symptom of many mild to severe conditions,

including cancer. Fevers are commonly associated with types of cancer that affect

the blood, such as leukemia and lymphoma.

Bowel Changes: Constipation, diarrhea, blood in the stools, gas and thinner stools are

most commonly associated with colon cancer, but are also related to other cancer

types.

Chronic Cough: In relation to cancer, a chronic cough with blood or mucus can be a

symptom of lung cancer.

1.2.4 General Classification of Cancer

There are five broad groups that are used to classify cancer [15][16].

Carcinomas arise in cells that cover internal and external parts of the body;

examples include lung, breast, and colon cancer.

Sarcomas arise in cells located in bone, cartilage, fat,

connective tissue, muscle, and other supportive tissues.

Lymphomas are cancers that begin in the lymph nodes and immune system

tissues.

Leukaemias are cancers that begin in the bone marrow and often accumulate

in the bloodstream.

Adenomas are cancers that arise in the thyroid, the pituitary gland, the adrenal

gland, and other glandular tissues.

1.2.5 Diagnosis of Cancer

Early detection of cancer can greatly improve the odds of successful treatment

and survival. Imaging techniques such as X-rays, CT scans, MRI scans, PET

scans, and ultrasound scans are used regularly in order to detect where a tumor

is located and what organs may be affected by it [17] [18].

Extracting cancer cells and looking at them under a microscope is the only definitive way to diagnose cancer. This procedure is called a biopsy. Blood tests that analyse proteins, polysaccharides and other molecules released by cancerous cells can also serve as an index. For example, prostate cancer cells release a higher level of a chemical called PSA (prostate-specific antigen) into the bloodstream.

1.2.6 Staging

The most common cancer staging method is called the TNM system. T (1-4)

indicates the size and direct extent of the primary tumor, N (0-3) indicates the degree

to which the cancer has spread to nearby lymph nodes, and M (0-1) indicates whether

the cancer has metastasized to other organs in the body. TNM descriptions then lead

to a simpler categorization of stages, from 0 to 4, where lower numbers indicate

less advanced disease [19]. While most Stage 1 tumors are curable, most Stage 4 tumors

are inoperable or untreatable.
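
As an illustration of the TNM notation described above, the following is a minimal sketch that parses a code such as "T2N1M0" and checks that each component lies in the ranges given in the text (T 1-4, N 0-3, M 0-1). The names `TNM`, `parse_tnm` and `has_distant_metastasis` are hypothetical. Clinical TNM codes include further values (for example TX or Tis) that this sketch deliberately does not handle, and the mapping from TNM to stages 0-4 is specific to each cancer type, so it is not encoded here.

```python
import re
from typing import NamedTuple


class TNM(NamedTuple):
    t: int  # size and direct extent of the primary tumor (1-4)
    n: int  # spread to nearby lymph nodes (0-3)
    m: int  # distant metastasis (0 or 1)


# Only the numeric ranges described in the text are accepted.
_TNM_PATTERN = re.compile(r"T([1-4])N([0-3])M([01])")


def parse_tnm(code: str) -> TNM:
    """Parse a simplified TNM code such as 'T2N1M0' or 't2 n1 m0'."""
    normalized = code.replace(" ", "").upper()
    match = _TNM_PATTERN.fullmatch(normalized)
    if match is None:
        raise ValueError(f"not a simplified TNM code: {code!r}")
    t, n, m = (int(group) for group in match.groups())
    return TNM(t, n, m)


def has_distant_metastasis(tnm: TNM) -> bool:
    """M = 1 indicates that the cancer has spread to other organs."""
    return tnm.m == 1
```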

1.2.7 Treatment

Cancer treatment depends on the type of cancer, the stage of the cancer, age,

health status, and additional personal characteristics [20]. There is no single treatment

for cancer, and patients often receive a combination of therapies and palliative care.

Treatments usually fall into one of the following categories: surgery, radiation,

chemotherapy, immunotherapy, hormone therapy, or gene therapy.

1.2.7.1 Surgery: Surgery is the oldest known treatment for cancer [21]. If a cancer

has not metastasized, it is possible to completely cure a patient by surgically removing

the cancer from the body. This is often seen in the removal of the prostate or a breast

or testicle. After the disease has spread, however, it is nearly impossible to remove all

of the cancer cells.

1.2.7.2 Radiation: Radiation treatment, also known as radiotherapy, destroys cancer

by focusing high-energy rays on the cancer cells. Radiotherapy [22][23] utilizes high-

energy gamma-rays that are emitted from metals such as radium or high-energy x-rays

that are created in a special machine. Early radiation treatments caused severe side-

effects because the energy beams would damage normal, healthy tissue. But the

technologies have improved so that beams can be more accurately targeted.

Radiotherapy is used as a stand-alone treatment to shrink a tumor or destroy cancer

cells (including those associated with leukemia and lymphoma), and it is also used in

combination with other cancer treatments.

1.2.7.3 Chemotherapy: Chemotherapy utilizes chemicals that interfere with the cell

division process - damaging proteins or DNA - so that cancer cells will commit

suicide. These treatments target any rapidly dividing cells (not necessarily just cancer

cells), but normal cells usually can recover from any chemical-induced damage while

cancer cells cannot. Chemotherapy [24] is generally used to treat cancer that has

spread or metastasized because the medicines travel throughout the entire body. It is a

necessary treatment for some forms of leukemia and lymphoma. Chemotherapy

treatment occurs in cycles so the body has time to heal between doses. However, there

are still common side effects such as hair loss, nausea, fatigue, and vomiting.

Combination therapies often include multiple types of chemotherapy or chemotherapy

combined with other treatment options.

1.2.7.4 Immunotherapy: Immunotherapy [25] aims to get the body's immune system

to fight the tumor. Local immunotherapy injects a treatment into an affected area, for

example, to cause inflammation that causes a tumor to shrink. Systemic

immunotherapy treats the whole body by administering an agent such as the protein

interferon alpha that can shrink tumors. These therapies are relatively

young, but researchers have had success with treatments that introduce antibodies to

the body that inhibit the growth of breast cancer cells. Bone marrow transplantation or

hematopoietic stem cell transplantation can also be considered immunotherapy

because the donor's immune cells will often attack the tumor or cancer cells that are

present in the host.

1.2.7.5 Hormone therapy: Several cancers have been linked to some types of

hormones, most notably breast and prostate cancer. Hormone therapy is designed to

alter hormone production in the body so that cancer cells stop growing or are killed

completely. Breast cancer hormone therapies often focus on reducing estrogen levels

(a common drug for this is tamoxifen) and prostate cancer hormone therapies often

focus on reducing testosterone levels. In addition, some leukemia and lymphoma

cases can be treated with the hormone cortisone.

1.2.7.6 Gene therapy: The goal of gene therapy is to replace damaged genes with

ones that work to address a root cause of cancer: damage to DNA. For example,

researchers are trying to replace the damaged gene that signals cells to stop dividing

(the p53 gene) with a copy of a working gene. Other gene-based therapies [26] focus

on further damaging cancer cell DNA to the point where the cell commits suicide.

1.2.8 Prevention

Cancers that are closely linked to certain behaviors are the easiest to prevent

[27][28]. For example, choosing not to smoke tobacco [29] or drink alcohol

significantly lowers the risk of several types of cancer - most notably lung, throat,

mouth, and liver cancer. Skin cancer can be prevented by staying in the shade,

protecting yourself with a hat and shirt when in the sun, and using sunscreen. Diet is

also an important part of cancer prevention [30] since what we eat has been linked to

the disease.

Certain vaccinations have been associated with the prevention of some

cancers. For example, many women receive a vaccination for the human

papillomavirus because of the virus's relationship with cervical cancer [31]. Hepatitis

B vaccines prevent infection with the hepatitis B virus, which can cause liver cancer.

1.2.9 Cancer Cells in Culture

Normal cells pass through a limited number of cell divisions before they

decline in vigour and die. This is called replicative senescence. It may be caused by

their inability to synthesize telomerase. Cancer cells may be immortal; that is, they

proliferate indefinitely in culture. Cancer cells in culture produce telomerase [32].

Normal cells: when placed on a tissue culture dish, they proliferate until the

surface of the dish is covered by a single layer of cells just touching each other. Then

mitosis ceases. This phenomenon is called contact inhibition. Cancer cells show no

contact inhibition. Once the surface of the dish is covered, the cells continue to divide,

piling up into mounds. The cells below are said to be transformed. Radiation, certain

chemicals, and certain viruses are capable of transforming cells.

Figure 1.1: Image of HeLa cells [33]

Cancer cells almost always have an abnormal karyotype [34] with

o abnormal numbers of chromosomes (polyploidy or aneuploidy)

o chromosomes with abnormal structure:

    translocations

    deletions

    duplications

    inversions

1.3 GLUTATHIONE S-TRANSFERASE

Glutathione S-transferases (GSTs) [35] constitute a large family of enzymes that catalyze the addition of glutathione to endogenous or xenobiotic, often toxic, electrophilic chemicals, and they form a major group of detoxification enzymes. Eukaryotic glutathione S-transferases usually mediate the inactivation, degradation or excretion of a wide range of compounds by formation of the corresponding glutathione conjugates. Different classes of GSTs have been defined, based mainly on sequence analysis: α, μ, π, θ, σ, ζ and ω in mammals, φ and τ in plants, δ in insects and β in bacteria [36].

All eukaryotic species possess multiple cytosolic and membrane-bound GST

isoenzymes, each of which displays distinct catalytic as well as noncatalytic binding

properties: the cytosolic enzymes are encoded by at least five distantly related gene

families [37] designated class alpha, mu, pi, sigma, and theta GST, whereas the

membrane-bound enzymes, microsomal GST and leukotriene C4 synthetase, are

encoded by single genes and both have arisen separately from the soluble GST.

Evidence suggests that the level of expression of GST is a crucial factor in

determining the sensitivity of cells to a broad spectrum of toxic chemicals [38]. The

biochemical functions of GST are described to show how individual isoenzymes

contribute to resistance to carcinogens, antitumor drugs, environmental pollutants,

polycyclic hydrocarbons [39][40] and products of oxidative stress. A description of

the mechanisms of transcriptional and posttranscriptional regulation of GST

isoenzymes is provided to allow identification of factors that may modulate resistance

to specific noxious chemicals.

1.3.1 Enzyme Classification

EC no: 2.5.1.18

Accepted name: glutathione transferase

Other name(s): glutathione S-transferase; glutathione S-alkyltransferase;

glutathione S-aryltransferase; S-(hydroxyalkyl)glutathione lyase; glutathione S-

aralkyltransferase; glutathione S-alkyl transferase; GST

Systematic name: RX:glutathione R-transferase

1.3.2 Reaction Catalyzed

Figure 1.2: Reaction of RX + glutathione = HX + R-S-glutathione

Comments: A group of enzymes of broad specificity. R may be an aliphatic, aromatic

or heterocyclic group; X may be a sulfate, nitrile or halide group. Also catalyses the

addition of aliphatic epoxides and arene oxides to glutathione, the reduction of polyol

nitrate by glutathione to polyol and nitrite, certain isomerization reactions and

disulfide interchange.

1.3.3 Pathways Involved

Glutathione S-transferase is a very important enzyme that participates in three important pathways, as follows.

Table 1.1: Pathways involving glutathione S-transferase

S.NO   PATHWAY                                     KEGG LINK
1      Drug Metabolism - Cytochrome P450           00982
2      Glutathione Metabolism                      00480
3      Xenobiotic Metabolism by Cytochrome P450    00980

Figure 1.3: Drug Metabolism - Cytochrome P-450 [41]

1.3.4 Glutathione Metabolism:

Figure 1.4: Xenobiotic Metabolism by Cytochrome P450

1.3.5 Risk of Cancer with GST

Polymorphisms in the glutathione S-transferase genes GSTM1, GSTT1 and GSTP1 have been associated with susceptibility to prostate cancer among male smokers [42]. Polymorphisms of the GSTM1, GSTT1 and GSTP1 genotypes have also been associated with the risk of, and survival from, pancreatic cancer [43].

Numerous studies have associated bladder cancer with exposure to carcinogens present in tobacco smoke and with other environmental or occupational exposures. Approximately 50% of all humans inherit two deleted copies of the GSTM1 gene, which encodes the carcinogen [44] detoxification enzyme glutathione S-transferase M1. Recent findings suggest that the GSTM1 gene may modulate the internal dose of environmental carcinogens and thereby affect the risk of developing bladder cancer.

Gene Regulation Expression: A region in the 5' flanking sequence of the glutathione S-transferase (RX:glutathione R-transferase, EC 2.5.1.18) Ya subunit gene [45] contains a unique xenobiotic-responsive element (XRE). The regulatory region spans nucleotides -722 to -682 of the 5' flanking sequence and is responsible for part of the basal level as well as the inducible expression of the Ya subunit gene by planar aromatic compounds such as beta-naphthoflavone (beta-NF) and 3-methylcholanthrene. The DNA sequence of this region (the beta-NF-responsive element) is distinct from the DNA sequence of the XRE found in the cytochrome P-450 IA1 gene. In addition to the region containing the beta-NF-responsive element, two other regulatory regions of the Ya subunit gene were identified.

One region spans nucleotides -867 to -857 and has a DNA sequence which is

identical to the hepatocyte nuclear factor 1 recognition motif found in several liver-

specific genes. The second region spans nucleotides -908 to -899 and contains a DNA

sequence with identity to the XRE found in the cytochrome P-450 IA1 gene. The

XRE sequence also contributes to part of the responsiveness of the Ya subunit gene to

planar aromatic compounds. This suggests that regulation of gene expression by

planar aromatic compounds can be mediated by a DNA sequence that is distinct from

the XRE sequence.

1.4 SEQUENCE ANALYSIS

Early in the days of protein and gene sequence analysis [46][47], it was

discovered that the sequences from related proteins or genes were similar, in the sense

that one could align the sequences so that many corresponding residues match. This

discovery was very important, since strong similarity between two genes is a strong

argument for their homology. Bioinformatics is based on it. The basis for comparison

of proteins and genes using the similarity of their sequences is that the proteins or

genes are related by evolution; they have a common ancestor. Random mutations in

the sequences accumulate over time, so that proteins or genes that have a common

ancestor far back in time are not as similar as proteins or genes that diverged from

each other more recently. Analysis of evolutionary relationships between protein or

gene sequences depends critically on sequence alignments.

There are many features of sequence alignments that give interesting

information. For example, a closer analysis of the alignment can reveal which parts of

the sequences are likely to be important for the function, if the proteins are involved

in similar processes. In parts of the sequence of a protein which are not very critical

for its function, the random mutations can accumulate more easily. In parts of the

sequence that are critical for the function of the protein, hardly any mutations will be

accepted; nearly all changes in such regions will destroy the function.
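
The idea that functionally critical positions accumulate few mutations can be sketched as a per-column conservation score over a multiple alignment. This is a minimal illustration, not a method taken from this chapter: it assumes the sequences are already aligned to equal length with "-" for gaps, and it scores each column as the fraction of sequences carrying the most common residue. The function name is hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def column_conservation(alignment: Sequence[str]) -> List[float]:
    """Fraction of sequences sharing the most common residue in each column."""
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [res for res in column if res != "-"]  # ignore gaps
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        # Dividing by the full column height penalizes gapped columns.
        scores.append(top_count / len(column))
    return scores
```

Columns with a score close to 1.0 are candidates for positions that are important for function, although conservation alone does not prove functional importance.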

Sequence alignment has become an essential part of biological science [48].

There are now many different techniques and implementations of methods to perform

alignment of sequences.
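
One widely used technique, which this chapter does not name, is global alignment by dynamic programming (the Needleman-Wunsch algorithm). The sketch below computes only the optimal global alignment score with a linear gap penalty. The scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions for illustration; practical tools use substitution matrices and affine gap penalties.

```python
def global_alignment_score(seq_a: str, seq_b: str,
                           match: int = 1, mismatch: int = -1,
                           gap: int = -2) -> int:
    """Optimal global alignment score (Needleman-Wunsch, linear gap penalty)."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two residues
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
    return score[-1][-1]
```

Recovering the alignment itself would require a traceback through the score matrix, which is omitted here to keep the sketch short.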

1.4.1 The Evolutionary Basis of Sequence Alignment

One goal of sequence alignment is to enable the researcher to determine

whether two sequences display sufficient similarity such that an inference of

homology is justified [49][50]. Although these two terms are often interchanged in

popular usage, let us distinguish them to avoid confusion in the current discussion.

Similarity is an observable quantity that might be expressed as, say, percent identity

or some other suitable measure. Homology, on the other hand, refers to a conclusion

drawn from these data that two genes share a common evolutionary history [51].

Genes either are or are not homologous—there are no degrees for homology [52] as

there are for similarity.
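
To make the notion of similarity as an observable quantity concrete, the sketch below computes percent identity for two already aligned sequences. The choice of denominator (columns in which neither sequence has a gap) is an assumption; other conventions exist and give different values.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue (assumed convention).
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


print(percent_identity("ACGT", "ACGA"))  # 75.0
```

A high value supports, but does not by itself establish, an inference of homology.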

1.4.2 Goals of Sequence Analysis

1. Identify the genes.

2. Determine the function of each gene. One way to hypothesize the function is to find

another gene possibly from another organism whose function is known and to which

the new gene has high sequence similarity. This assumes that sequence similarity

implies functional similarity, which may or may not be true.

3. Identify the proteins involved in the regulation of gene expression.

4. Identify sequence repeats.

5. Identify other functional regions, for example origins of replication (sites at which

DNA polymerase binds and begins replication), pseudogenes (sequences

that look like genes but are not expressed), sequences responsible for the compact

folding of DNA, and sequences responsible for nuclear anchoring of the DNA.

Many of these tasks are computational in nature. Given the incredible rate at

which sequence data are being produced, the integration of computer science,

mathematics, and biology will be essential to analyzing those sequences.

1.4.3 Sequence Similarity

In molecular biology, a common question is whether or not two

sequences are related. The most common way to tell whether or not they are related

is to compare them to one another to see if they are similar. If we look at two words

in the English language, we note that two words that are spelled similarly may mean

two completely different things, such as the words pear and tear; likewise, similar sequences need not be related.

Biological sequences that are similar but not identical provide useful clues

about function, structure, and evolution. Two sequences

in different organisms are homologous if they have been derived from a common

ancestor sequence [53]. Two sequences may or may not be homologous regardless of

their sequence similarity. However, the greater the sequence similarity, the greater

the chance that they share similar function and/or structure.

1.4.3.1 Comparison of sequence analysis techniques

There are different sequence analysis techniques, among which my algorithm

SSHC (Sequence Search Hill Climbing algorithm) is compared with BLAST and

FASTA techniques.

Table 1.2: Comparison table of different techniques (each analysis uses five different matrices)

BLAST analysis                          FASTA analysis                           SSHC analysis
Matrix   %Identity  %Positives  Score   Matrix   %Identity  %Similarity  Score   Matrix   %Identity  %Positives  Score
PAM 30   66         80          306     BL 50    66.2       86.8         230     PAM 30   73         76.5        268
PAM 70   66         80          304     BL 62    66.2       86.3         276.3   PAM 70   73         76.25       290.15
BL 80    66         80          311     BL 80    66.2       83.1         346.9   BL 80    73         74.65       328.95
BL 62    66         80          300     PAM 120  66.2       86.8         327.4   BL 62    73         76.5        313.7
BL 45    66         80          313     PAM 250  66.2       90.4         153.7   BL 45    73         78.3        233.35

Across all three approaches, our proposed system performs consistently, and it

performs sequence analysis efficiently in various respects. SSHC searches for each

amino acid individually and without gaps. When compared with other systems such as

BLAST and FASTA, SSHC returns the neighbor sequence from the database. We

therefore conclude that our system is better than the other systems.

Our proposed system, the Sequence Analysis Steepest-Ascent Hill Climbing

technique, follows a strategy that searches for the nearest peak point and, upon

climbing it, checks whether the destination has been reached. If not, it searches for the next peak point. SSHC

continues this process until it finds the best solution.
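
The general idea of steepest-ascent hill climbing can be sketched as follows. This is a generic illustration under stated assumptions, not the SSHC implementation itself: the function name and the `neighbors` and `score` callables are hypothetical, and the toy problem (maximizing a one-dimensional function over the integers) stands in for a search over database sequences.

```python
def steepest_ascent_hill_climb(start, neighbors, score, max_steps=1000):
    """Move to the best-scoring neighbor until no neighbor improves the score."""
    current = start
    for _ in range(max_steps):
        candidates = list(neighbors(current))
        if not candidates:
            break
        best = max(candidates, key=score)  # steepest ascent: best neighbor
        if score(best) <= score(current):
            break  # no improving neighbor: a (possibly local) peak
        current = best
    return current


# Toy example (assumed): maximize -(x - 3)^2 with neighbors x - 1 and x + 1.
peak = steepest_ascent_hill_climb(
    0, lambda x: (x - 1, x + 1), lambda x: -(x - 3) ** 2
)
print(peak)  # 3
```

As with any hill-climbing method, the returned state is only guaranteed to be a local peak of the scoring function.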

1.4.3.2 Advantages of my algorithm

My algorithm SSHC follows heuristic search.

Heuristic search is an excellent Artificial Intelligence (AI) technique for

solving several problems.

This algorithm searches very accurately and efficiently.

SSHC uses much less memory than direct search.

Regular search techniques are used in priority of operations.

SSHC makes use of domain-specific information.

It tells us which is the best solution nearest to us.

SSHC reduces the search space so that the search can be done faster.

SSHC searches for the best fit to the solution.

1.4.4 Review of Literature

Comparative modeling of proteins, or homology modeling, refers to

constructing an atomic-resolution model of the target protein from its amino acid

sequence and an experimental three-dimensional structure of a related homologous

protein [54]. Homology modeling can produce high-quality structural models when

the target and template are closely related, which has inspired the formation of

a structural genomics consortium dedicated to the production of representative

experimental structures for all classes of protein folds. Mainly, it relies on the

identification of one or more known protein structures likely to resemble the structure

of the query sequence and on the production of an alignment that maps residues in the

query sequence to residues in the template sequence. The chief inaccuracies in

homology modeling, which worsen with lower sequence identity, derive from

errors in the initial sequence alignment and from improper template selection. The

sequence alignment and template structure are then used to produce a structural model

of the target. Because protein structures are more conserved than DNA sequences,

detectable levels of sequence similarity usually imply considerable structural

similarity.

Like other methods of structure prediction, current practice in homology

modeling is assessed in a biennial large-scale experiment known as the Critical

Assessment of Techniques for Protein Structure Prediction, or CASP. The quality of

the homology model is dependent on the quality of the sequence alignment and

template structure. The approach can be complicated by the presence of alignment

gaps (commonly called indels) that indicate a structural region present in the target

but not in the template, and by structure gaps in the template that arise from poor

resolution in the experimental procedure (usually X-ray crystallography) used to solve

the structure[55].

Model quality declines with decreasing sequence identity; a typical model has

~1-2 Å root mean square deviation between the matched Cα atoms at 70% sequence

identity but only 2-4 Å agreement at 25% sequence identity. However, the errors are

significantly higher in the loop regions, where the amino acid sequences of the target

and template proteins may be completely different.
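
The root mean square deviation quoted above can be computed as in the following minimal sketch. It assumes that the two structures have already been superimposed and that the Cα atoms are supplied as matched lists of (x, y, z) coordinates in Å; the optimal superposition step itself is not shown.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two matched, already superimposed coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the matched atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two matched atoms, the second displaced by 2 Å along z (made-up coordinates).
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 2)]))
```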

1.5 STUDY ON GLUTATHIONE S-TRANSFERASE

Glutathione S-transferases (GSTs) [56] are an important class of phase II

detoxifying enzymes, catalyzing the conjugation of glutathione (GSH) to electrophilic

species. Recently, a number of cytosolic GSTs were crystallized. A homology model of rat Mu-class

glutathione S-transferase 4-4 has been built using molecular

modeling techniques. This modeling study was based on the crystal structure

of rat GST 3-3, both members of the mu class. GST 3-3 and GST 4-4 isoenzymes

share a sequence homology of 88%.

The critical role of the glutathione S-transferase (GST) multigene family in

cellular protection in combination with the large interindividual variability in the

expression of these enzymes has prompted an investigation of their importance in

cancer prevention and susceptibility. Previous preclinical and clinical studies in

laboratories have established an association between decreased GST activity and

increased risk of colorectal cancer. Based on the increased incidence of colon

malignancies among patients with ulcerative colitis, GST activity has been examined

in a mouse model of induced colitis. Significant decreases (50% of controls) in the

GST activity of colon tissue were observed during the establishment and progression

of colitis.

These data suggested that the depletion of cellular protection may be an

important event in the carcinogenic progression of ulcerative colitis. The ability of the

dithiolethione oltipraz to induce GST expression within the murine colon has been

demonstrated. Use of chemopreventive regimens to induce phase 2 detoxication

enzyme expression represents a promising strategy for the prevention of cancer.

Clinical studies revealed that the GST activity of blood lymphocytes from individuals

with either a personal or family history of colorectal cancer or a personal history of

colon polyps was decreased significantly when compared to that of healthy controls.

Phase 1 clinical evaluation of oltipraz has demonstrated its ability to induce GST

activity as well as the level of transcripts encoding γ-glutamylcysteine synthetase

(γ-GCS) and DT-diaphorase in the colon mucosa of individuals at increased risk for

colorectal cancer. The observed correlation between the post-treatment response in

blood lymphocytes and colon mucosa suggested that blood lymphocytes may be used

in future trials as a surrogate biomarker of the responsiveness of colon tissue to

chemopreventive regimens.

A modeling study of Wuchereria bancrofti glutathione S-transferase [57]

has also been carried out for comparison with Brugia malayi. In this study,

sequence analysis with PSI-BLAST showed the filarial GSTs (Wb GST and Bm GST) to have 42% and 41%

sequence identity (the highest among known structures) with Sus

scrofa GST. The authors therefore used the structure of Sus scrofa GST (PDB:

2GSR) as the template for building homology models for Wb GST and Bm GST using

MOE (Molecular Operating Environment), an automated molecular modeling tool. The

models were analyzed with WHATIF.

Human theta-class glutathione transferase T1-1 was modeled by a manual

threading approach [58]. This study is based on the coordinates of the related Theta-class

enzyme T2-2. A low level of sequence identity (about 20%) was found in the C-terminal extension,

which contributes to the active site of this molecule. Manual docking

of known substrates and non-substrates has implicated potential candidates for the

T1-1 catalytic residues involved in the dehalogenation and epoxide-ring opening

activities. This docking study was used to find Glutathione S transferase T1 activity

with different substrates between species.

One more homology modeling study was reported on Rat Mu Class

Glutathione S-transferase and predicts the difference in stereo selectivity between

GST 3-3 and GST 4-4 for the R- and S-configured carbons of the oxirane moiety with

the docking studies [59]. The protein homology model, together with the substrate

model, will be useful to further rationalize the substrate selectivity of GST 4-4, and to

identify new potential GST 4-4 substrates. Human class Mu glutathione S-transferase

was also modeled by homology modeling. This generated the 3-D structures of three human GST

isozymes of the Mu class, M1b-1b, M2-2 and M3-3, using the rat 3-3

GST structure as a template. The high percentage of identity among these enzymes

and the lack of insertions and deletions make the system ideally suited to the

technique of homology modeling.

Over 500 studies have examined the association of genetic variants of

glutathione S-transferase with various malignancies yielding inconsistent results [60].

The genotyping was based on PCR assays that identified the GSTM1 and GSTT1 null

(-/-) genotypes but did not distinguish homozygous wild-type +/+ and heterozygous

+/- individuals. Complete GSTM1 and GSTT1 genotyping can be accomplished by

recently developed assays that allow the definition of +/+, +/-, and -/- genotypes by

separate identification of the respective GSTM1 and GSTT1 wild-type and null

alleles.

Application of the new GSTM1 assay to a breast cancer case-control study

revealed that the relative risk of breast cancer for the +/+ genotype compared to the -/-

genotype was 2.83 (95% confidence interval 1.45-5.59; P=0.002), suggesting a

protective effect of the GSTM1 deletion. Regardless of the explanation for the

association between the +/+ genotype and increased breast cancer risk, these results

warrant application of true GSTM1 and GSTT1 genotyping to additional or

previously analyzed groups with breast cancer or other malignancies.

Dietary intake of cruciferous vegetables (Brassica spp.) has been inversely

related to colorectal cancer risk[61], and this has been attributed to their high content

of glucosinolate degradation products such as isothiocyanates (ITCs). These

compounds act as anticarcinogens by inducing phase II conjugating enzymes, in

particular glutathione S-transferase (GSTs). These enzymes also metabolize ITCs,

such that the protective effect of cruciferous vegetables may depend on GST

genotype. The Singapore Chinese Health Study is a prospective investigation among

63 257 middle-aged men and women, who were enrolled between April 1993 and

December 1998.

In this nested case-control analysis, the authors compared 213 incident cases of

colorectal cancer with 1194 controls. Information on dietary ITC intake from

cruciferous vegetables, collected at recruitment via a semi-quantitative food frequency

questionnaire, was combined with GSTM1, T1 and P1 genotype from peripheral

blood lymphocytes or buccal mucosa. When categorized into high and low intake,

dietary ITC was slightly lower in cases than controls but the difference was not

significant.

There were no overall associations between GSTM1, T1 or P1 genotypes and

colorectal cancer risk. However, among individuals with both GSTM1 and T1 null

genotypes, a 57% reduction in risk was observed among high versus low consumers of

ITC (OR 0.43, 95% CI 0.20–0.96), in particular for colon cancer (OR 0.31, 95% CI 0.12–0.84).

These results are compatible with the hypothesis that ITCs from cruciferous

vegetables modify risk of colorectal cancer in individuals with low GST activity.

Further, this gene–diet interaction may be important in studies evaluating the effect of

risk-enhancing compounds in the colorectum.

A model of chick class-Theta glutathione S-transferase CL1 [62] was constructed

using the multiple sequence alignment algorithm of Feng and Doolittle. GST

purified from 1-day-old chick livers was digested with Achromobacter

proteinase I. The resulting fragments were utilized for sequence analysis. This protein

has 70-73% sequence similarity with mammalian class-Theta glutathione S-transferases.

Based on the known X-ray crystal structures of class-Alpha, -Mu and -Pi

glutathione S-transferases, a model was constructed for the N-terminal 232 residues of

CL1. The final model of the CL1 subunit was obtained by applying the CHARMM

energy-minimization process with the CHARMM/QUANTA program package. The

program ran through 100 cycles of steepest descent minimization.
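
The principle of steepest descent minimization can be illustrated with the short sketch below. It is not the CHARMM procedure: the quadratic "energy" function, the fixed step size, and the function names are assumptions chosen only to show how each cycle moves the coordinates against the gradient.

```python
def steepest_descent(grad, x0, step=0.1, cycles=100):
    """Run a fixed number of steepest descent cycles with a constant step."""
    x = list(x0)
    for _ in range(cycles):
        g = grad(x)
        # Move each coordinate a small distance against the gradient.
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def toy_energy_gradient(p):
    """Gradient of the assumed toy energy E(x, y) = (x - 1)^2 + 2 (y + 2)^2."""
    x, y = p
    return [2.0 * (x - 1.0), 4.0 * (y + 2.0)]


minimum = steepest_descent(toy_energy_gradient, [0.0, 0.0], step=0.1, cycles=100)
print(minimum)  # close to the minimum at (1, -2)
```

In a real force field the gradient comes from bonded and non-bonded energy terms, and the step size is usually adapted from cycle to cycle.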

Homology modeling was used for model building of a terminal thiamine

biosynthesis enzyme phosphoryl thymidine kinase (Thi E) using Geno3D, SWISS-MODEL

and Modeller. The best model was selected based on overall stereochemical

quality. The potential ligand binding sites in the model were identified by CASTp

server. The validated theoretical model of the 3D structure of the thi E protein of E.

coli O157:H7 was predicted using a thiamine phosphate pyrophosphatase

from Pyrococcus furiosus (PDB ID: 1X13_A) as template.

Dengue is a serious public health problem in tropical and subtropical

countries. It is caused by any of the four serologically distinct dengue viruses, namely

DENV1-4. The three-dimensional structures of the three isoforms of Ae. aegypti

defensins were predicted by comparative modeling [63][64]. Prediction was done with

Modeller 9v1 and the structures were validated through a series of tests.

1.6 HOMOLOGY MODELLING ON ANNEXIN

During the last two decades, the number of sequence-known proteins has

increased rapidly. In contrast, the number of proteins with known structure has grown

much more slowly. This imbalance has critically limited our ability to

understand the molecular mechanisms of proteins and to conduct structure-based drug

design in a timely manner using updated information on newly found sequences. It is therefore

highly desirable to model the 3D structure of proteins with a structural bioinformatics

approach such as homology modeling. In this study a homology modeling approach was

utilized to develop the 3D structure of Aspergillus fumigatus Af293 (anxC3.1) [65]. An

annexin (anxC3.1) was isolated and characterized from the industrially important

filamentous fungus Aspergillus niger. AnxC3.1 is a single-copy gene encoding a 506

amino acid predicted protein which contains four annexin

repeats. AnxC3.1 expression was found to be unaltered under a variety of conditions

such as increased secretion, altered nitrogen source, heat shock, and decreased

Ca2+ levels, indicating that anxC3.1 is constitutively expressed. This is the first

reported functional characterization of a fungal annexin. It is therefore highly desirable to

develop a model of this protein.

Anticancer drug development using the platform of glutathione (GSH),

glutathione S-transferase (GST) and pathways that maintain thiol homeostasis has

recently produced a number of lead compounds. GSTπ is a prevalent protein in many

solid tumors and is overexpressed in cancers resistant to drugs. It has proved to be a

viable target for pro-drug activation with at least one candidate in late-stage clinical

development.

In addition, GSTπ possesses noncatalytic ligand-binding properties important

in the direct regulation of kinase pathways. This has led to the development and

testing of agents that bind to GSTπ and interfere with protein–protein interactions,

with the phase II clinical testing of one such drug. Attachment of glutathione to

acceptor cysteine residues (glutathionylation) is a post-translational modification that

can alter the structure and function of proteins. Two agents in preclinical development

(PABA/NO, releasing nitric oxide on GST activation, and NOV-002, a

pharmacologically stabilized pharmaceutical form of GSSG) can lead to

glutathionylation of a number of cellular proteins. The biological significance of these

modifications is linked with the mechanism of action of these drugs. In the short term,

glutathione-based systems should continue to provide viable targets and a platform for

the development of novel cancer drugs.

Modeller is a computer program[66] that models three-dimensional structures

of proteins and their assemblies by satisfaction of spatial restraints. Modeller is most

frequently used for homology or comparative protein structure modeling: The user

provides an alignment of a sequence to be modeled with known related structures and

Modeller will automatically calculate a model with all non-hydrogen atoms.

More generally, the input to the program is a restraint on the spatial structure

of the amino acid sequence(s) and ligands to be modeled. The output is a 3D structure

that satisfies these restraints as far as possible. Restraints can in principle be derived

from a number of different sources. These include related protein structures

(comparative modeling), NMR experiments, rules of secondary structure packing

(combinatorial modeling), cross-linking experiments, fluorescence spectroscopy,

image reconstruction in electron microscopy, site-directed mutagenesis, intuition,

residue–residue and atom–atom potentials of mean force, etc.

The restraints can operate on distances, angles, dihedral angles, pairs of

dihedral angles and some other spatial features defined by atoms or pseudo atoms.

Presently, Modeller automatically derives the restraints only from the known related

structures and their alignment with the target sequence. A 3D model is obtained by

optimization of a molecular probability density function.

The molecular probability density function for comparative modeling is optimized with the

variable target function procedure in Cartesian space that employs methods of

conjugate gradients and molecular dynamics with simulated annealing. Modeller can

also perform multiple comparison of protein sequences and/or structures, clustering of

proteins, and searching of sequence databases. The program is used with a scripting

language and does not include any graphics. It is written in standard Fortran 90 and

will run on Unix, Windows, or Mac computers. The core modeling procedure begins

with an alignment of the sequence to be modeled (target) with related known 3D

structures (templates). This alignment is usually the input to the program. The output

is a 3D model for the target sequence containing all main chain and side chain non-

hydrogen atoms. Given an alignment, the model is obtained without any user

intervention. First, many distance and dihedral angle restraints on the target sequence

are calculated from its alignment with template 3D structures.

The form of these restraints was obtained from a statistical analysis of the

relationships between many pairs of homologous structures. This analysis relied on a

database of 105 family alignments that included 416 proteins with known 3D

structure. By scanning the database, tables quantifying various correlations were

obtained, such as the correlations between two equivalent C-Alpha distances, or

between equivalent main chain dihedral angles from two related proteins.

These relationships were expressed as conditional probability density

functions and can be used directly as spatial restraints. For example, probabilities for

different values of the main chain dihedral angles are calculated from the type of a

residue considered, from main chain conformation of an equivalent residue, and from

sequence similarity between the two proteins.

Another example is the probability density function for a certain C-Alpha distance given

equivalent distances in two related protein structures. An important feature of the

method is that the spatial restraints are obtained empirically, from a database of

protein structure alignments. Next, the spatial restraints and CHARMM energy terms

enforcing proper stereochemistry are combined into an objective function.
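
How individual restraints combine into a single objective function can be sketched as follows. This is a deliberately simplified illustration, not Modeller's actual functional form: each restraint is assumed to be a Gaussian on one inter-atomic distance, the stereochemical energy terms are omitted, and all names and numbers are hypothetical.

```python
import math


def gaussian_restraint_penalty(value, mean, sigma):
    """Negative log of a Gaussian density, omitting coordinate-independent terms."""
    return 0.5 * ((value - mean) / sigma) ** 2


def objective(coords, restraints):
    """Sum of penalties over distance restraints given as (i, j, mean, sigma)."""
    total = 0.0
    for i, j, mean, sigma in restraints:
        d = math.dist(coords[i], coords[j])  # current distance between atoms i and j
        total += gaussian_restraint_penalty(d, mean, sigma)
    return total


# Two atoms 5 Å apart (made-up coordinates), restrained to 4 Å with sigma 0.5 Å.
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
print(objective(coords, [(0, 1, 4.0, 0.5)]))  # 2.0
```

Minimizing such a sum is equivalent to maximizing the product of the individual restraint densities, which is why a lower objective value corresponds to a model that satisfies the restraints better.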

Finally, the model is obtained by optimizing the objective function in

Cartesian space. The optimization is carried out by the use of the variable target

function method employing methods of conjugate gradients and molecular dynamics

with simulated annealing. Several slightly different models can be calculated by

varying the initial structure. The variability among these models can be used to

estimate the errors.

MolDock is based on a new heuristic search algorithm that combines

differential evolution with a cavity prediction algorithm[67]. The docking scoring

function of MolDock is an extension of the piecewise linear potential (PLP) including

new hydrogen bonding and electrostatic terms. To further improve docking accuracy,

a re-ranking scoring function is introduced, which identifies the most promising

docking solution from the solutions obtained by the docking algorithm. The docking

accuracy of MolDock has been evaluated by docking flexible ligands to 77 proteins.
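
The differential evolution component can be illustrated with a generic sketch. This is plain differential evolution (the common rand/1/bin variant) applied to a toy function, not MolDock's algorithm, which additionally uses cavity prediction and a docking scoring function; the parameter values F, CR, the population size, and the toy objective are assumptions.

```python
import random


def differential_evolution(f, bounds, pop_size=20, generations=200,
                           F=0.5, CR=0.9, seed=0):
    """Minimize f over box bounds with plain rand/1/bin differential evolution."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [f(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the current one.
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate is mutated
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < CR:
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    v = min(max(v, lo), hi)  # clip to the search box
                else:
                    v = pop[i][d]
                trial.append(v)
            s = f(trial)
            if s <= scores[i]:  # greedy selection: keep the better solution
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def sphere(x):
    """Toy objective (assumed): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_score = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
print(best_x, best_score)
```

In a docking setting each individual would encode a ligand pose (position, orientation, and torsion angles) and f would be the docking scoring function.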

One application of molecular docking is to design pharmaceuticals in silico by

optimizing lead candidates targeted against proteins. The lead candidates can be found

using a docking algorithm that tries to identify the optimal binding mode of a small

molecule (ligand) to the active site of a macromolecular target. Thus, the purpose of

drug discovery is to derive drugs that more strongly bind to a given protein target than

the natural substrate. By doing so, the biochemical reaction that the target molecule

catalyzes can be altered or prevented. The scoring function used by MolDock is

derived from the PLP scoring functions originally proposed by Gehlhaar et al. and

later extended by Yang et al. The scoring function used by MolDock further improves

these scoring functions with a new hydrogen bonding term and new charge schemes.

1.7 COMPUTER-AIDED DRUG DESIGN (CADD)

During the last decades, the drug discovery process that leads to new

ligands has become a modern science employing both computational and

experimental approaches. The main experimental methods are combinatorial

chemistry and high-throughput screening. Computer-assisted drug design (CADD)

uses computational chemistry to discover, enhance, or study drugs and related

biologically active molecules. The most fundamental goal is to predict whether a

given molecule will bind to a target and, if so, how strongly. The CADD techniques

that have been applied to discover novel therapeutics against inflammatory bowel disease

include protein homology modeling and docking.

Computer-Aided Drug Design [68] (CADD), also known as computer-assisted

molecular design (CAMD), is a specialized discipline that uses computational

methods to simulate drug-receptor interactions. CADD methods are heavily

dependent on bioinformatics tools, applications and databases. In most current

applications of CADD, attempts are made to find a ligand (the putative drug) that will

interact favorably with a receptor that represents the target site. Binding of ligand to

the receptor may include hydrophobic, electrostatic, and hydrogen-bonding

interactions.

In addition, solvation energies of the ligand and receptor site are also

important because partial to complete desolvation must occur prior to binding.

This approach to CADD optimizes the fit of a ligand in a receptor site. However,

optimum fit in a target site does not guarantee that the desired activity of the drug will

be enhanced or that undesired side effects will be diminished. Moreover, this

approach does not consider the pharmacokinetics of the drug. CADD is dependent

upon the amount of information that is available about the ligand and receptor.

Ideally, one would have 3-dimensional structural information for the receptor and the

ligand-receptor complex from X-ray diffraction or NMR but in the opposite extreme,

one may have no experimental data to assist in building models of the ligand and

receptor, in which case computational methods must be applied without the

constraints that the experimental data would provide.

Based on the information that is available, there are two main approaches to CADD:

ligand-based and receptor-based molecular design methods. The

ligand-based approach is used when the structure of the receptor site is unknown, while the

receptor-based approach is used when a reliable model of the receptor site is available,

as from X-ray diffraction, NMR, or homology modeling. With the availability of the

receptor site, the problem is to design ligands that will interact favorably at the site,

which is a docking problem.

Historically, drug design was a trial-and-error process.

It was not until the 1960's that some understanding began to develop about the

quantitative relationship between the structure of a drug and its biological activity.

The understanding about this quantitative relationship ushered in a new era in drug

discovery system in the form of computer aided drug design.

The conventional way of synthesizing drugs has always been a tedious

process, consuming several years and considerable manpower and expense in order

to come up with a single effective novel drug. However, with the advent of

information technology, it can be done efficiently in silico, thereby saving considerable

funds and manpower. The pipeline of drug discovery from idea to market consists of

seven basic steps: disease selection, target selection, lead compound identification,

lead optimization, preclinical trials, clinical trial testing, and pharmacogenomic

optimization.

1.7.1 Homology Modeling using CADD

Homology modeling refers to constructing an atomic-resolution model of the

target protein from its amino acid sequence and an experimental

three-dimensional structure of a related homologous protein (the template).

Homology modeling relies on the identification of one or more known protein

structures likely to resemble the structure of the query sequence, and on the

production of an alignment that maps residues in the query sequence to residues

in the template sequence. It has been shown that protein structures are more

conserved than protein sequences amongst homologues. The homology modeling

procedure can be broken down into four sequential steps: template selection,

target-template alignment, model construction, and model assessment. The

homology modeling programs we mainly use are SWISS-MODEL and MODELLER. We have

successfully performed homology modeling to construct a three-dimensional

structure of the glutathione S-transferase (GST) protein.
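
The role of the target-template alignment can be illustrated with a minimal Python sketch. It is not the procedure implemented in SWISS-MODEL or MODELLER; the function names and the toy aligned sequences are hypothetical and chosen only for illustration. Given two already aligned sequences (gaps written as "-"), it maps each query residue to its template residue and reports the percent identity over aligned columns, a quantity commonly inspected during template selection.

```python
def map_aligned_residues(query_aln: str, template_aln: str):
    """Return (query_index, template_index) pairs for columns without gaps.

    Indices are 0-based positions in the ungapped sequences.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    pairs = []
    q_pos = t_pos = 0
    for q_res, t_res in zip(query_aln, template_aln):
        if q_res != "-" and t_res != "-":
            pairs.append((q_pos, t_pos))
        # Advance each counter only when that sequence has a residue here.
        if q_res != "-":
            q_pos += 1
        if t_res != "-":
            t_pos += 1
    return pairs


def percent_identity(query_aln: str, template_aln: str) -> float:
    """Identical residues as a percentage of aligned (non-gap) columns."""
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    aligned = identical = 0
    for q_res, t_res in zip(query_aln, template_aln):
        if q_res == "-" or t_res == "-":
            continue
        aligned += 1
        if q_res == t_res:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


# Hypothetical toy alignment, not real GST sequence data.
query_aln = "MKT-AYIAK"
template_aln = "MRTLAY-AK"

residue_map = map_aligned_residues(query_aln, template_aln)
identity = percent_identity(query_aln, template_aln)
print(residue_map)
print(f"{identity:.2f}% identity over aligned columns")
```

Percent identity is defined here over non-gap columns only; other conventions (for example, dividing by the length of the shorter sequence) exist and give different values.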

1.7.2 Docking using CADD

Docking is the computational simulation of a candidate ligand binding to a receptor. In the

field of molecular modeling, docking is a method which predicts the preferred

orientation of one molecule to a second when bound to each other to form a stable

complex. Knowledge of the preferred orientation in turn may be used to predict the

strength of association or binding affinity between two molecules using, for example,

scoring functions.
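
As a rough illustration of how a scoring function ranks candidate orientations, the following Python sketch sums a 12-6 Lennard-Jones term over all ligand-receptor atom pairs and selects the lowest-scoring pose. This is a toy model under stated assumptions: the function names, the parameters `epsilon` and `r_min`, and the coordinates are hypothetical, and the scoring functions of real docking programs include further terms (for example electrostatics, hydrogen bonding, and desolvation).

```python
import math


def lj_score(ligand, receptor, epsilon=1.0, r_min=3.5):
    """Sum a 12-6 Lennard-Jones term over all ligand-receptor atom pairs.

    Each pair contributes epsilon * ((r_min/r)**12 - 2 * (r_min/r)**6),
    which has its minimum of -epsilon at r == r_min. Lower totals are
    more favorable. Coordinates are (x, y, z) tuples in angstroms;
    coincident atoms (r == 0) are not handled in this sketch.
    """
    total = 0.0
    for lig_atom in ligand:
        for rec_atom in receptor:
            r = math.dist(lig_atom, rec_atom)
            ratio = r_min / r
            total += epsilon * (ratio**12 - 2 * ratio**6)
    return total


def best_pose(poses, receptor):
    """Return the name of the pose with the lowest (most favorable) score."""
    return min(poses, key=lambda name: lj_score(poses[name], receptor))


# Hypothetical one-atom receptor and three one-atom ligand poses.
receptor = [(0.0, 0.0, 0.0)]
poses = {
    "clash": [(1.0, 0.0, 0.0)],     # too close: strongly repulsive
    "contact": [(3.5, 0.0, 0.0)],   # at r_min: most favorable
    "distant": [(10.0, 0.0, 0.0)],  # far away: almost no interaction
}

chosen = best_pose(poses, receptor)
print(chosen)
```

The pose placed exactly at `r_min` is preferred because the pairwise term reaches its minimum there; the clashing pose is penalized by the steep repulsive term, and the distant pose gains almost nothing.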

The associations between biologically relevant molecules such as proteins,

nucleic acids, carbohydrates, and lipids play a central role in signal transduction.

Furthermore, the relative orientation of the two interacting partners may affect the

type of signal produced (e.g., agonism vs. antagonism). Therefore, docking is useful

for predicting both the strength and type of signal produced.

Docking is frequently used to predict the binding orientation of small-molecule

drug candidates to their protein targets, which in turn allows the affinity

and activity of the small molecule to be predicted. Hence, docking plays an

important role in the rational design of drugs. Given the biological and

pharmaceutical significance of molecular docking, considerable effort has been

directed towards improving the methods used to predict docking.

From the above studies it has been found that various sequence analysis and

modeling studies were carried out on variants of GSTs. However, no study has been

reported on the chick (Gallus gallus) GST protein [69]. Hence, in this work various

sequence analysis tools and homology modeling procedures were employed to build the 3D

structure of the chick GST protein. This study will be helpful for the development of the

GST family and for designing drugs against the various cancers that are related to the GST

enzyme. Designing new drugs is a significant part of science. Drug design is the

approach of finding drugs by design, based on knowledge of their biological targets.

Typically a drug target is a key molecule involved in a particular metabolic or

signaling pathway that is specific to a disease condition or to the infectivity or

survival of a microbial pathogen. Some approaches attempt to stop the functioning of

the pathway in the diseased state by causing a key molecule to stop functioning.

Drugs may be designed that bind to the active region and inhibit this key molecule.

However, these drugs must also be designed in such a way as not to affect

any other important molecules that may closely resemble the key

molecule. Sequence homologies are often used to identify such risks.

1.8 RESEARCH QUESTIONS

How can the structure of glutathione S-transferases be predicted on the basis of

sequence similarities?

What computational approach can highlight the crucial amino acid

residues responsible for functional attributes?

How can the interactions at the binding sites of ligand (Homo sapiens) –

protein (GST) complexes be understood?

How can computational techniques be used to reduce the time spent in

synthesizing compounds, and how would further experimental procedures on

designed compounds lead to effective compounds through computer-

aided drug design (CADD)?